BFO PDF Library 2.21: some new features

Our latest PDF Library release has a few new features which are worth going over in more detail, as well as the usual batch of small bugfixes and improvements.

Lazy Loading of fonts and images

Our API can load an existing PDF from a java.io.InputStream for speed, or from a java.io.File to reduce the memory footprint: content from the File is loaded only on demand. And for linearized PDFs we can also load from a java.net.URL, as the PDF is structured to enable requesting the parts of the PDF that are used first from a web server. For large PDFs and slow network connections, this is a big advantage.
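To illustrate the on-demand idea, here is a minimal sketch using only the JDK. The LazyFileSlice class and its methods are hypothetical and are not part of our API; the sketch only shows how keeping a File reference and seeking to the required bytes avoids reading the whole file into memory.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Illustration only: a hypothetical helper, not part of the BFO API.
final class LazyFileSlice {
    private final File file;

    LazyFileSlice(File file) {
        this.file = file;   // nothing is read here, the data stays on disk
    }

    // Reads exactly `length` bytes starting at `offset`, and nothing else.
    byte[] read(long offset, int length) {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            byte[] buf = new byte[length];
            raf.seek(offset);
            raf.readFully(buf);   // throws EOFException if the slice runs past the end
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes ten bytes to a temporary file, then reads back only four of them.
    static String demo() {
        try {
            File tmp = File.createTempFile("lazy", ".bin");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(), "0123456789".getBytes(StandardCharsets.US_ASCII));
            byte[] slice = new LazyFileSlice(tmp).read(3, 4);
            return new String(slice, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "3456"
    }
}
```

The trade-off is the usual one: each read costs a seek and a system call, which is why loading from an InputStream remains the faster option when memory is not a concern.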

The 2.21 release adds the same concepts for the other types of file referenced by our API: images and fonts. New constructors on the OpenTypeFont, PDFImageSet and PDFImage classes will load a font or image from a java.io.File, to leave the bulk of the data on disk, or a java.net.URL: if supported by the web server, the HTTP "Range" header will be used to transfer only the required portions of the file over the network.
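For readers unfamiliar with the "Range" header, the following sketch shows what such a request looks like when built with the JDK's HttpURLConnection. The RangeRequest class and the example URL are hypothetical; this is not how the library is implemented internally, only an illustration of the mechanism.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Illustration only: a hypothetical helper, not part of the BFO API.
final class RangeRequest {

    // Prepares (but does not send) a request for `length` bytes starting at `offset`.
    static HttpURLConnection prepare(String url, long offset, long length) {
        try {
            HttpURLConnection con =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Byte ranges are inclusive at both ends, hence the "- 1"
            con.setRequestProperty("Range",
                "bytes=" + offset + "-" + (offset + length - 1));
            return con;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A server that honours the range replies with 206 Partial Content.
    // A plain 200 means it ignored the header and is sending the whole file.
    static boolean isPartial(int status) {
        return status == HttpURLConnection.HTTP_PARTIAL;
    }

    public static void main(String[] args) {
        // No network traffic happens here: the connection is never opened
        HttpURLConnection con = prepare("http://example.invalid/ArialUnicode.ttf", 100, 100);
        System.out.println(con.getRequestProperty("Range"));   // prints "bytes=100-199"
    }
}
```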

The File constructor will be the more useful of the two. Along with some modifications to the way we manage and store the data internally, the memory footprint when working with very large OpenType fonts will be drastically reduced. Our internal test harness for a project we're working on loads some very large fonts from a URL and uses them simultaneously in 32 threads - this change has reduced the memory footprint for that operation by a factor of about 60! Here's an example showing how to benefit from this:

import java.awt.Color;
import java.io.File;
import java.io.FileOutputStream;
import org.faceless.pdf2.*;
 
public class Test {
  public static void main(String[] args) throws Exception {
    // Load the font once from a File, so the bulk of the data stays on disk
    final OpenTypeFont font = new OpenTypeFont(0, new File("ArialUnicode.ttf"));
    for (int i=0;i<10;i++) {
      new Thread() {
        public void run() {
          try {
            PDF pdf = new PDF();
            PDFPage page = pdf.newPage("A4");
            PDFStyle style = new PDFStyle();
            // Each thread works with its own copy of the shared font
            OpenTypeFont localfont = new OpenTypeFont(font);
            style.setFont(localfont, 24);
            style.setFillColor(Color.black);
            page.setStyle(style);
            page.drawText("Hello, World", 100, 700);
            long id = Thread.currentThread().getId();
            // Close the output stream once the PDF has been written
            try (FileOutputStream out = new FileOutputStream("out-"+id+".pdf")) {
              pdf.render(out);
            }
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      }.start();
    }
  }
}
  

We've also managed to reduce the size of the non-shared data structures used by the StandardCJKFont class. Both these changes should make this release a very useful one for customers creating PDFs containing Chinese, Japanese or Korean text.

Loading from a URLConnection

This release also adds the ability to load PDFs, fonts and images from a java.net.URLConnection as well as a java.net.URL. This is useful when you don't know the content type in advance. Let's say you have a hypothetical application that downloads a resource from a URL and embeds it in your PDF; it may be an image, or it may be a PDF. Previously your code might have looked like this:

   URLConnection con = url.openConnection();
   String type = con.getContentType();
   if ("application/pdf".equals(type)) {
       PDFReader reader = new PDFReader();
       reader.setSource(url);
       return new PDF(reader);
   } else if (type != null && type.startsWith("image/")) {
       return new PDFImageSet(url);
   }
  

This will work just fine, but there is a catch: the original URL connection is only used to retrieve the content type. Both the PDFReader and PDFImageSet classes would open a new URLConnection from the URL to download the data. For high-performance applications doing this thousands of times a minute, this is a bottleneck: particularly so for small files, where the data is essentially being transferred twice.

The better way to do it is this:

   URLConnection con = url.openConnection();
   String type = con.getContentType();
   if ("application/pdf".equals(type)) {
       PDFReader reader = new PDFReader();
       reader.setSource(con);
       return new PDF(reader);
   } else if (type != null && type.startsWith("image/")) {
       return new PDFImageSet(con);
   }
  

Here the original URL connection is used by the PDFReader and PDFImageSet classes. If the URL connection is to a web server that supports the "Range" header, the original connection will be drained of as much data as possible and then closed before additional requests are made for the required sections of the file. In all other situations, the URLConnection's InputStream will be read in one go, blocking until the data is available. Data is never requested twice, and for some types of file may never be requested at all. For example, if you're loading a particular page from a multi-page TIFF image, with this approach only the specified page will be downloaded.
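One detail worth handling in the dispatch above: a Content-Type header may carry parameters after a semicolon, or differ in case, so an exact match against "application/pdf" can miss. The ContentKind helper below is a hypothetical sketch, not part of our API, showing one way to normalize the value returned by getContentType() before deciding which class to hand the connection to.

```java
import java.util.Locale;

// Illustration only: a hypothetical helper, not part of the BFO API.
final class ContentKind {
    enum Kind { PDF, IMAGE, OTHER }

    // Classifies a Content-Type value, ignoring case and any parameters.
    static Kind classify(String contentType) {
        if (contentType == null) {
            return Kind.OTHER;   // getContentType() returns null if the type is unknown
        }
        int semi = contentType.indexOf(';');
        String mime = (semi < 0 ? contentType : contentType.substring(0, semi))
            .trim().toLowerCase(Locale.ROOT);
        if (mime.equals("application/pdf")) {
            return Kind.PDF;
        }
        if (mime.startsWith("image/")) {
            return Kind.IMAGE;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("Application/PDF; qs=0.001"));   // prints "PDF"
        System.out.println(classify("image/tiff"));                   // prints "IMAGE"
        System.out.println(classify("text/html"));                    // prints "OTHER"
    }
}
```

With this in place, the PDF branch would pass the same URLConnection to reader.setSource(con) and the IMAGE branch to new PDFImageSet(con), exactly as in the example above.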

Log4j2 support

The logging from all BFO products is done the same way, transparently using whichever logging subsystem is configured. This release adds support for Apache Log4j 2.x, in addition to the existing support for the original Log4j and for java.util.logging. The three are tried in that order, which will matter if you have more than one installed.
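The ordering can be pictured as a simple classpath probe. The sketch below is an assumption about how such detection could work, not a description of the library's internals; the LoggingProbe class is hypothetical, and the two Log4j class names are only used as markers for "is this package installed".

```java
// Illustration only: a hypothetical helper, not part of the BFO API.
final class LoggingProbe {

    // Returns the first class name in `candidates` that can be loaded,
    // or `fallback` if none of them is on the classpath.
    static String firstAvailable(String fallback, String... candidates) {
        for (String name : candidates) {
            try {
                // "false" means the class is located but not initialized
                Class.forName(name, false, LoggingProbe.class.getClassLoader());
                return name;
            } catch (ClassNotFoundException | LinkageError e) {
                // Not installed (or not usable): try the next candidate
            }
        }
        return fallback;
    }

    // Probes in the order described above: Log4j 2.x, Log4j 1.x, java.util.logging.
    static String detect() {
        return firstAvailable("java.util.logging.Logger",
            "org.apache.logging.log4j.LogManager",   // Log4j 2.x
            "org.apache.log4j.Logger");              // original Log4j
    }

    public static void main(String[] args) {
        System.out.println("Logging would go to: " + detect());
    }
}
```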

If you're interested in logging from our API, either to turn on debug output if requested or to hide warnings, then this article describes the general process of how to do it. The same approach applies in Log4j 2.x. Here's a quick example showing how to turn on a particular debug message and turn off a particular warning: save this file as log4j2.xml in your CLASSPATH to use it.

<Configuration monitorInterval="60">
  <Appenders>
    <Console name="stdout">
      <PatternLayout pattern="%d %c{3}: %m%n"/>
    </Console>
  </Appenders>
  <Loggers> 
    <Logger name="org.faceless.pdf2.debug.Token" level="debug" /><!-- turn on "Token" debug -->
    <Logger name="org.faceless.pdf2.warning.E21" level="off" /><!-- turn off "E21" warning -->
    <Root level="warn" additivity="true">
      <AppenderRef ref="stdout"/>
    </Root>
  </Loggers>
</Configuration>
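If you are using java.util.logging rather than Log4j, something like the following sketch may achieve the same effect in code. It rests on two assumptions that you should verify against the article mentioned above: that the same logger names are used with java.util.logging, and that "debug" messages are logged at the FINE level. The JulConfig class is hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustration only: assumes the logger names from the Log4j 2 example
// also apply under java.util.logging, and that "debug" maps to FINE.
final class JulConfig {
    // Hold strong references: java.util.logging only keeps loggers weakly,
    // so settings on an unreferenced logger can be lost to garbage collection
    static final Logger TOKEN = Logger.getLogger("org.faceless.pdf2.debug.Token");
    static final Logger E21 = Logger.getLogger("org.faceless.pdf2.warning.E21");

    static void apply() {
        TOKEN.setLevel(Level.FINE);   // turn on "Token" debug (assumed level)
        E21.setLevel(Level.OFF);      // turn off "E21" warning
    }

    public static void main(String[] args) {
        apply();
        System.out.println(TOKEN.isLoggable(Level.FINE));     // prints "true"
        System.out.println(E21.isLoggable(Level.WARNING));    // prints "false"
    }
}
```

Note that the default ConsoleHandler only prints INFO and above, so its level would need lowering too before any FINE messages actually appear.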

PDFDrawable interface

For the last 15 years, the PDFPage and PDFCanvas classes have shared a large number of almost identical methods to add text, images and vector graphics to their respective objects, but have shared no common interface. That has finally changed with the new PDFDrawable interface. I'm prepared to admit that this change is possibly a little overdue.

The two classes had slightly different method signatures for a few methods, which meant some changes to the PDFPage class. The little-used pathLine, pathArc and pathBezier methods now return a boolean (they used to be void), and transform takes six doubles as arguments, instead of six floats.

Both these changes are source compatible, so will require no changes to your code, but because the method signatures have changed, any code calling these methods will need to be recompiled. If you're not sure whether you're calling them, chances are you're not: they are legacy methods and we don't expect many people to be using them.
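The reason a recompile is needed is that the JVM links a call to a method by its full descriptor, which includes the parameter types and the return type. The sketch below prints the descriptors before and after the change. The DescriptorDemo class is hypothetical, and two details are assumptions made for illustration: that transform returns void, and that pathLine takes two floats.

```java
import java.lang.invoke.MethodType;

// Illustration only: shows how a JVM method descriptor is derived from
// the return type and parameter types of a method.
final class DescriptorDemo {

    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // transform: six floats became six doubles (void return is an assumption)
        System.out.println(descriptor(void.class,
            float.class, float.class, float.class,
            float.class, float.class, float.class));        // prints "(FFFFFF)V"
        System.out.println(descriptor(void.class,
            double.class, double.class, double.class,
            double.class, double.class, double.class));     // prints "(DDDDDD)V"

        // pathLine: void became boolean (the two float parameters are an assumption)
        System.out.println(descriptor(void.class, float.class, float.class));      // prints "(FF)V"
        System.out.println(descriptor(boolean.class, float.class, float.class));   // prints "(FF)Z"
    }
}
```

A class compiled against the old release still asks for the old descriptor and would fail with a NoSuchMethodError at runtime. The same source recompiles cleanly because the compiler widens float arguments to double and allows a returned boolean to be ignored.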