New features in the PDF Library 2.15

It's been four months since our last PDF API release, so what does this one have in store? Besides changes to the page list, there are two major new areas:

  • PDF/A-2 and PDF/A-3 support has been added
  • The Swing classes now support linearized loading

New PDF/A revisions

We're seeing more and more companies adopt ISO 19005, aka PDF/A, and we're pleased to have added support for revisions 2 and 3 of the specification. Of course there's no need to change if you're already targeting PDF/A-1, and if you're not familiar with the new revisions then this is a good summary. But if you need the features allowed in the later revisions of the specification, this release is for you. We think the most significant are:

  • Embedded files are now allowed: those files must also be PDF/A for PDF/A-2, but this is relaxed in PDF/A-3
  • JPEG2000 compression is now allowed
  • Transparency is now allowed

Currently we only support creation of PDF/A-2b and PDF/A-3b documents, but support for the "U" variation (for Unicode) will be in an upcoming release.

Linearization support in the viewer

For some customers, this is the big one. Linearized documents are designed to be displayable before the entire document has been downloaded. Although we added support for this to the core API in the previous release, 2.14, it took until now to add it to the viewer. It's a complex change, because it invalidates some previous assumptions (namely that pdf.getPage() will return immediately).

The good news is that it's in and working. For a demonstration, point your web browser to our example applet and select the 12MB "Linearized Example" from the drop-down list above it. The first page should show within a few seconds, but if you check the title bar you'll see a percentage showing how much of the document has actually been downloaded.

How to take advantage of Linearization

To make use of this new feature there's actually very little you need to do. The PDF viewer will do this automatically if the following conditions are met:

  1. The PDF you're loading has to be linearized - probably goes without saying, but we'll say it anyway. Our PDF Library has been able to create linearized PDFs for a long time, and of course most other tools can create them too - they're variously called "Web Ready" or "Optimized" PDF in Acrobat.
  2. The PDF must be loaded from an HTTP or HTTPS URL. Our viewer has an API method to do this: PDFViewer.loadPDF(URL) - and if you're using the viewer as an applet you can do this by specifying the URL (relative or absolute) with the pdf parameter to the applet. See PDFViewerApplet for details.
  3. The web server serving the PDF must support the Range HTTP header in requests, and it must advertise this by adding Accept-Ranges: bytes in the initial response. Most do when the file is static and served from the filesystem by the server's default handler.

    If you've got your own servlet which is serving the files, as you might if they were loaded from a database for instance, then you need to make sure you've implemented this. Your servlet will see an initial request for the PDF, and if the PDF is linearized that will be cancelled and many other requests made for smaller byte ranges. So if retrieving the PDF is a slow operation, perhaps because it's being retrieved from a remote location or a slow database, or perhaps because it might be modified by another process, then it makes sense to hold a copy of the PDF locally which can be discarded if there are no requests for a set period of time (we'd suggest 30 seconds to be safe).
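
If you are writing that servlet yourself, the core of Range support is working out which bytes a request is asking for. The following is a minimal sketch of that step only, not part of our API: the ByteRange class and its methods are hypothetical names made up for illustration, it handles a single range (multi-range requests are simply ignored), and it assumes your servlet does the rest, namely sending Accept-Ranges: bytes on every response, and replying with status 206 plus a Content-Range header when a range was resolved.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: resolves a single-range "Range" header against a file length. */
final class ByteRange {
    // Matches "bytes=0-1023", "bytes=4000-" and "bytes=-500"; rejects multi-range lists.
    private static final Pattern SINGLE_RANGE = Pattern.compile("bytes=(\\d*)-(\\d*)");

    final long start; // first byte to send, inclusive
    final long end;   // last byte to send, inclusive

    private ByteRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /**
     * Returns the range to serve, or null if the header is absent, malformed,
     * multi-range or outside the file. On null the caller decides what to do,
     * for example sending the whole file with a plain 200 response.
     */
    static ByteRange parse(String header, long fileLength) {
        if (header == null || fileLength <= 0) {
            return null;
        }
        Matcher m = SINGLE_RANGE.matcher(header.trim());
        if (!m.matches()) {
            return null;
        }
        String first = m.group(1);
        String last = m.group(2);
        long start;
        long end;
        try {
            if (first.isEmpty()) {
                if (last.isEmpty()) {
                    return null;                      // "bytes=-" means nothing
                }
                long suffix = Long.parseLong(last);   // "bytes=-500": the final 500 bytes
                if (suffix == 0) {
                    return null;
                }
                start = Math.max(0, fileLength - suffix);
                end = fileLength - 1;
            } else {
                start = Long.parseLong(first);
                // An open or over-long end is clamped to the last byte of the file.
                end = last.isEmpty() ? fileLength - 1
                                     : Math.min(Long.parseLong(last), fileLength - 1);
            }
        } catch (NumberFormatException e) {
            return null;                              // number too large for a long
        }
        if (start > end) {
            return null;                              // also covers start beyond end of file
        }
        return new ByteRange(start, end);
    }

    /** Number of bytes in the range, suitable for the Content-Length of a 206 response. */
    long length() {
        return end - start + 1;
    }

    /** Value for the Content-Range header of a 206 response. */
    String contentRange(long fileLength) {
        return "bytes " + start + "-" + end + "/" + fileLength;
    }
}
```

In a servlet you would pass the value of request.getHeader("Range") and the size of the PDF to ByteRange.parse(), then write only the bytes from start to end inclusive. Keeping the parsing separate from the I/O also makes it easy to serve ranges from a locally held copy of the PDF, as suggested above.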

Linearization and custom viewer features

If you've modified the viewer to add your own custom features, then there are more things to consider. First, if you're not loading linearized documents then you shouldn't need to worry too much: your features will still work, almost certainly without any changes required.

If you want to load linearized PDFs and use your custom features, then some work might be required. The main thing to remember is that a call to pdf.getPage(), or indeed any other code that returns a data structure of some sort from the PDF (form fields, bookmarks, file attachments etc.) might not return immediately - it might trigger a load. If you're doing this on the Swing thread then this will lock the thread, which of course is a bad thing.

To avoid this we've added the LinearizedSupport class to the viewer package. This is an easy way of adding callbacks, so your task will be run when the page is loaded. Let's say, for example, that your feature is going to jump to a specific page in the file when activated. Previously your code might have looked like this:

public void action(ViewerEvent event) {
    List<PDFPage> pages = pdf.getPages();
    // May block until the page has been downloaded
    PDFPage page = pages.get(pagenumber);
    getViewer().getActiveDocumentPanel().setPage(page);
}

This will jump the viewer to the page when run, but if that page hasn't been loaded yet the Swing thread will lock until it has (on the pages.get() line), which will make the application unresponsive. A linearization-aware approach would be to replace this with the following:

public void action(ViewerEvent event) {
    final DocumentPanel dp = getViewer().getActiveDocumentPanel();
    LinearizedSupport support = dp.getLinearizedSupport();
    support.invokeOnPageLoadWithDialog(pagenumber, new Runnable() {
        public void run() {
            dp.setPage(pdf.getPage(pagenumber));
        }
    });
}

This will bring up a loading dialog while the requested page is loading and switch pages on completion - or, if the page is already loaded, will switch immediately. The LinearizedSupport class has several other methods which allow you to schedule tasks when the PDF has loaded the required section of the file.