How to optimize your PDFs for serving over the web.
The 2.10.3 release of the PDF library adds support for writing "Linearized" PDF documents, sometimes called "Web Optimized" or "Fast Web View" enabled. This feature has been a part of the PDF specification since Acrobat 3.0, but it's poorly understood even by developers.
The idea behind linearization is faster display of PDF documents downloaded from the web. Linearization achieves this by arranging the document structure in a specific way and adding "hint tables", which record the position in the file of certain objects.
A linearization-aware PDF viewer requests the PDF from the webserver normally, but if it finds the file is linearized it halts the download after receiving the hint tables and the first page. Navigating to a different page sends the webserver a request for just that section of the document, using an HTTP Range request that the server answers with a 206 Partial Content response. This is effectively random access on the file over the web.
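To make the partial request concrete, here is a minimal sketch of what a client might send to fetch one section of a file. It uses only the standard Java library; the class and method names (RangeRequestSketch, rangeHeader, fetchRange) are made up for illustration and are not part of any PDF viewer or of our library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, for illustration only.
final class RangeRequestSketch {

    /** Builds a Range header value for 'length' bytes starting at 'offset'.
     *  HTTP byte ranges are inclusive at both ends, hence the "- 1". */
    static String rangeHeader(long offset, long length) {
        if (offset < 0 || length <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and length > 0");
        }
        return "bytes=" + offset + "-" + (offset + length - 1);
    }

    /** Fetches one byte range, failing unless the server answers 206 Partial Content. */
    static byte[] fetchRange(String url, long offset, long length) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Range", rangeHeader(offset, length));
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_PARTIAL) {
                // A 200 here means the server ignored the Range header and sent the whole file
                throw new IOException("Expected 206 Partial Content but got HTTP " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return in.readAllBytes();
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

A real viewer would take the offsets and lengths from the hint tables rather than from the caller; that lookup is left out here.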
Already this throws up a few facts:
- The PDF must be downloaded from a webserver for viewing
  This means using the Acrobat plugin in your web-browser: if you right-click the document and "Save As" for opening later, the PDF is treated normally.
- Multiple requests will be made for the same URL
  Generating a linearized PDF on the fly and returning it directly won't work - the PDF has to be saved (usually to the filesystem) and served from there (perhaps by means of a 302 redirect). The web-server must set the Content-Length header and also set Accept-Ranges: bytes. For normal disk files both Apache and Tomcat (since 6.0.20) set these headers: if you're having problems then check your webserver is returning the correct headers.
- The more pages you have, the more useful linearization is
  For a single page PDF Acrobat ignores the linearization and downloads the PDF normally. If your PDF is two pages with the first one consisting of a large graphic and the second one mostly of text the benefits will be pretty marginal, but for a 10,000 page PDF the impact of linearization can be huge.
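If you need to check the headers mentioned above, something like the following sketch may help. It only inspects a map of response headers for Accept-Ranges: bytes and a Content-Length value; the class and method names are hypothetical, and it assumes you obtain the headers yourself (for example from a HEAD request).

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class RangeSupportCheck {

    /** Returns true if the headers advertise byte-range support and a known length. */
    static boolean supportsByteRanges(Map<String, List<String>> headers) {
        boolean acceptsBytes = false;
        boolean hasLength = false;
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            String name = e.getKey();
            if (name == null) {
                continue; // HttpURLConnection stores the status line under a null key
            }
            // Header names are case-insensitive, so compare without regard to case
            if (name.equalsIgnoreCase("Accept-Ranges")) {
                for (String value : e.getValue()) {
                    if (value.trim().equalsIgnoreCase("bytes")) {
                        acceptsBytes = true;
                    }
                }
            } else if (name.equalsIgnoreCase("Content-Length")) {
                hasLength |= !e.getValue().isEmpty();
            }
        }
        return acceptsBytes && hasLength;
    }
}
```

The map has the same shape as the one returned by HttpURLConnection.getHeaderFields(), so the result of a HEAD request against your PDF's URL can be passed in directly.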
Another key point about linearization is it won't work if the document has been altered - so loading a linearized PDF, modifying a form field then saving it will remove the linearization. Of course you can re-linearize the PDF again when you save it: the one time this won't work is if the original PDF is digitally signed, which gives us our final key point.
- Modifying a linearized, digitally signed PDF removes linearization
  If digital signatures are part of your workflow, make sure they're applied last, just before the PDF is made available for download. PDFs with more than one digital signature cannot be linearized.
So now you've decided to linearize your PDF documents, how do you do it? In Acrobat when you "Save As" they're linearized by default - with our PDF Library it's as simple as applying the correct OutputProfile before saving:
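    // Load the existing PDF
    PDF pdf = new PDF(new PDFReader(new File("in.pdf")));
    // Start from the default profile and require the Linearized feature
    OutputProfile p = new OutputProfile(OutputProfile.Default);
    p.setRequired(OutputProfile.Feature.Linearized);
    pdf.setOutputProfile(p);
    // Write the PDF; try-with-resources closes the stream afterwards
    try (FileOutputStream out = new FileOutputStream("out.pdf")) {
        pdf.render(out);
    }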
There is a cost to this operation. The render operation is slower and uses more memory, so there's no point in linearizing unless your PDFs are going to benefit.
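A rough way to confirm that a saved file came out linearized is to look at the start of it: the PDF specification requires the linearization parameter dictionary, which carries a /Linearized key, to sit within the first 1024 bytes. The sketch below is a heuristic under that assumption only; it does not validate the hint tables or the object ordering, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
final class LinearizedCheck {

    /** Heuristic: a PDF header followed by a /Linearized key within the first 1024 bytes. */
    static boolean looksLinearized(byte[] head) {
        int n = Math.min(head.length, 1024);
        // ISO-8859-1 maps every byte to one char, so binary data cannot break the decoding
        String text = new String(head, 0, n, StandardCharsets.ISO_8859_1);
        return text.startsWith("%PDF-") && text.contains("/Linearized");
    }

    /** Reads at most the first 1024 bytes of the file and applies the same heuristic. */
    static boolean looksLinearized(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return looksLinearized(in.readNBytes(1024));
        }
    }
}
```

Calling LinearizedCheck.looksLinearized(Path.of("out.pdf")) after saving gives a quick sanity check; a file that has since been modified and saved incrementally may still pass it even though viewers will no longer treat it as linearized.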