Release 2.18 of our PDF Library includes our new preflighting functionality. Previously, if you wanted to identify which features are present in a PDF and optionally modify those features to bring the PDF into line with an output profile, like PDF/A or PDF/X, then you'd use two of the methods on the PDF class: getFullOutputProfile and setOutputProfile.
While this worked for the basic case described above, it had a few problems.
Fixing those problems was more than we could do with the existing API, which is why those two methods have been deprecated. The OutputProfiler class is the replacement, and here we'll go into how it works.
If you want to upgrade your code to avoid the deprecation warning, that's pretty simple. You can replace a call to pdf.getFullOutputProfile() with this:

    OutputProfiler profiler = new OutputProfiler(new PDFParser(pdf));
    OutputProfile profile = profiler.getProfile();

and if you then call pdf.setOutputProfile(target), you can do this with one more line:

    OutputProfiler profiler = new OutputProfiler(new PDFParser(pdf));
    OutputProfile profile = profiler.getProfile();
    profiler.apply(target);

Background-thread friendly
The OutputProfiler class can be run in a background thread while another thread reads from the PDF. Modifying the PDF while it's being profiled will lead to errors, so don't do that, but reading the content (for example turning the page into a bitmap, as we do in the viewer) is fine. The run and isRunning methods will get you started if you want to background-thread the process, and the API docs go into more detail on how.
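As a rough illustration of the pattern, the sketch below runs a long task on a worker thread while the calling thread carries on with read-only work. StubProfiler is a hypothetical stand-in written for this example, not the library's OutputProfiler. It only mirrors the run and isRunning method names mentioned above, and the API docs remain the authority on how the real class should be driven.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a long-running profiler. This is NOT the library's
// OutputProfiler; it only mirrors the run()/isRunning() names from the text.
class StubProfiler {
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final AtomicInteger pagesDone = new AtomicInteger();

    public void run() {
        running.set(true);
        try {
            for (int page = 0; page < 5; page++) {
                Thread.sleep(10);              // pretend to profile one page
                pagesDone.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            running.set(false);
        }
    }

    public boolean isRunning() { return running.get(); }
    public int getPagesDone() { return pagesDone.get(); }
}

class BackgroundProfileSketch {
    // Runs the profiler on a worker thread and returns how many read-only
    // "ticks" the calling thread got through while it waited.
    static int profileInBackground(StubProfiler profiler) {
        Thread worker = new Thread(profiler::run, "profiler");
        worker.start();
        int ticks = 0;
        try {
            while (worker.isAlive()) {
                ticks++;                       // read-only work (e.g. rendering a page) would go here
                Thread.sleep(5);
            }
            worker.join();
        } catch (InterruptedException e) {
            worker.interrupt();
            Thread.currentThread().interrupt();
        }
        return ticks;
    }

    public static void main(String[] args) {
        StubProfiler profiler = new StubProfiler();
        int ticks = profileInBackground(profiler);
        System.out.println("pages profiled: " + profiler.getPagesDone()
                + ", ticks while waiting: " + ticks);
    }
}
```

The important constraint is the one stated above: whatever the calling thread does while the worker is alive must not modify the document.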
Replacing fonts in the PDF with a "FontAction"
The really big benefits of the OutputProfiler class are the new actions that can be run on the PDF. These allow you to make fairly sweeping changes to the document content, in particular to some of the areas that typically cause problems during preflighting: fonts, colors and images.
PDF/A and PDF/X both require all the fonts in a PDF to be embedded, so if that's not the case you have two options: turn the page into a bitmap, or replace the fonts. We've covered the conversion to bitmap approach before and that's still a good option, but if you want to convert the fonts you can now set a FontAction on the OutputProfiler before calling apply. Its getFont method will be called every time a font is specified, and may return a replacement font.
We provide an implementation of this interface called AutoEmbeddingFontAction, which will replace any unembedded fonts with embedded ones. It will attempt to identify the correct font to use with heuristics, a word which sounds much better than guesswork but boils down to the same thing:
Give this class a set of embedded fonts, and we will try to find the best match based on the font's name, the glyph metrics (how wide each character is), and whether the fonts share the same basic properties - Serif, Bold, Italic and so on. Here's a complete example which will process a PDF and replace any unembedded fonts in the file with their "best" match from the Windows "Fonts" directory.

    PDF pdf = new PDF(new PDFReader(file));
    OutputProfiler profiler = new OutputProfiler(new PDFParser(pdf));
    OutputProfiler.AutoEmbeddingFontAction fontaction = new OutputProfiler.AutoEmbeddingFontAction();
    File[] fontfiles = new File("C:\\Windows\\Fonts").listFiles();
    for (int i = 0; i < fontfiles.length; i++) {
        if (fontfiles[i].getName().endsWith(".ttf")) {
            OpenTypeFont font = new OpenTypeFont(new FileInputStream(fontfiles[i]), 2);
            fontaction.add(font);
        }
    }
    profiler.setFontAction(fontaction);
    profiler.apply(OutputProfile.Default);
    pdf.render(new FileOutputStream(outfile));

This example will only replace the fonts, but will not modify the PDF in any other way - the OutputProfile.Default profile doesn't require any changes to be made. Typically you'd replace the fonts as part of a larger conversion to PDF/A, and we'll show this below.
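To make the matching idea more concrete, here is a minimal sketch of how a "best match" score could combine the three signals mentioned earlier: font name, glyph widths and style flags. It is not the algorithm AutoEmbeddingFontAction uses. The FontInfo class, the weights and the scoring formula are all assumptions invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class FontMatchSketch {
    // Hypothetical summary of a font, invented for this sketch.
    static final class FontInfo {
        final String name;
        final boolean serif, bold, italic;
        final int[] widths;   // advance widths for a fixed sample of glyphs, in 1/1000 em

        FontInfo(String name, boolean serif, boolean bold, boolean italic, int[] widths) {
            this.name = name;
            this.serif = serif;
            this.bold = bold;
            this.italic = italic;
            this.widths = widths;
        }
    }

    // Lower-case the name and drop everything except letters and digits,
    // so "Arial-Bold" and "Arial Bold" compare equal.
    static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    // Higher is better. The weights are arbitrary assumptions.
    static double score(FontInfo wanted, FontInfo candidate) {
        double score = 0;
        if (normalize(wanted.name).equals(normalize(candidate.name))) score += 10;
        if (wanted.serif == candidate.serif) score += 1;
        if (wanted.bold == candidate.bold) score += 1;
        if (wanted.italic == candidate.italic) score += 1;
        int n = Math.min(wanted.widths.length, candidate.widths.length);
        if (n > 0) {
            double total = 0;
            for (int i = 0; i < n; i++) {
                total += Math.abs(wanted.widths[i] - candidate.widths[i]);
            }
            score -= (total / n) / 100.0;   // 100 units of average width error costs one point
        }
        return score;
    }

    // Returns the highest-scoring candidate, or null if the list is empty.
    static FontInfo bestMatch(FontInfo wanted, List<FontInfo> candidates) {
        FontInfo best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (FontInfo c : candidates) {
            double s = score(wanted, c);
            if (s > bestScore) {
                bestScore = s;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        FontInfo wanted = new FontInfo("Arial-Bold", false, true, false, new int[] {722, 611, 278});
        List<FontInfo> candidates = Arrays.asList(
            new FontInfo("Times New Roman", true, false, false, new int[] {722, 444, 278}),
            new FontInfo("Arial Bold", false, true, false, new int[] {722, 611, 278}));
        System.out.println("best match: " + bestMatch(wanted, candidates).name);
    }
}
```

A score like this will always pick something, which is why it is fair to call it guesswork: a real implementation would probably also want a threshold below which no replacement is made.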
Replacing unembedded fonts is not the only possibility. Say you wanted to ensure that a PDF didn't embed any fonts that had restrictions on embedding. Provided you have a list of those fonts, this is easily done - replace the FontAction in the above example with something like this:

    OutputProfiler.FontAction fontaction = new OutputProfiler.FontAction() {
        public PDFFont getFont(OutputProfiler profiler, String name, boolean embedded, PDFFont font) {
            // Swap out embedded fonts that appear on the disallowed list
            if (embedded && disallowedfonts.contains(name)) {
                return appropriateSubstituteFont;
            }
            return null;
        }
    };

Replacing Colors with a "ColorAction"
PDF/A and PDF/X also place restrictions on which colors (more accurately, which ColorSpaces) can be in the PDF. All colors must be calibrated, which is to say they must include details on how to convert to the CIE XYZ ColorSpace. PDF/X additionally requires that the colors are subtractive, i.e. they're not RGB.
This means that any Color specified in the PDF must either be explicitly part of a calibrated ColorSpace, or it must be able to be interpreted as part of the output intent of the PDF.
The output intent is the device the PDF is intended to be displayed on, and must be specified for PDF/A and PDF/X documents. For PDF/X it's usually the ICC profile of the intended printer; for PDF/A, any ICC profile will do (a slight oversimplification) and the sRGB space is commonly used.
For a device-dependent color to be allowed, it must be convertible to this ColorSpace - this means RGB for an RGB profile, and CMYK or gray for a CMYK profile. Any color that doesn't meet these requirements must be converted.
Color Conversion is a complex matter and we're not going to go into the details too much, but for the case described above we supply a standard ColorAction: the ProcessColorAction will convert uncalibrated RGB, CMYK or grayscale colors to the specified ColorSpace. If any Spot colors are defined against an uncalibrated ColorSpace, they'll be redefined to map to the new ColorSpace. Here's how to use it:

    PDF pdf = new PDF(new PDFReader(file));
    ICC_Profile icc = ICC_Profile.getInstance(ColorSpace.CS_sRGB);
    OutputProfiler profiler = new OutputProfiler(new PDFParser(pdf));
    OutputProfiler.ColorAction coloraction = new OutputProfiler.ProcessColorAction(icc);
    profiler.setColorAction(coloraction);
    profiler.apply(OutputProfile.Default);
    pdf.render(new FileOutputStream(outfile));

As with our FontAction example above, this does nothing more than convert all uncalibrated colors in the PDF to sRGB. Typically this would be done as part of a larger conversion to PDF/A or PDF/X, which we'll demonstrate below.
With your own implementation of ColorAction there are many other possibilities. Replace the ColorAction in the above example with this one to convert all colors in the PDF to grayscale:

    coloraction = new OutputProfiler.ColorAction() {
        final ColorSpace target = ColorSpace.getInstance(ColorSpace.CS_GRAY);
        public ColorSpace changeColor(OutputProfiler profiler, ColorSpace cs, float[] src, float[] dst, boolean fill, int type) {
            if (dst != null) {
                // Convert to XYZ, then use Y value with gamma of 2.2
                src = cs.toCIEXYZ(src);
                float g = src[1];
                dst[0] = (float)Math.pow(g, 1 / 2.2);
            }
            return target;
        }
    };

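If you'd like to see what that formula does to real values before wiring it into a ColorAction, the arithmetic can be tried on its own using nothing but the JDK's java.awt.color.ColorSpace. The sketch below is a standalone illustration: the class and method names are made up for this example, and the clamp to the 0..1 range is an added precaution that is not part of the code above.

```java
import java.awt.color.ColorSpace;

class GrayscaleSketch {
    // Convert a color to CIE XYZ, take the luminance (Y) component
    // and apply a gamma of 2.2, as in the ColorAction above.
    static float toGray(ColorSpace cs, float[] components) {
        float[] xyz = cs.toCIEXYZ(components);
        // Clamp to 0..1 to guard against small rounding overshoots (an added precaution)
        float y = Math.max(0f, Math.min(1f, xyz[1]));
        return (float) Math.pow(y, 1 / 2.2);
    }

    public static void main(String[] args) {
        ColorSpace srgb = ColorSpace.getInstance(ColorSpace.CS_sRGB);
        System.out.println("white -> " + toGray(srgb, new float[] {1f, 1f, 1f}));
        System.out.println("red   -> " + toGray(srgb, new float[] {1f, 0f, 0f}));
        System.out.println("green -> " + toGray(srgb, new float[] {0f, 1f, 0f}));
        System.out.println("black -> " + toGray(srgb, new float[] {0f, 0f, 0f}));
    }
}
```

Because the result is driven by luminance rather than a plain average of the components, pure green comes out noticeably lighter than pure red.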
Resampling images
The OutputProfiler can also downsample images. This is probably of less importance in modern workflows, but still can be useful for documents intended for download where file size is more important than fidelity. All images in the PDF are categorised as 1-bit, grayscale or color, and these may be downsampled with the setMaxImageDPI method.
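For anyone unsure what a DPI cap means for an individual image, the arithmetic is sketched below. An image's effective resolution depends on both its pixel dimensions and the size it is drawn at on the page, with PDF's default unit being 1/72 of an inch. This is a worked calculation only: it is not how the library computes resolution internally, and the class and method names are made up for the example.

```java
class ImageDpiSketch {
    // Effective resolution along one axis: pixels divided by the drawn size in inches.
    // The drawn size is given in points, where 72 points = 1 inch.
    static double effectiveDpi(int pixels, double drawnSizeInPoints) {
        return pixels / (drawnSizeInPoints / 72.0);
    }

    // Pixel count needed along one axis to bring the image down to maxDpi.
    // Images already at or below the cap are left alone.
    static int targetPixels(int pixels, double drawnSizeInPoints, double maxDpi) {
        double dpi = effectiveDpi(pixels, drawnSizeInPoints);
        if (dpi <= maxDpi) {
            return pixels;
        }
        return (int) Math.ceil(pixels * maxDpi / dpi);
    }

    public static void main(String[] args) {
        // A 3000-pixel-wide image drawn 360pt (5 inches) wide, capped at 300dpi
        System.out.println("effective dpi: " + effectiveDpi(3000, 360));
        System.out.println("target width:  " + targetPixels(3000, 360, 300));
    }
}
```

Halving the resolution along both axes leaves a quarter of the pixels, which is where the file size saving comes from.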
Conclusion
These new operations should mean many more PDF documents can be converted to PDF/A without having to convert them to bitmap. There are still exceptions - transparency is disallowed in PDF/X-3 and PDF/A-1, so a PDF with transparency will still need to be rasterized. However, we hope that for those of you tasked with converting large numbers of PDFs to PDF/A for storage, this will both reduce the size of your archives and make them more useful: a bitmap PDF cannot be searched for text.
We've included a new example with the download package which implements much of the code described above. See the Preflight.java example in the examples directory of the download.
Finally, if you come up with any interesting use cases for the Font or Color actions described above, or find the functionality doesn't quite meet your requirements, drop us a line at support@bfo.com, as we'd like to hear how this functionality is being used and how it can be improved.