Creating PDF/A documents with the BFO library

PDF/A is a standardised subset of PDF used for long-term archive storage. The PDF/A format ensures that documents can be correctly rendered even in the absence of all the standard frameworks we take for granted on our computers of today, such as standardised fonts and colour spaces. Notably, PDF/A documents must have all fonts and colour spaces they use embedded inside the document, as well as some additional metadata and stricter PDF syntax rules. The BFO PDF Library makes it quite straightforward for you to author such documents.

PDF/A doesn't specify an archival strategy; it only specifies a number of constraints that a PDF document must satisfy. There are currently two levels of conformance: PDF/A-1b and PDF/A-1a. The PDF library supports level 1b. Level 1a includes all of the constraints of 1b plus the requirement that all textual content be tagged with structural information, which is not yet supported. When authoring PDF/A documents, you must ensure that:
  • all fonts used are embedded in the document using an embeddable font format
  • all colour spaces used are device independent
  • you do not include audio or video media
  • you do not embed JavaScript, links to executables, or hyperlinks to external documents
  • you do not use encryption of streams
  • you specify standards-based (Dublin Core) metadata for the document

First, we need to use an embeddable colour profile. If you're working with PDF/A you should have an ICC profile available in a file, which you can load with java.awt.color.ICC_Profile. Here I use the profile in the file SWOP.icm, which is a CMYK profile.

import org.faceless.pdf2.*;
import java.awt.Color;
import java.awt.color.*;
import java.io.*;    // InputStream and FileInputStream, used when loading the font

ICC_Profile icc = ICC_Profile.getInstance("SWOP.icm");
ColorSpace cs = new ICC_ColorSpace(icc);
OutputProfile profile =
    new OutputProfile(OutputProfile.PDFA1b_2005_Acrobat,
        "CGATS TR 001", null, "http://www.color.org", null, icc);

If we want to include text, we need to load the font from a file. It must be in an embeddable font format such as OpenType.

InputStream fin = new FileInputStream("HelveticaLTStd-Roman.otf");
PDFFont helvetica = new OpenTypeFont(fin, 2);
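
A font in the right format can still carry licensing flags that restrict embedding. As a rough pre-flight check, the sketch below reads the fsType field from a font's OS/2 table using only the JDK. It is independent of the PDF library; the class and method names (FontEmbeddingCheck, readFsType, embeddingRestricted) are made up for illustration, and it assumes a single-font .ttf or .otf file rather than a font collection.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FontEmbeddingCheck {

    /** Returns the fsType field of the font's OS/2 table, or -1 if there is no OS/2 table. */
    static int readFsType(byte[] font) {
        ByteBuffer buf = ByteBuffer.wrap(font);       // big-endian, as font files are
        int numTables = buf.getShort(4) & 0xFFFF;     // table count follows the 4-byte version
        for (int i = 0; i < numTables; i++) {
            int record = 12 + 16 * i;                 // 12-byte header, 16-byte table records
            if (buf.getInt(record) == 0x4F532F32) {   // the tag "OS/2"
                int offset = buf.getInt(record + 8);  // record layout: tag, checksum, offset, length
                return buf.getShort(offset + 8) & 0xFFFF; // fsType sits 8 bytes into the table
            }
        }
        return -1;
    }

    /** True if the only embedding permission bit set is "Restricted License embedding". */
    static boolean embeddingRestricted(int fsType) {
        return (fsType & 0x000E) == 0x0002;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: java FontEmbeddingCheck <font file>");
            return;
        }
        int fsType = readFsType(Files.readAllBytes(Paths.get(args[0])));
        if (fsType < 0) {
            System.out.println("No OS/2 table found");
        } else {
            System.out.println("fsType = 0x" + Integer.toHexString(fsType)
                + (embeddingRestricted(fsType) ? " (embedding restricted)" : ""));
        }
    }
}
```

This only inspects the flag. Whether you are actually licensed to embed a given font is a question for the font's licence.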

Now we can create a new PDF with this output profile and draw some stuff into it as usual, ensuring that we use only the embeddable fonts and the colour space we defined. Note that the colour components used to create the java.awt.Color object depend on the colour profile (I use CMYK black below).

PDF pdf = new PDF();
pdf.setOutputProfile(profile);
PDFPage page = pdf.newPage("A4");
PDFStyle style = new PDFStyle();
style.setFont(helvetica, 24);
style.setFillColor(new Color(cs, new float[] { 0, 0, 0, 1 }, 1));
page.setStyle(style);
page.drawText("Hello, PDF/A-viewing world!", 100, page.getHeight() - 100);
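
Because the component array has to match the profile, it can help to derive it from the profile rather than hard-coding it. The following is a minimal sketch using only java.awt.color; the class ProfileBlack and the method blackFor are hypothetical names, and the JDK's built-in sRGB profile stands in for SWOP.icm so that the example runs without any extra files.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_Profile;

public class ProfileBlack {

    /** Returns the components for solid black in the profile's own colour space. */
    static float[] blackFor(ICC_Profile icc) {
        switch (icc.getColorSpaceType()) {
            case ColorSpace.TYPE_CMYK:
                return new float[] { 0, 0, 0, 1 };   // C, M, Y, K: full black ink only
            case ColorSpace.TYPE_RGB:
                return new float[] { 0, 0, 0 };      // R, G, B all off
            case ColorSpace.TYPE_GRAY:
                return new float[] { 0 };            // zero luminance
            default:
                throw new IllegalArgumentException(
                    "Unhandled colour space type: " + icc.getColorSpaceType());
        }
    }

    public static void main(String[] args) {
        // Stand-in profile; in the article this would be the profile loaded from SWOP.icm
        ICC_Profile icc = ICC_Profile.getInstance(ColorSpace.CS_sRGB);
        float[] black = blackFor(icc);
        System.out.println(black.length + " components for a profile with "
            + icc.getNumComponents() + " channels");
    }
}
```

The resulting array can be passed to the java.awt.Color constructor in place of the literal used above.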

We should also add some metadata. The more the better: it helps whoever opens your PDF/A time capsule hundreds of years in the future make sense of the contents.

pdf.setInfo("Author", "Joe Bloggs");
pdf.setInfo("Title", "My first PDF/A document");
// Not metadata: show the whole first page when the document is opened
pdf.setAction(Event.OPEN, PDFAction.goToFit(page));
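
For readers curious what Dublin Core metadata looks like on disk, the sketch below builds the core of an XMP packet holding a title and a creator. It is an illustration only: this article does not show how the library serialises the setInfo values, the class name DublinCoreXmp is hypothetical, and a complete packet would also need the xpacket wrapper and the other properties PDF/A requires.

```java
public class DublinCoreXmp {

    /** Escapes the characters that are special in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds the body of an XMP packet with a Dublin Core title and creator. */
    static String packet(String title, String author) {
        return String.join("\n",
            "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">",
            " <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">",
            "  <rdf:Description rdf:about=\"\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\">",
            // dc:title is a language alternative, so it is wrapped in rdf:Alt
            "   <dc:title><rdf:Alt><rdf:li xml:lang=\"x-default\">"
                + escape(title) + "</rdf:li></rdf:Alt></dc:title>",
            // dc:creator is an ordered list of names, so it is wrapped in rdf:Seq
            "   <dc:creator><rdf:Seq><rdf:li>" + escape(author) + "</rdf:li></rdf:Seq></dc:creator>",
            "  </rdf:Description>",
            " </rdf:RDF>",
            "</x:xmpmeta>");
    }

    public static void main(String[] args) {
        System.out.println(packet("My first PDF/A document", "Joe Bloggs"));
    }
}
```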

Finally, render the document to a stream (out here is any open java.io.OutputStream, such as a FileOutputStream):

pdf.render(out);