BFO Report Generator 1.2 with PDF/UA support

Creating PDF/UA with the Report Generator

Report Generator 1.2, released today, is our first "1.x" change since 2002. There's no need to worry if you have content running on 1.1.x - that output will almost certainly look exactly as it does now (see below for details of this qualifier). We've bumped the version number for three reasons:

  • We've added PDF/UA support, a major new feature.
  • We've fixed a few bugs which may impact rendering if you were relying on their behaviour.
  • The last release was 1.1.70 - the version numbers were getting a bit unwieldy.

Non-PDF/UA changes

While this is very much "the PDF/UA release", there are a few other bug fixes which may have some impact.

  • The attribute selector wasn't working for attributes that weren't shortcuts to CSS properties - for example, div[direction=rtl] would work but div[foo] would not. This is now fixed.
  • The "inherit" value wasn't implemented properly, and is now fixed.
  • The ":empty" selector has been implemented.
  • Barcodes can now have their font set.

These are all relatively minor bug fixes, but because the first three are changes to core CSS, if you're using the attribute or :empty selectors, or using the "inherit" value in your CSS, then the layout may change with this release as these features suddenly start working as intended.
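
As a rough way to check whether you're affected, these are the kinds of rules that were previously ignored or mishandled and now take effect. This is only a sketch: the selectors, the "foo" attribute and the colors are made up for illustration, and we assume the rules sit in a <style> block in the <head>.

    <style>
      /* Attribute selector on an attribute that is not a CSS shortcut: now matches */
      div[foo] { color: red }
      /* :empty is now implemented, so empty cells pick this up */
      td:empty { background-color: #eeeeee }
      /* "inherit" now behaves as intended */
      h1 { font-family: inherit }
    </style>

If your stylesheets contain rules of this shape, it's worth comparing the output of 1.1.x and 1.2 before you deploy.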

If you're confident those changes will have no impact on your document, then there will be no change to the rendering in this release. But we still recommend upgrading - some bugs were fixed in the underlying PDF Library, including one which would, rarely, result in complaints of errors when the PDF was shown in Acrobat.

PDF/UA in the Report Generator

Now we've got that out of the way: how can you create a PDF/UA document from the Report Generator? Add this to the <head> of your document:

   <meta name="output-profile" value="PDF/UA1"/>
  

or, if you want to create documents compliant to both PDF/A and PDF/UA:

   <meta name="output-profile" value="PDF/A3a+PDF/UA1"/>
   <meta name="output-intent-identifier" value="sRGB"/>
  

Of course, there's a bit more to it. You will definitely need to do the following, to avoid getting an exception while generating:

  1. Make sure you specify a document title with <meta name="title">
  2. Make sure your <img>, <shape> and <*graph> elements have an "alt" attribute
  3. Make sure your <input> elements have a "title" attribute.
  4. Make sure all fonts are embedded - we always recommend a TrueType or OpenType font with bytes="2". This restriction includes the fonts used for marker-bullets, barcodes and so on - a useful CSS reset you can apply is:
         ul { marker-font-family: inherit }
         barcode, h1, h2, h3, h4 { font-family: inherit }
        
    This will reset all non-monospace fonts to inherit from the style applied to the pdf element, on which you'd set the font-family to an embedded font.
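
Putting the four required steps together, a minimal document might look like the sketch below. The title, font family name, file paths and field name are all placeholders. The <link type="font"> line in particular is an assumption about the font-loading syntax rather than something taken from this article, so check the user guide for the exact attributes.

    <?xml version="1.0"?>
    <pdf>
      <head>
        <meta name="output-profile" value="PDF/UA1"/>
        <!-- Step 1: document title -->
        <meta name="title" value="Quarterly Report"/>
        <!-- Step 4: an embedded two-byte font (attribute names assumed, file is a placeholder) -->
        <link type="font" subtype="opentype" family="bodyfont" src="fonts/bodyfont.ttf" embed="true" bytes="2"/>
        <style>
          /* Set the embedded font on the pdf element, then make everything inherit it */
          pdf { font-family: bodyfont }
          ul { marker-font-family: inherit }
          barcode, h1, h2, h3, h4 { font-family: inherit }
        </style>
      </head>
      <body>
        <h1>Quarterly Report</h1>
        <!-- Step 2: alternate text on images -->
        <img src="images/chart.png" alt="Bar chart of sales by region"/>
        <!-- Step 3: title on form fields -->
        <input type="text" name="email" title="Email address"/>
      </body>
    </pdf>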

Those are the bare minimum steps you'll need, but there are several other steps that you may require:

  1. If you have any hyperlinks, you may want to specify the "alt" attribute. If you don't, we'll set the alternate text for the link to its content, e.g. the word "hello" in <a href="#title">hello</a>.
  2. You should verify your <table> elements are structured properly. PDF/UA places several restrictions on tables, one of them being that every cell has to have a header cell associated with it. There are a few ways this can happen:
    1. The <th> cells in the table can have the "scope" attribute set to "row" or "column", exactly as for HTML (note the "rowgroup" and "colgroup" values are unsupported).
    2. The <td> cells in the table can have a "headers" attribute set to the ID (or IDs, separated by spaces) of the <th> cells (within the same table) that serve as the headers for that cell. This is also exactly as for HTML.
    3. You can do neither of these. If the table contains <th> cells, we will try to automatically determine their scope based on the HTML5 algorithm. If the table has no <th> cells at all, we will presume the table is a layout artifact: the table and its <tr> children will be left out of the structure tree, as if they had pdf-tag-type: artifact (see below), and the <td> children will be output as if they had pdf-tag-type: Div.
  3. Finally, you should confirm that the document structure accurately represents the structure you're trying to create. This is beyond the scope of what we can cover here, but - for example - you should verify that images that are significant are included with an <img> element, rather than by using a "background-image" attribute on another type of element.
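
The first two table options, and the link alternate text from step 1, could be written as in this sketch. The IDs, cell text and alternate text are invented for the example. The scope value is written as "column" because that is how it's given above; HTML itself spells this value "col", so it's worth checking which spelling your version accepts.

    <!-- Option 1: header cells declare their scope -->
    <table>
      <tr><th scope="column">Region</th><th scope="column">Sales</th></tr>
      <tr><th scope="row">North</th><td>1200</td></tr>
    </table>

    <!-- Option 2: data cells name their header cells by ID -->
    <table>
      <tr><th id="h-region">Region</th><th id="h-sales">Sales</th></tr>
      <tr><td headers="h-region">North</td><td headers="h-sales">1200</td></tr>
    </table>

    <!-- Hyperlink with explicit alternate text instead of the default "hello" -->
    <a href="#title" alt="Jump to the report title">hello</a>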

Customizing the output

If the resulting structure tree is not exactly as you want it, the following CSS properties (all new in this release) can be used to control the output. In each syntax line below, the default is the first value listed.

pdf-tag-type: auto | none | artifact | <string-or-ident>
The pdf-tag-type attribute controls the name of the tag that is written to the tree, and is the first property you would consider adjusting to modify the generated tree.
  • auto - Choose a type based on the tag and the current context.
  • none - Don't write this element or any descendants to the tree.
  • artifact - Skip this element but write any descendants to the tree.
  • <string-or-ident> - Use the specified value as the tag.
pdf-tag-placement: auto | none | block | inline | before | start | end
The pdf-tag-placement attribute sets the "Layout:Placement" attribute in the tree. The default value of auto will set this property when required, and is usually the best option. A value of none will suppress this attribute entirely, and any other valid value will be passed through verbatim.
pdf-tag-background-color: none | auto | <color>
The pdf-tag-background-color attribute sets the "Layout:BackgroundColor" attribute in the tree. The default value is none, but "auto" can be used to set it to the computed value of the background-color property, or a color can be set explicitly.
pdf-tag-border-color: none | auto | <color>{1|4}
The pdf-tag-border-color attribute sets the "Layout:BorderColor" attribute in the tree. The default value is none, but "auto" can be used to set it to the computed value of the border-color properties, or a single color can be set explicitly, or four colors can be set explicitly, in which case the values are specified clockwise from the top.
pdf-tag-border-style: none | auto | [ no-border | hidden | dotted | dashed | solid | double | groove | ridge | inset | outset ]{1|4}
The pdf-tag-border-style attribute sets the "Layout:BorderStyle" attribute in the tree. The default value is none, but "auto" can be used to set it to the computed value of the border-style properties, or a value can be set explicitly, or four values set explicitly as for border-color, above. The value "no-border" will write the value "None".
pdf-tag-border-thickness: none | auto | <length>{1|4}
The pdf-tag-border-thickness attribute sets the "Layout:BorderThickness" attribute in the tree. The default value is none, but "auto" can be used to set it to the computed value of the border-width properties, or a value can be set explicitly, or four values set explicitly as for border-color, above.
pdf-tag-color: none | auto | <color>
The pdf-tag-color attribute sets the "Layout:Color" attribute in the tree. The default value is none, but "auto" can be used to set it to the computed value of the color property, or a value can be set explicitly.
pdf-tag-text-align: none | auto | start | center | end | justify
The pdf-tag-text-align attribute sets the "Layout:TextAlign" attribute in the tree. The default value is none, but "auto" can be used to set it to the computed value of the text-align property, or a value can be set explicitly.
pdf-tag-list-numbering: auto | none | no-number | disc | circle | square | decimal | upper-roman | lower-roman | upper-alpha | lower-alpha
The pdf-tag-list-numbering attribute sets the "List:ListNumbering" attribute in the tree. The default value is "auto", which will set the value where required, based on the list-style-type CSS property. The value "none" can be used to suppress this, or one of the other values can be used to set it explicitly. The value "no-number" will result in the value "None" being written to the tree.
pdf-tag-table-rowspan: auto | none | <number>
The pdf-tag-table-rowspan attribute sets the "Table:RowSpan" attribute in the tree. The default value is "auto", which will set the value where required, based on the rowspan attribute. The value "none" can be used to suppress this, or a number can be specified to set this explicitly.
pdf-tag-table-colspan: auto | none | <number>
The pdf-tag-table-colspan attribute sets the "Table:ColSpan" attribute in the tree. The default value is "auto", which will set the value where required, based on the colspan attribute. The value "none" can be used to suppress this, or a number can be specified to set this explicitly.
pdf-tag-table-headers: auto | none | <string>+
The pdf-tag-table-headers attribute sets the "Table:Headers" attribute in the tree. The default value is "auto", which will set the value for elements with a resolved tag-type of <Td> based on the headers attribute, if specified. The value "none" can be used to suppress this, or a value can be set to specify this explicitly.
pdf-tag-table-scope: auto | none | row | column | both
The pdf-tag-table-scope attribute sets the "Table:Scope" attribute in the tree. The default value is "auto", which will set the value for elements with a resolved tag-type of <Th> based on the scope attribute, if specified, or based on the algorithm specified for HTML5 otherwise. The value "none" can be used to suppress this, or a value can be set to specify this explicitly.
pdf-tag-table-summary: auto | none | <string>
The pdf-tag-table-summary attribute sets the "Table:Summary" attribute in the tree. The default value is "auto", which will set the value for elements with a resolved tag-type of <Table> based on the summary attribute, if specified. The value "none" can be used to suppress this, or a value can be set to specify this explicitly.
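
A few of these properties in use, as a sketch only. The class names are hypothetical, the rules are assumed to live in a <style> block in the <head>, and "BlockQuote" is a standard PDF structure type chosen purely for illustration.

    <style>
      /* Decorative wrapper: leave it out of the tree but keep its children */
      div.frame { pdf-tag-type: artifact }
      /* Content that should not appear in the tree at all, descendants included */
      div.watermark { pdf-tag-type: none }
      /* Use a specific structure type instead of the automatic choice */
      div.pullquote { pdf-tag-type: BlockQuote }
      /* Record visual styling as Layout attributes, copied from the computed CSS */
      p.warning { pdf-tag-color: auto; pdf-tag-background-color: auto; pdf-tag-text-align: auto }
      /* Four explicit border colors, clockwise from the top */
      div.box { pdf-tag-border-color: black gray black gray }
      /* Override the scope that would otherwise be detected for this header cell */
      th.corner { pdf-tag-table-scope: both }
    </style>

Starting with pdf-tag-type and only then reaching for the Layout and Table properties keeps the overrides small, since the "auto" defaults already cover most documents.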

Quick testing

To quickly get yourself up and running, it's possible to convert many of the samples included with our Report Generator (including the userguide for the Report Generator itself, a 70-page document) to PDF/UA with the addition of one line. Insert this:

    <include xmlns="http://www.w3.org/2003/XInclude" href="https://bfo.com/blog/2019/12/04/bfo_report_generator_1_2_with_pdf_ua_support/pdfua.xmli" />
   

into the XML as the first child of the <head> element. This will include an XML snippet that will set the output-profile to PDF/UA, and substitute free-to-distribute embedded fonts for each of the Times, Helvetica, Courier, Symbol and ZapfDingbats fonts which are the default unembedded fonts available in PDF. The fonts are loaded directly from our website so please don't do this with your live code (or when our site goes down, your application goes with it). They're available to download from fonts.zip.
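
In context, the placement might look like this sketch, where everything other than the <include> line is a stand-in for your existing document:

    <pdf>
      <head>
        <include xmlns="http://www.w3.org/2003/XInclude" href="https://bfo.com/blog/2019/12/04/bfo_report_generator_1_2_with_pdf_ua_support/pdfua.xmli" />
        <!-- your existing <meta>, <link> and <style> elements follow -->
      </head>
      <body>
        <!-- your existing content, unchanged -->
      </body>
    </pdf>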