BFO PDF Library 2.24.2 - what's new?

We haven't done a "new feature summary" of our PDF Library since 2.24, released late last year. So here's a bit more detail on what we've added in the last six months.

ToUnicode improvements

Each font embedded in a PDF can have a ToUnicode map embedded with it. This gives a canonical mapping from the codes embedded in the document to a Unicode value, required for text extraction. We've always embedded a ToUnicode map with each font, but in 2.24.2 we've rewritten this process with some improvements:

  1. The ToUnicode map now contains only the glyphs that have been used. For very large fonts this can save quite a lot of space.
  2. We've fixed some bugs when the characters used extend beyond the Unicode BMP.
  3. The ToUnicode map reflects any OpenType substitutions made during layout.

This last point is the one we want to explain. Say, for example, you're using a font that has a GSUB table for advanced OpenType layout, you've turned on the "liga" feature, and you want to add the word "difficult" to your PDF. The ligature feature combines the "f" and "i" glyphs into a single "fi" ligature glyph, and that glyph is what gets written to the PDF.

Prior to 2.24.1, this would have worked perfectly if the font also mapped the legacy U+FB01 character to the same glyph. But many ligatures don't have this sort of legacy mapping, and those that do are not always included in fonts. Although we always write the "ActualText" fallback in this case, we find this can confuse text selection in Acrobat - it works, but the selection area looks wrong. So it's not an ideal solution.

From 2.24.1, we'll detect OpenType substitutions and automatically derive a mapping for the resulting glyphs to add to the ToUnicode map. This means that selecting text in Acrobat should work properly for even the wildest OpenType substitutions. And where we can't do this unambiguously, we'll still fall back to writing the "ActualText" value, ensuring the text can be extracted or read with a screen reader.
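To make the idea concrete, here's a minimal sketch of how such a mapping could be collected and written out. It is not the library's actual code: the class name, the glyph codes and the record() method are all made up for illustration, and it only emits the "bfchar" section of a ToUnicode CMap rather than a complete one. It records the text each used glyph stands for, marks a glyph as ambiguous if it turns up with two different meanings (the case where "ActualText" would be needed), and writes each destination as UTF-16BE hex, so characters beyond the BMP come out as surrogate pairs.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: a hypothetical collector, not part of the BFO API.
class ToUnicodeSketch {
    // Glyph code -> the text it stands for. Only glyphs that are used get recorded.
    final Map<Integer, String> toUnicode = new TreeMap<>();
    // Glyphs seen with more than one meaning; these would need ActualText instead.
    final Set<Integer> ambiguous = new TreeSet<>();

    // Called once per glyph written, with the source text that glyph replaced.
    // For an "fi" ligature the text is the two-character string "fi".
    void record(int glyph, String text) {
        if (ambiguous.contains(glyph)) {
            return;
        }
        String previous = toUnicode.putIfAbsent(glyph, text);
        if (previous != null && !previous.equals(text)) {
            // Same glyph, different text: no unambiguous mapping exists.
            toUnicode.remove(glyph);
            ambiguous.add(glyph);
        }
    }

    // Destination strings in a ToUnicode CMap are UTF-16BE. Java strings are
    // already UTF-16, so a non-BMP character is emitted as its surrogate pair.
    static String utf16Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            sb.append(String.format("%04X", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    // Emit one bfchar section, assuming 2-byte glyph codes. A real writer
    // would also add the CMap header and split long sections into chunks.
    String bfchar() {
        StringBuilder sb = new StringBuilder();
        sb.append(toUnicode.size()).append(" beginbfchar\n");
        for (Map.Entry<Integer, String> e : toUnicode.entrySet()) {
            sb.append(String.format("<%04X> <%s>\n", e.getKey(), utf16Hex(e.getValue())));
        }
        return sb.append("endbfchar\n").toString();
    }
}
```

With this sketch, recording a hypothetical ligature glyph 0x0110 against the text "fi" produces the entry `<0110> <00660069>`: one glyph code mapped to two Unicode characters, which is what lets a viewer select and extract "fi" from a single glyph.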

Metric-compatible versions of the standard fonts

PDF guarantees the existence of the Standard 14 fonts on every platform without needing to embed them. Their metrics are fixed and well known, and they're pretty much the default choice for many when it comes to creating a PDF. But, being unembedded, they cannot be used with PDF/A or PDF/X.

We've been able to replace unembedded fonts in a PDF with embedded ones for quite some time, but we've always relied on the user to supply the fonts. As of 2.24.2, the PDF Library includes 14 free fonts which have identical metrics to the Standard 14. These will be substituted by the AutoEmbeddingFontAction automatically, with the resulting PDF looking identical to the original, just with embedded fonts.

The fonts are derived from the URW++ fonts (well known to Linux users as the Nimbus fonts) and modified by the Polish TeX Users Group for use with TeX as the TeX Gyre fonts. We've further modified them to remove the OpenType tables, and to ensure that all metrics (glyph advance, kerning, and font-wide metrics such as ascenders) are identical to those used by Adobe for the Standard 14 fonts, making them a drop-in replacement. The fonts are included with the Jar and are also available here for separate download.
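Why do identical metrics matter so much? Text in a PDF is positioned using the glyph advances, so if every advance matches, every line ends up exactly where it was. The sketch below shows the arithmetic under some stated assumptions: the class and method names are hypothetical, the two width tables are tiny illustrative samples rather than real font data, kerning is ignored, and advances are expressed in the usual 1/1000ths of the font size.

```java
import java.util.Map;

// Illustrative only: hypothetical helper, not part of the BFO API.
class MetricsSketch {
    // Width of a run of text in points. Advances are in 1/1000 of the font
    // size, so the sum is scaled by fontSize / 1000. Kerning is not applied.
    static double textWidth(String text, Map<Character, Integer> advances, double fontSize) {
        int total = 0;
        for (char c : text.toCharArray()) {
            Integer advance = advances.get(c);
            if (advance == null) {
                throw new IllegalArgumentException("No advance for '" + c + "'");
            }
            total += advance;
        }
        return total * fontSize / 1000.0;
    }

    // Two fonts are a drop-in match for a piece of text only if the
    // advances of every character used are the same in both tables.
    static boolean sameAdvances(String text, Map<Character, Integer> a, Map<Character, Integer> b) {
        for (char c : text.toCharArray()) {
            Integer wa = a.get(c);
            if (wa == null || !wa.equals(b.get(c))) {
                return false;
            }
        }
        return true;
    }
}
```

If two tables agree on every character, textWidth() returns the same value for both fonts at any size. A difference of even one unit in a single glyph shifts everything that follows it on the line.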

Overprint tweaks

We had an interesting request from a customer trying to draw a PDF on top of a white rectangle on their HP Indigo press. This is a case where Overprint is important; printing a white rectangle first ensures the colors are composited correctly, regardless of the media color.

However, if the PDF being composited tries to control the overprint flags itself, things don't work so well. We've managed to work around this problem - the details are probably a little involved for this page, but if you're printing on the same hardware and think this might be of use, then email support@bfo.com.

Bug fixes

There is the usual round of bug fixes. We're doing lots of testing with different fonts, and have fixed a few issues specific to individual fonts. One is worth detailing - a change we put in about a year ago, which was supposed to speed things up, actually slowed things down for one of the NotoSerif CJK fonts. That's been reverted, so if you're using those fonts a lot, it's worth updating.

Summary

That's a very brief summary. A lot more has been added, mostly small adjustments which won't sound very useful unless you've been keeping up with the draft specifications for PDF/A-4 and PDF/UA-2. Those are topics I expect we'll be talking more about when they're published.

Until then, the new release is available at http://bfo.com/download as always.