PDF417 Barcodes, or, the nice thing about standards is there are so many to choose from.

Back in 2004 we wrote an encoder for the PDF417 barcode symbology, which, despite the name, has nothing to do with PDF. The standard for it is ISO/IEC 15438: the first revision was published in 2001, the second in 2006 and the third (the most recent at the time of writing) in 2015.

It's a fairly old symbology now, but still in use, and as it was originally designed in 1991 it predates Unicode. A range of non-ASCII characters can be encoded using the "binary" (byte compaction) encoding method.
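For readers who haven't met it, here is a minimal sketch of how byte ("binary") compaction packs data into codewords, as we read ISO/IEC 15438: each complete group of six bytes becomes five base-900 codewords, and any leftover bytes get one codeword each. The helper name `byte_compaction` is ours, and the sketch deliberately leaves out the length descriptor, the other compaction modes and error correction. The point to take away is that the symbol carries bytes, not characters, so something has to decide which byte a character such as "ü" becomes.

```python
# Hypothetical helper: a sketch of PDF417 byte compaction, not a complete encoder.
# Latch codeword values as we read ISO/IEC 15438.
BYTE_LATCH = 901             # used when the byte count is NOT a multiple of 6
BYTE_LATCH_MULTIPLE_6 = 924  # used when the byte count IS a multiple of 6


def byte_compaction(data: bytes) -> list[int]:
    """Return the latch codeword followed by the byte-compacted data codewords."""
    latch = BYTE_LATCH_MULTIPLE_6 if len(data) % 6 == 0 else BYTE_LATCH
    codewords = [latch]
    full = len(data) - len(data) % 6  # bytes covered by complete 6-byte groups
    for i in range(0, full, 6):
        # Read 6 bytes as one 48-bit big-endian integer...
        value = int.from_bytes(data[i:i + 6], "big")
        # ...and write it as 5 base-900 digits, most significant first.
        # This always fits, because 256**6 < 900**5.
        group = []
        for _ in range(5):
            value, digit = divmod(value, 900)
            group.append(digit)
        codewords.extend(reversed(group))
    # Leftover bytes (fewer than 6) are emitted one codeword per byte.
    codewords.extend(data[full:])
    return codewords


if __name__ == "__main__":
    payload = "Was für ein Chaos".encode("latin-1")  # 17 bytes
    # Latch 901, two 5-codeword groups, then 5 single-byte codewords for "Chaos"
    print(byte_compaction(payload))
```

Note that the input is already `bytes`: the choice of `"latin-1"` in the usage line is exactly the decision the rest of this post is about.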

Take a look at the following two pages - the first one is from the 2001 version of the spec, the second from the 2015 version (where I believe it's unchanged from the 2006 revision).

If you are creating a PDF417 barcode that uses a non-ASCII character you have to use this table, but the value depends on which version of the spec you refer to: the 2001 edition refers to code page 437 (the MS-DOS character set), the 2006 and 2015 editions to ISO 8859-1, and this table is normative in both versions. There's no rationale given for this change, but it is referred to in the later editions (ISO/IEC 15438:2015, section 5.4.3):

In previous PDF417 specifications, the default character set corresponded to ECI 000002 (a code page of the MS-DOS operating system). The interpretation of byte character values below 128 is unchanged and the operation of PDF417 printing and scanning equipment is unaffected. New applications that use byte character values above 127 should assume the ECI 000003 default interpretation for broadest compatibility with current systems. Existing applications encoding values above 127 may continue to encode and process data as before. Applications that rely upon the prior default interpretation of values above 127 may encode ECI 000002 explicitly if they wish to signal this interpretation.

Which implies that it's possible to specify which encoding is used by embedding an ECI (Extended Channel Interpretation) symbol. Good news, except that ECI is an optional part of the specification and is not implemented by some of the readers we checked (and not fully implemented by any of them).
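As we read the 2015 edition, an ECI is written into the codeword stream as one of three escape codewords, followed by one or two codewords carrying the ECI number. The sketch below, built around a hypothetical `eci_codewords` helper, shows the arithmetic; the ranges and codeword values are our reading of ISO/IEC 15438, so please check them against your own copy before relying on them. Signalling ISO 8859-1 then comes down to the two codewords 927, 3 in front of the data, and code page 437 to 927, 2.

```python
# Hypothetical helper; ranges and codeword values as we read ISO/IEC 15438.
ECI_CHARSET_CW = 927          # ECI 000000-000899: one following codeword
ECI_GENERAL_PURPOSE_CW = 926  # ECI 000900-810899: two following codewords
ECI_USER_DEFINED_CW = 925     # ECI 810900-811799: one following codeword


def eci_codewords(eci: int) -> list[int]:
    """Return the codewords that signal the given ECI number."""
    if 0 <= eci <= 899:
        return [ECI_CHARSET_CW, eci]
    if 900 <= eci <= 810899:
        # A decoder recovers ECI = (c1 + 1) * 900 + c2
        return [ECI_GENERAL_PURPOSE_CW, eci // 900 - 1, eci % 900]
    if 810900 <= eci <= 811799:
        # A decoder recovers ECI = 810900 + c
        return [ECI_USER_DEFINED_CW, eci - 810900]
    raise ValueError(f"ECI {eci} is out of range for PDF417")


print(eci_codewords(2))  # [927, 2]: code page 437, the 2001 default
print(eci_codewords(3))  # [927, 3]: ISO 8859-1, the 2006/2015 default
```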

So what does this mean? If you're encoding a PDF417 barcode, one of these situations will apply:

  1. All your characters are in the ASCII range. You don't need to worry about any of this; your barcode will scan on any equipment.
  2. You choose to follow the 2001 version of the specification; your barcodes will be decoded incorrectly by any decoder following later versions of the specification.
  3. You choose to follow the 2006/2015 version of the specification; your barcodes will be decoded incorrectly by any decoder following earlier versions of the specification.
  4. You choose to explicitly encode the ECI symbol to remove any ambiguity; your barcode will be read incorrectly by any reader that doesn't support the ECI functionality.
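To make options 2 to 4 concrete, here is a toy model of which encoder/reader combinations round-trip "Was für ein Chaos". It is a sketch under loudly stated assumptions: a reader with no ECI present assumes either code page 437 or ISO 8859-1, an ECI-aware reader maps ECI 2 and 3 to those two character sets, and a reader without ECI support is modelled as simply failing (`None`) when it meets one, whereas a real reader might instead return garbage. All the names here are hypothetical.

```python
TEXT = "Was für ein Chaos"

# Encoder options 2-4 from the list: (charset for the bytes, ECI written or None)
OPTIONS = {
    "2001": ("cp437", None),
    "2006/2015": ("latin-1", None),
    "explicit ECI": ("latin-1", 3),
}

# Readers: (charset assumed when no ECI is present, does it honour ECI?)
READERS = {
    "2001 reader": ("cp437", False),
    "2006/2015 reader": ("latin-1", False),
    "ECI-aware reader": ("latin-1", True),
}

ECI_CHARSETS = {2: "cp437", 3: "latin-1"}


def scan(text: str, option: str, reader: str):
    """Encode text with one option, then 'read' it back with one reader."""
    charset, eci = OPTIONS[option]
    default, honours_eci = READERS[reader]
    payload = text.encode(charset)  # "ü" is 0x81 in cp437 but 0xFC in latin-1
    if eci is None:
        return payload.decode(default)  # reader falls back on its own default
    if not honours_eci:
        return None  # assumption: an unsupported ECI makes the read fail
    return payload.decode(ECI_CHARSETS[eci])


for option in OPTIONS:
    for reader in READERS:
        result = scan(TEXT, option, reader)
        verdict = "ok" if result == TEXT else f"WRONG: {result!a}"
        print(f"{option:>12} -> {reader:<17} {verdict}")
```

Only the diagonal combinations print "ok", and pure-ASCII text such as "Hello" comes through unchanged whichever of the first two options and whichever reader you pick, which is situation 1.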

There is no best option, but as of release 2.20.2 of the PDF Library we've chosen to follow the latest release of the specification. Even though this will mean a change to the code for anyone generating a PDF417 barcode with non-ASCII characters, it is the best long-term solution and the one used by the vast majority of software-based decoders we've tested. If you depend on the old encoding rules or just want more control, we've added a new PDF417 method to the BarCode class which lets you choose one of these three options.

Now, I realise in 2017 this is not exactly breaking news, but I have been unable to find much discussion of it anywhere. The ZXing package, which we use as part of our test harness, did not add PDF417 support until later and appears to be based on the 2006 version. There's a minimal discussion of codepages here, but the full spec costs CHF200 or so, and there's little incentive to pay money for a now-outdated specification.

But undoubtedly many commercial vendors, like us, will have implemented the first version of the specification, including a number who have implemented it in firmware. Google shows a few contenders.

To demonstrate this, here are three barcodes, all of which can claim to represent the text "Was für ein Chaos" in PDF417. Scan these with your preferred decoder, or copy the image URL and pass it to your preferred online barcode decoder to see which version of the standard it supports.

With eci=2 (codepage 437), as defined in version 1

With eci=3 (ISO 8859-1), as defined in versions 2 and 3

With eci=-1, which explicitly encodes the ECI
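For anyone curious how a decoder might treat symbols like these, here is a minimal sketch of the decoding side. It honours an ECI 000002 or 000003 escape (codeword 927) and otherwise falls back on a default character set, which is exactly where readers following different editions part company. It handles only that escape and byte compaction (codewords 901 and 924), ignoring the length descriptor, text and numeric compaction, and error correction; the codeword values are our reading of the spec, and the function and table names are hypothetical.

```python
ECI_CHARSET_CW = 927
BYTE_LATCH, BYTE_LATCH_MULTIPLE_6 = 901, 924
ECI_CHARSETS = {2: "cp437", 3: "latin-1"}  # only the two ECIs discussed here


def unpack_bytes(latch: int, data: list[int]) -> bytes:
    """Undo byte compaction: 5 base-900 codewords -> 6 bytes, leftovers 1:1."""
    out = bytearray()
    k = 0
    # With latch 924 every group is complete; with 901 the final 1-5 codewords
    # are single bytes, so only unpack a group if more codewords follow it.
    while len(data) - k >= 5 and (latch == BYTE_LATCH_MULTIPLE_6 or len(data) - k > 5):
        value = 0
        for digit in data[k:k + 5]:
            value = value * 900 + digit
        out += value.to_bytes(6, "big")
        k += 5
    out += bytes(data[k:])
    return bytes(out)


def decode(codewords: list[int], default_charset: str = "latin-1") -> str:
    charset = default_charset  # what the reader assumes if no ECI is present
    text = []
    i = 0
    while i < len(codewords):
        cw = codewords[i]
        if cw == ECI_CHARSET_CW:
            charset = ECI_CHARSETS[codewords[i + 1]]  # an ECI overrides the default
            i += 2
        elif cw in (BYTE_LATCH, BYTE_LATCH_MULTIPLE_6):
            end = i + 1
            while end < len(codewords) and codewords[end] < 900:
                end += 1  # data runs until the next latch/escape codeword
            text.append(unpack_bytes(cw, codewords[i + 1:end]).decode(charset))
            i = end
        else:
            raise ValueError(f"codeword {cw} is not handled by this sketch")
    return "".join(text)


# The same byte 0xFC ("ü" in ISO 8859-1) under different assumptions:
print(decode([901, 0xFC]) == "ü")                                   # 2006/2015 default
print(decode([901, 0xFC], default_charset="cp437") == "ü")          # 2001 default reads "ⁿ"
print(decode([927, 3, 901, 0xFC], default_charset="cp437") == "ü")  # the ECI wins
```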

We are left genuinely bewildered as to why this change was made, particularly to a protocol which would frequently be implemented in firmware. If you can shed any further light on this, please do email us at support@bfo.com and let us know.