EMail archiving with EA-PDF and BFO Publisher

In February of this year the PDF Association published the EA-PDF specification, describing a PDF sub-format for representing email in PDF. Developed with the University of Illinois Urbana-Champaign and the archiving community, it builds on PDF/A.

Most EMail in 2025 is HTML and BFO Publisher is an HTML to PDF converter, so we set about adding support for EA-PDF, which finally made it out of the lab in release 1.4, published August 2025.

Live Demo

We'll start with a demonstration. If you send an email to either:

You should receive a reply with a PDF attached, which will be compliant with EA-PDF 1.0 and also with either PDF/A-3 or PDF/A-4, depending on which address you emailed. Although it's not part of the spec, we've digitally signed the PDF, and it will also be tagged. Of course, don't send us anything sensitive! There's a bit more information on this service here.

If you're new to EA-PDF it's worth noting that the headers, body and attachments are required to be on different pages, and that the metadata, attachments and the document-part dictionary are where most of the EA-PDF specific action is happening.

Archiving with BFO Publisher

There are many ways to integrate BFO Publisher into an email workflow, but the most obvious starts with a hook added to the organization email server (eg Microsoft Exchange) to run a script, call a URL or somehow supply an email for archiving to an external process.

For this article we'll assume this works by calling a script, with the email read from standard-input, and our goal is to convert that email to an EA-PDF compliant PDF and save it to disk. We're going to communicate with BFO Publisher using the web-service, so first download the latest BFO Publisher and double-click on it to start a server on port 8080.

Then we'll need a script to communicate with it. What might that script look like? It's only calling a webservice so can be in any language; here's an example in JavaScript for NodeJS.

#!/usr/bin/env node

const { simpleParser } = require("mailparser");
const fs = require("fs");
const process = require("process");
const cbor = require("borc");

// Read stdin, convert it to a byte buffer, parse the buffer as an email
// and pass the buffer to "convert". When convert finishes, write the
// completed PDF to a filename based on the message-id, or log an error.
let chunks = [];
process.stdin.on("data", d => chunks.push(d));
process.stdin.on("end", () => {
  const input = Buffer.concat(chunks);
  simpleParser(input, {}).then((mail) => {
    convert(input, (error, output) => {
      // Fall back to a timestamp if the message has no Message-ID header
      let id = mail.messageId || ("message-" + Date.now());
      let name = id.replace(/[^-A-Za-z0-9@._]/g, "") + ".pdf";
      if (output) {
        fs.writeFile(name, output, (err) => {
          if (err) {
            console.error(err);
          } else {
            console.log("Wrote \"" + name + "\"");
          }
        });
      } else {
        console.error(JSON.stringify(error));
      }
    });
  }).catch(console.error);
});

/**
 * Convert the email to a PDF by calling a BFO Publisher web-service
 * @param buffer the unprocessed byte-buffer that is the message
 * @param callback a function(error, buffer) to be called when the
 *   conversion completes with either an error or the PDF buffer
 */
function convert(buffer, callback) {
  let req = {
    "put": [
      {
        "path": "message",
        "content_type": "message/rfc822",
        "content": buffer
      }
    ],
    "env": {
      "bfo-ext-mail": "EA Mail-1S + PDF1"
    }
  };

  fetch("http://localhost:8080/convert", {
    method: "POST",
    headers: {
      "content-type": "application/cbor",
    },
    body: cbor.encode(req)
  }).then((res) => {
    let type = res.headers.get("content-type");
    if (type == "application/pdf") {
      res.arrayBuffer().then((buf) => {
        callback(null, Buffer.from(buf));
      });
    } else {
      res.text().then((error) => {
        if (type == "application/json") {
          error = JSON.parse(error);
        }
        callback(error, null);
      });
    }
  // Report network failures through the callback too, as documented above
  }).catch((err) => callback(String(err), null));
}

This is probably the shortest realistic script we could write. Notice that we're only parsing the email locally to get the Message-ID header for naming the PDF file. The conversion itself is all done by sending the unparsed email buffer as part of a web-service call to an instance of BFO Publisher, running on localhost:8080 in this example, although it could be anywhere. The PDF is sent back and saved to the file system. The example uses CBOR for the wire protocol but JSON also works (base64-encode any byte buffers).
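
For the JSON alternative, a minimal sketch might look like the following. The helper name buildJsonRequest is hypothetical, and we're assuming the JSON form uses the same field names as the CBOR request, with the base64 text placed directly in "content"; check the web-service documentation before relying on that.

```javascript
/**
 * Build fetch() options for a JSON-encoded conversion request.
 * Assumption: the JSON request mirrors the CBOR one, except that
 * byte buffers are sent as base64 strings.
 * @param buffer the unprocessed byte-buffer that is the message
 * @param env the "env" object to pass with the request
 */
function buildJsonRequest(buffer, env) {
  return {
    method: "POST",
    headers: {
      "content-type": "application/json"
    },
    body: JSON.stringify({
      "put": [
        {
          "path": "message",
          "content_type": "message/rfc822",
          // JSON has no binary type, so the message is base64 encoded
          "content": buffer.toString("base64")
        }
      ],
      "env": env
    })
  };
}

// Usage: fetch("http://localhost:8080/convert", buildJsonRequest(input, env))
```

CBOR avoids the roughly one-third size overhead of base64, which is why the main example uses it, but JSON needs no extra dependency.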

Customization

On receiving a message/rfc822 object to convert, BFO Publisher will assemble it into a JSON structure describing all aspects of the email, convert that JSON to HTML using a template, then convert the HTML to the PDF returned to the client. Customizing the appearance of the PDF involves customizing the template, and that's what we're going to show here.

BFO Publisher uses ZTemplate templates by default; we developed this syntax after finding fault with Apache Freemarker (too complex), Mustache (too unsafe), JMESPath (no templates), etc. The syntax is simple enough to be described on one page but capable enough to create the required HTML. However Freemarker could also be used if you prefer.

To create your own template we strongly recommend you start by modifying the default template we supply, and for that you'll need these three files (note these are all in the zip file at the end of this article):

  • eamail.ztl, which is the ZTemplate to convert JSON to XHTML
  • mail.css, which is the stylesheet to apply to the mail wrapper
  • mail-override.css, the stylesheet to apply to the mail wrapper and the mail message

You'll also need to modify the example above to pass them to the server. This is a quick demo, so for simplicity we're loading the buffers for each file directly into the structure by modifying the req object like so:

let req = {
  "put": [
    {
      "path": "message",
      "content": buffer,
      "content_type": "message/rfc822",
    },
    {
      "path": "upload:eamail.ztl",
      "content_type": "text/plain",
      "trusted": true,
      "content": fs.readFileSync("demo/eamail.ztl")
    },
    {
      "path": "upload:mail.css",
      "content_type": "text/css",
      "content": fs.readFileSync("demo/mail.css")
    },
    {
      "path": "upload:mail-override.css",
      "content_type": "text/css",
      "content": fs.readFileSync("demo/mail-override.css")
    }
  ],
  "env": {
    "bfo-ext-mail": "upload:eamail.ztl",
    "bfo-ext-mail-template-output-type": "text/xml",  // be sure to set this if your template outputs XML not HTML
    "bfo-ext-mail-profile":  "pdfa3",                 // This property is referenced in eamail.ztl
    "bfo-ext-mail-isolated": "false",                 // This property is referenced in eamail.ztl
  }
};
  

Run the example again and you should get the same result, but this time the template is under your command. Adding branding or a digital signature is a simple matter of modifying the template. If you want to retrieve the original JSON to see exactly what is being processed, you can use a trick: run the conversion with this template.

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="attachment" href="#source" name="input.json"/>
<script id="source" lang="application/json"><![CDATA[{{ / }}]]></script>
</head>
</html>
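
One way to send that template is to upload it inline in place of eamail.ztl. This is only a sketch: the function name buildDumpRequest and the path upload:dump.ztl are made up for illustration, and we're assuming the template needs the same "trusted" flag and XML output type as the customized request shown earlier.

```javascript
// The JSON-dumping template from above, held as a string
const DUMP_TEMPLATE = [
  '<html xmlns="http://www.w3.org/1999/xhtml">',
  '<head>',
  '<link rel="attachment" href="#source" name="input.json"/>',
  '<script id="source" lang="application/json"><![CDATA[{{ / }}]]></script>',
  '</head>',
  '</html>'
].join("\n");

/**
 * Build a request that converts the message with the dump template,
 * so the returned PDF carries the intermediate JSON as an attachment.
 * @param buffer the unprocessed byte-buffer that is the message
 */
function buildDumpRequest(buffer) {
  const templatePath = "upload:dump.ztl";   // hypothetical name
  return {
    "put": [
      {
        "path": "message",
        "content_type": "message/rfc822",
        "content": buffer
      },
      {
        "path": templatePath,
        "content_type": "text/plain",
        "trusted": true,                     // assumed, as for eamail.ztl
        "content": Buffer.from(DUMP_TEMPLATE, "utf8")
      }
    ],
    "env": {
      "bfo-ext-mail": templatePath,
      "bfo-ext-mail-template-output-type": "text/xml"
    }
  };
}
```

Passing the result to cbor.encode() in the convert function, instead of the original req object, should be the only change needed.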
   

The returned PDF will have no content except a file attachment containing the original JSON source, which you can retrieve with any PDF tool capable of accessing attachments, such as Acrobat, the Firefox PDF Viewer or our own API. Once you have the JSON you can start testing your template; as the bfopublisher.jar contains all the classes from the ZPath Jar, another neat trick is to test the conversion on the command line:

   java -cp bfopublisher.jar me.zpath.Main -t mail.ztl in.json > out.xht
   

Tips and tricks

  • Relative URLs from within the template are resolved relative to the template URL. But frankly it's easiest if you just make all your URLs absolute; that's why we've used upload:mail.css rather than mail.css as a path.
  • Content that is unchanged for each email - the ZTemplate, stylesheets, any image files or keystores for signing the PDF - should ideally be uploaded once and shared, rather than re-uploaded with each email. More details.
  • There is no security on the BFO Publisher instance in this example, but it would be worth adding some on a live site. A simple setup would be two users: an administrator account which is used to upload the shared files, and a user account which is just used to upload mail messages. More details.
  • If you wanted to include (for example) the conversion time in the PDF, set it as an environment property (eg "env":{ "bfo-ext-mail": "upload:mail.ztl", "date": "8 Aug 2025 22:30"}), then include it in the document body by adding something like this in a stylesheet:
          <style>
          span.date::before { content: env(date) }
          </style>
          ...
          <span class="date"/>
         
    The advantage of this approach is the template doesn't have to change for each message, meaning it can be cached.
  • While returning the PDF directly from the conversion makes the code simpler for a demonstration, consider using a "redirect" property for your conversion, so the reply is a structure that contains a link to the log file. This will help during debugging. More details.
  • EA-PDF makes heavy use of namespaces in the metadata, so it's critical that the bfo-ext-mail-template-output-type property is set.
  • Finally of course, you can easily add scale by firing up multiple instances of BFO Publisher on different hosts.
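
As an illustration of the conversion-time tip, the "env" object could be built per message while the template itself stays unchanged. The helper names formatDate and buildEnv are hypothetical, and using UTC is simply a choice made for this sketch.

```javascript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Format a Date in UTC in the same style as the tip, eg "8 Aug 2025 22:30"
function formatDate(d) {
  const hh = String(d.getUTCHours()).padStart(2, "0");
  const mm = String(d.getUTCMinutes()).padStart(2, "0");
  return d.getUTCDate() + " " + MONTHS[d.getUTCMonth()] + " " +
         d.getUTCFullYear() + " " + hh + ":" + mm;
}

/**
 * Build the "env" object for one conversion.
 * @param templatePath the path of the (shared, cacheable) template
 * @param when the conversion time, defaulting to now
 */
function buildEnv(templatePath, when = new Date()) {
  return {
    "bfo-ext-mail": templatePath,
    "date": formatDate(when)   // read in the stylesheet with env(date)
  };
}
```

Because only the environment changes from one message to the next, the template and stylesheets can be uploaded once and reused.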

Downloadable example

You can download the example described here in email-to-pdf.zip. We've slightly extended what's been covered above to also digitally sign the PDF, which is certainly not required by EA-PDF, but it makes for a neat example.