Email archiving with EA-PDF and BFO Publisher
In February of this year the PDF Association published the EA-PDF specification, describing a PDF sub-format for representing email in PDF. Developed with the University of Illinois Urbana-Champaign and the archiving community, it builds on PDF/A.
Most email in 2025 is HTML, and BFO Publisher is an HTML to PDF converter, so we set about adding support for EA-PDF, which finally made it out of the lab in release 1.4, published in August 2025.
Live Demo
We'll start with a demonstration. If you send an email to either:
- pdfa3@publisher.bfo.com (for a PDF/A-3a file)
- pdfa4@publisher.bfo.com (for a PDF/A-4f file)
You should receive a reply with a PDF attached, which will be compliant with EA-PDF 1.0 and also compliant with either PDF/A-3 or PDF/A-4, depending on which address you emailed. Although not part of the spec, we've digitally signed the PDF, and it will also be tagged. Of course, don't send us anything sensitive! There's a bit more information on this service here.
If you're new to EA-PDF it's worth noting that the headers, body and attachments are required to be on different pages, and that the metadata, attachments and the document-part dictionary are where most of the EA-PDF specific action is happening.
Archiving with BFO Publisher
There are many ways to integrate BFO Publisher into an email workflow, but the most obvious starts with a hook added to the organization's email server (e.g. Microsoft Exchange) to run a script, call a URL or somehow supply an email for archiving to an external process.
For this article we'll assume this works by calling a script, with the email read from standard-input, and our goal is to convert that email to an EA-PDF compliant PDF and save it to disk. We're going to communicate with BFO Publisher using the web-service, so first download the latest BFO Publisher and double-click on it to start a server on port 8080.
Then we'll need a script to communicate with it. What might that script look like? It's only calling a web-service, so it can be in any language; here's an example in JavaScript for Node.js.
#!/usr/bin/env node
const { simpleParser } = require("mailparser");
const fs = require("fs");
const process = require("process");
const cbor = require("borc");

// Read stdin, convert it to a byte buffer, parse the buffer as an email
// and pass the buffer to "convert". When convert finishes, write the
// completed PDF to a filename based on the message-id, or log an error.
let chunks = [];
process.stdin.on("data", d => chunks.push(d));
process.stdin.on("end", () => {
    const input = Buffer.concat(chunks);
    simpleParser(input, {}).then((mail) => {
        convert(input, (error, output) => {
            // Fall back to a fixed name if the email has no Message-ID header
            let id = mail.messageId || "message";
            let name = id.replace(/[^-A-Za-z0-9@._]/g, "") + ".pdf";
            if (output) {
                fs.writeFile(name, output, (err) => {
                    if (err) {
                        console.error(err);
                    } else {
                        console.log("Wrote \"" + name + "\"");
                    }
                });
            } else {
                console.error(JSON.stringify(error));
            }
        });
    }).catch(console.error);
});
/**
 * Convert the email to a PDF by calling a BFO Publisher web-service
 * @param buffer the unprocessed byte-buffer that is the message
 * @param callback a function(error, buffer) to be called when the
 * conversion completes with either an error or the PDF buffer
 */
function convert(buffer, callback) {
    let req = {
        "put": [
            {
                "path": "message",
                "content_type": "message/rfc822",
                "content": buffer
            }
        ],
        "env": {
            "bfo-ext-mail": "EA Mail-1S + PDF1"
        }
    };
    fetch("http://localhost:8080/convert", {
        method: "POST",
        headers: {
            "content-type": "application/cbor",
        },
        body: cbor.encode(req)
    }).then((res) => {
        let type = res.headers.get("content-type");
        if (type == "application/pdf") {
            res.arrayBuffer().then((buf) => {
                callback(null, Buffer.from(buf));
            });
        } else {
            res.text().then((error) => {
                if (type == "application/json") {
                    error = JSON.parse(error);
                }
                callback(error, null);
            });
        }
    }).catch(console.error);
}
This is probably the shortest realistic script we could write. Notice we're only parsing the email locally to get the Message-ID header for naming the PDF file. The conversion itself is all done by sending the unparsed email buffer as part of a web-service call to an instance of BFO Publisher, running on localhost:8080 in this example, although it could be anywhere. The PDF is sent back and saved to the file system. The example uses CBOR for the wire protocol, but JSON also works (base64 encode any byte buffers).
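As a rough sketch of the JSON variant mentioned above, the request could be built like this. The helper name buildJsonRequest is hypothetical, and the code assumes the server accepts the same "put" and "env" structure in JSON with the message bytes supplied as a base64 string; it has not been checked against the BFO Publisher documentation.

```javascript
// Sketch only: assumes the web-service accepts the same request structure
// as JSON, with byte buffers carried as base64 strings.
// "buildJsonRequest" is a hypothetical helper, not part of any BFO API.
function buildJsonRequest(buffer, env) {
    const req = {
        "put": [
            {
                "path": "message",
                "content_type": "message/rfc822",
                // JSON cannot hold raw bytes, so base64 encode the message
                "content": buffer.toString("base64")
            }
        ],
        // Pass through whatever "env" settings the CBOR request would use
        "env": env || {}
    };
    return JSON.stringify(req);
}

// Example with a tiny stand-in message
const sample = Buffer.from("Subject: test\r\n\r\nHello\r\n");
console.log(buildJsonRequest(sample, {}));
```

The fetch call would then presumably send this string as the body with "content-type" set to "application/json" instead of the CBOR-encoded buffer; treat that header value as an assumption too.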
Customization
On receiving a message/rfc822 object to convert, BFO Publisher will assemble it into a JSON structure describing all aspects of the email, convert that JSON to HTML using a template, then convert the HTML to the PDF returned to the client.
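To make the middle stage of that pipeline concrete, here is a toy illustration of "JSON in, HTML out". It is not ZTemplate and not BFO Publisher code: the field names subject, from and body are invented for the example, and the real JSON structure is richer and will differ.

```javascript
// Hypothetical, simplified stand-in for the JSON stage. The structure
// BFO Publisher actually produces is richer and its field names may differ.
const mailJson = {
    subject: "Quarterly report",
    from: "alice@example.com",
    body: "Figures attached. Revenue < forecast & costs > budget."
};

// Escape text so it is safe to embed in HTML
function escapeHtml(s) {
    return String(s)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Stand-in for the template stage: takes the JSON, returns HTML
function renderHtml(mail) {
    return "<html><head><title>" + escapeHtml(mail.subject) + "</title></head>"
        + "<body><h1>" + escapeHtml(mail.subject) + "</h1>"
        + "<p class=\"from\">" + escapeHtml(mail.from) + "</p>"
        + "<div class=\"body\">" + escapeHtml(mail.body) + "</div>"
        + "</body></html>";
}

console.log(renderHtml(mailJson));
```

In the real pipeline the template plays the role of renderHtml, which is why changing the template is enough to change the look of the resulting PDF.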
Customizing the appearance of the PDF involves customizing the template, and that's what we're going to show here.
BFO Publisher uses ZTemplate templates by default; we developed this syntax after finding fault with Apache Freemarker (too complex), Mustache (too unsafe), JMESPath (no templates), etc. The syntax is simple enough to be described on one page but capable enough to create the required HTML. However Freemarker could also be used if you prefer.
To create your own template we strongly recommend you start by modifying the default template we supply, and for that you'll need these three files (note these are all in the zip file at the end of this article):
- eamail.ztl, which is the ZTemplate to convert JSON to XHTML
- mail.css, which is the stylesheet to apply to the mail wrapper
- mail-override.css, the stylesheet to apply to the mail wrapper and the mail message
You'll also need to modify the example above to pass them to the server. This is a quick demo, so for simplicity we're loading the buffers for each file directly into the request structure by modifying the req object like so:
let req = {
    "put": [
        {
            "path": "message",
            "content": buffer,
            "content_type": "message/rfc822",
        },
        {
            "path": "upload:eamail.ztl",
            "content_type": "text/plain",
            "trusted": true,
            "content": fs.readFileSync("demo/eamail.ztl")
        },
        {
            "path": "upload:mail.css",
            "content_type": "text/css",
            "content": fs.readFileSync("demo/mail.css")
        },
        {
            "path": "upload:mail-override.css",
            "content_type": "text/css",
            "content": fs.readFileSync("demo/mail-override.css")
        }
    ],
    "env": {
        "bfo-ext-mail": "upload:eamail.ztl",
        "bfo-ext-mail-template-output-type": "text/xml", // be sure to set this if your template outputs XML not HTML
        "bfo-ext-mail-profile": "pdfa3", // This property is referenced in eamail.ztl
        "bfo-ext-mail-isolated": "false", // This property is referenced in eamail.ztl
    }
};
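The three upload entries follow the same pattern, so if you prefer you could build them with a small helper. This is only a sketch: uploadEntry and templateEntries are hypothetical names introduced here, and the entries they produce simply mirror the "path", "content_type", "trusted" and "content" keys used in the request above.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: build one "put" entry for a file in "dir",
// using the same keys as the hand-written request
function uploadEntry(dir, name, contentType, trusted) {
    const entry = {
        "path": "upload:" + name,
        "content_type": contentType,
        "content": fs.readFileSync(path.join(dir, name))
    };
    // Only the template was marked as trusted in the original request
    if (trusted) {
        entry["trusted"] = true;
    }
    return entry;
}

// Build the entries for the template and its two stylesheets
function templateEntries(dir) {
    return [
        uploadEntry(dir, "eamail.ztl", "text/plain", true),
        uploadEntry(dir, "mail.css", "text/css", false),
        uploadEntry(dir, "mail-override.css", "text/css", false)
    ];
}
```

With this in place, the "put" array would be the message entry followed by templateEntries("demo"), and the "env" section stays exactly as shown.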
Run the example again and you should get the same result, but this time the template is under your control. Adding branding or a digital signature is a simple matter of modifying the template. If you want to retrieve the original JSON to see exactly what is being processed, you can use a trick: run the conversion with this template.
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <link rel="attachment" href="#source" name="input.json"/>
  <script id="source" lang="application/json"><![CDATA[{{ / }}]]></script>
 </head>
</html>
The returned PDF will have no content except a file attachment containing the original JSON source, which you can retrieve with any PDF tool capable of accessing attachments, such as Acrobat, the Firefox PDF Viewer, or our own API. Once you have the JSON you can start testing your template. As bfopublisher.jar contains all the classes from the ZPath Jar, another neat trick is to test the conversion on the command line:
java -cp bfopublisher.jar me.zpath.Main -t mail.ztl in.json > out.xht
Tips and tricks
Downloadable example
You can download the example described here in email-to-pdf.zip. We've slightly extended what's been covered above to also digitally sign the PDF - which is certainly not required by EA-PDF, but it makes for a neat example.