The "P" in PDF stands for "Portable", and PDF is now an ISO Specification. So you could be forgiven for being surprised when you learn about XFA. We're asked about it a lot so what follows is a bit of a FAQ.
What is XFA?
XFA stands for "XML Forms Architecture", and it's been part of Acrobat since Acrobat 6. It's an XML syntax which defines the document (the whole document, not just the form fields) and is embedded inside the PDF. While the specification itself is open and available, it's not part of the ISO PDF specification. It's also long (1500+ pages) and complex, having gone through 10 revisions since Acrobat 6. XFA was deprecated in PDF 2.0 (ISO 32000-2), which was first published in 2017.
Why does it exist?
Well, the original forms in PDF are arguably a bit of a flawed design and there are a lot of things that could have been done better, so there was room for improvement. XFA is a dialect of XML, which is a sensible container format, and it separates data from content in much the same way as the W3C XForms specification, which is undeniably a good thing.
So what are the problems with XFA?
Personally I have quite a list, but the main one is that XFA replaces, not augments, the PDF specification: the PDF file is now just a container, and the entire document is defined in the XFA layer. It undoubtedly warranted a new XFA file format. By trying to elbow it in via the existing PDF standard instead, Adobe ensured a generation of confusion and annoyance for third-party vendors and their customers.
Which tools support it?
For full support, you need Adobe's own products. Our API has limited support as described below, and we expect other third-party products to have support ranging from none to limited.
How do I create an XFA document?
You need an XFA-aware PDF producer, which is likely to be Adobe LiveCycle. When you save your document it will save it as an XFA PDF, and you'll have two options:
- By default, the XFA-enabled PDF is just a basic shell around the XFA document. The entire document is defined in XFA, and an application that's not aware of XFA simply gets a single page PDF requesting you use a newer version of Acrobat.
- You can also save your XFA PDF in "compatibility" mode, which will also create the pages, form fields and other content in the normal PDF way - the document is effectively stored twice, once as XFA, once as PDF. An XFA-aware application like Acrobat will read from the XFA layer (and ignore the PDF layer), and a non-XFA aware application will ignore the XFA layer and use the PDF layer. Obviously, subsequent edits should be made with a tool that can keep the two in sync.
What support do BFO tools have for XFA?
- For PDFs saved without the "compatibility" layer, almost none. You can retrieve or update the XFA object as an XML document, or you can update just the "datasets" object, which is the data model. This effectively allows you to read and write the form values, although you can't see the fields themselves. You can also read/write the document metadata (author, title etc.) but anything else related to document content is unavailable: you can't access the document pages (the pagelist will always return a single dummy page) and you can't view or edit the form fields.
- PDFs saved with a "compatibility" layer can be accessed for reading in a normal way - the PDF pages are valid so you can display them in our viewer or list the form fields and their content. You can also update the values of the form fields (we synchronize the XFA data to match) but any other changes to the PDF will not be synchronized and so will be ignored by Acrobat - so changes like this should be avoided.
- The final thing you can do with compatibility XFA documents is delete the XFA layer. Once removed, Acrobat will treat the PDF as a normal PDF and pages can be modified, form fields added or removed without problems.
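The "datasets" object mentioned above is ordinary XML, so once you have it as text it can be edited with any XML toolkit. The following is a minimal sketch using only the JDK's own XML classes. The sample document, the element names (form1, CustomerName) and the helper class are hypothetical and chosen for illustration; how the XML is retrieved from and written back to the PDF depends on the API in use and is not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XfaDatasetsSketch {
    // Hypothetical datasets XML; a real form defines its own element names.
    static final String SAMPLE =
        "<xfa:datasets xmlns:xfa=\"http://www.xfa.org/schema/xfa-data/1.0/\">"
      + "<xfa:data><form1><CustomerName>Alice</CustomerName></form1></xfa:data>"
      + "</xfa:datasets>";

    /** Parse XML text into a namespace-aware DOM document. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** Return the text of the first element with the given tag name, or null. */
    static String readValue(String xml, String tagName) {
        NodeList matches = parse(xml).getElementsByTagName(tagName);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    /** Return a copy of the XML with the first matching element's text replaced. */
    static String writeValue(String xml, String tagName, String value) {
        Document doc = parse(xml);
        NodeList matches = doc.getElementsByTagName(tagName);
        if (matches.getLength() == 0) {
            throw new IllegalArgumentException("No element named " + tagName);
        }
        matches.item(0).setTextContent(value);
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }

    public static void main(String[] args) {
        // Change the value, then read it back from the updated XML.
        String updated = writeValue(SAMPLE, "CustomerName", "Bob");
        System.out.println(readValue(updated, "CustomerName"));
    }
}
```

Working on the data model this way changes values only. The layout and field definitions live elsewhere in the XFA and are untouched.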
How do I know what sort of PDF I have?
- To identify an XFA document, you can check the XFAForm feature in the PDF OutputProfile:
boolean xfa = pdf.getBasicOutputProfile().isSet(OutputProfile.Feature.XFAForm);
- Identifying a non-compatibility layer PDF is trickier. Our API will only find a single page and no form fields, and most XFA documents would contain at least one field so this is probably a good test. The only way to know for sure is if you open the PDF with our viewer (or any non-Acrobat viewer) and you see "To view the full contents of this document, you need a later version of the PDF viewer", "If this message is not eventually replaced by the proper contents of the document, your PDF viewer may not be able to display this type of document." or other words to that effect.
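The two checks above can be combined into a rough classification. This is only a sketch of the heuristic: the class, enum and method names are hypothetical, and the inputs are assumed to be the result of the OutputProfile check plus the page and field counts your API reports. A genuine single-page compatibility document with no fields would be misclassified, so treat the result as a hint, not proof.

```java
class XfaKindSketch {
    /** Hypothetical labels for the three cases discussed in the text. */
    enum Kind { NOT_XFA, XFA_WITH_COMPATIBILITY_LAYER, XFA_PROBABLY_SHELL_ONLY }

    /**
     * Heuristic: an XFA document that exposes at most one page and no form
     * fields is probably just the placeholder shell around the XFA layer.
     */
    static Kind classify(boolean hasXfa, int pageCount, int fieldCount) {
        if (!hasXfa) {
            return Kind.NOT_XFA;
        }
        if (pageCount <= 1 && fieldCount == 0) {
            return Kind.XFA_PROBABLY_SHELL_ONLY;
        }
        return Kind.XFA_WITH_COMPATIBILITY_LAYER;
    }

    public static void main(String[] args) {
        // Example inputs: an XFA document showing one page and no fields.
        System.out.println(classify(true, 1, 0));
    }
}
```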
How do I delete the XFA layer and what are the consequences?
How is very simple: with our API, just call:
For documents that are going through a final stage of processing before being sent out, and where the customer isn't expected to modify the form, removing the XFA layer should be fine.
What is the best practice for using XFA?
- If there are any products in the PDF's life cycle that are not produced by Adobe - this includes general tools like ours, PDF viewers (perhaps on your customer's machine), any archival requirements like those imposed by PDF/A, or print service suppliers - then the best practice is to avoid XFA. Support from third-party vendors is extremely limited and likely to stay that way.
- If you have to use XFA, then always save your PDF with a "compatibility" layer. This will allow basic modifications as described above, and will give you the option of deleting the XFA layer if necessary.