Class PDF

  • All Implemented Interfaces:
    Cloneable

    public class PDF
    extends Object

    A PDF describes a single document in Adobe's Portable Document Format. It is the highest-level object in the package.

    The life-cycle of a PDF generally consists of being created, adding new pages, optionally adding information about the document structure (e.g. bookmarks), and finally rendering to an OutputStream.

    This class only deals with the structure of the document. To actually create some content see the PDFPage class.

    Here's the ubiquitous example:
       import org.faceless.pdf2.*;
       import java.io.*;
    
       // Create a new PDF
       PDF p = new PDF();
    
       // Create a new page
       PDFPage page = p.newPage(PDF.PAGESIZE_A4);
    
       // Create a new "style" to write in - Black 24pt Times Roman.
       PDFStyle mystyle = new PDFStyle();
       mystyle.setFont(new StandardFont(StandardFont.TIMES), 24);
       mystyle.setFillColor(java.awt.Color.black);
    
       // Put something on the page
       page.setStyle(mystyle);
       page.drawText("Hello, PDF-viewing World!", 100, 100);
    
       // Automatically go to this page when the document is opened.
       p.setAction(Event.OPEN, PDFAction.goTo(page));
    
       // Add some document info
       p.setInfo("Author", "Joe Bloggs");
       p.setInfo("Title", "My Document");
    
       // Add a bookmark
       java.util.List bookmarks = p.getBookmarks();
       bookmarks.add(new PDFBookmark("Hello World page", PDFAction.goTo(page)));
    
       // Write the document to a file, closing the stream even if rendering fails
       try (OutputStream out = new FileOutputStream("test.pdf")) {
           p.render(out);
       }
     
    Since:
    1.0
    See Also:
    PDFPage, PDFReader
    • Constructor Detail

      • PDF

        public PDF()
        Create a new, empty PDF document
        Since:
        1.0
      • PDF

        public PDF​(PDF pdf)
        Create a PDF that's a clone of the specified PDF. When creating multiple copies of a single PDF, it's much faster to use this method than to re-read the PDF using a new PDFReader.
        Since:
        2.0
      • PDF

        public PDF​(PDFReader reader)
        Create a PDF from the specified PDFReader. The PDFReader class is available as part of the "Extended Edition" of the PDF library, and is included with this package. If the document contains multiple revisions, the latest revision is loaded.
        Since:
        1.1.12
      • PDF

        public PDF​(PDFReader reader,
                   int revision)
        Create a PDF from the specified PDFReader, using the specified revision of the document. The PDFReader class is available as part of the "Extended Edition" of the PDF library, and is included with this package. The revision number must be between 1 and PDFReader.getNumberOfRevisions(), otherwise an IllegalArgumentException is thrown.
        Parameters:
        reader - the PDFReader to use
        revision - the revision number to use, from 1 (to load the original document) up to PDFReader.getNumberOfRevisions() (to load the latest revision).
        Throws:
        IllegalArgumentException - if the revision is outside the specified range
        Since:
        1.2.1
    • Method Detail

      • isLicensed

        public static boolean isLicensed()
        Return true if the PDF is licensed, false if it's running as a demo
        Since:
        2.11.22
      • setPropertyManager

        public static final void setPropertyManager​(PropertyManager manager)
        Set the PropertyManager to be used by the PDF library
        Since:
        2.8.5
      • getPropertyManager

        public static final PropertyManager getPropertyManager()
        Get the PropertyManager currently being used by the PDF library
        Since:
        2.8.5
      • setExecutor

        public static final void setExecutor​(ExecutorService e)

        Set the ExecutorService to be used by the PDF library to run any parallel operations. Parallel operations in the API include reading the PDF with a PDFReader, saving the PDF, and profiling. Prior to 2.18.1 parallel work was done in short lived Threads, but the use of an Executor allows the use of a system-wide thread pool for better resource management.

        The parameter to this method is the ExecutorService to use for these parallel operations, or null to use the PDF Library default. If the default is used, then a fixed size thread pool is created with the number of threads based on the Threads property if specified, or the number of processors if not. A special value of "1" for this property will ensure there is no parallel processing and everything is done in the calling thread.

        Since:
        2.18.1
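        As a sketch of the shared-pool approach, the snippet below builds a fixed-size pool sized the way the default is described above: an explicit thread count if given, otherwise the number of processors. SharedPdfPool and createPool are hypothetical names, and the PDF.setExecutor calls are left as comments so the snippet compiles without the library on the classpath.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class SharedPdfPool {
    // Hypothetical helper: a fixed-size pool that an application could pass to
    // PDF.setExecutor(). Uses the given thread count if positive, otherwise
    // one thread per available processor.
    static ExecutorService createPool(Integer threads) {
        int n = (threads != null && threads > 0)
                ? threads
                : Runtime.getRuntime().availableProcessors();
        // Daemon threads, so an idle pool never prevents the JVM from exiting.
        ThreadFactory daemons = task -> {
            Thread t = new Thread(task, "pdf-worker");
            t.setDaemon(true);
            return t;
        };
        return Executors.newFixedThreadPool(n, daemons);
    }

    public static void main(String[] args) {
        ExecutorService pool = createPool(null);
        // With the BFO library on the classpath:
        //   PDF.setExecutor(pool);   // share the pool with the rest of the application
        //   ...read, render and profile documents...
        //   PDF.setExecutor(null);   // revert to the library default
        pool.shutdown();
    }
}
```

Note that a pool with one thread is not equivalent to setting the Threads property to 1: the pool still runs tasks on its worker thread rather than in the calling thread.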
      • close

        public void close()

        Close any file resources the PDF may be holding on to. These will be automatically closed during garbage collection, but this method may be called earlier if necessary to speed disposal of those resources.

        Since:
        2.11.2 - prior to that no resources were held and this method wasn't necessary
      • getPDFVersion

        public int getPDFVersion()
        Get the version of the PDF. The version provides an indication of which version of Acrobat the file can be loaded in, although it is quite normal for a 1.4 document to be loaded correctly by a 1.3 viewer (for example). Since Acrobat 9 and ISO 32000, version numbering has become more complicated, so to interpret the value from this method you will need the following table. Note the earliest version of PDF supported by this API is 1.3, so any documents from earlier revisions will be automatically upgraded.
        Value  Meaning
        3      PDF 1.3 (as created by Acrobat 4.x)
        4      PDF 1.4 (as created by Acrobat 5.x)
        5      PDF 1.5 (as created by Acrobat 6.x)
        6      PDF 1.6 (as created by Acrobat 7.x)
        7      PDF 1.7 / ISO 32000-1:2008 (as created by Acrobat 8.x)
        8      PDF 1.7 / ISO 32000-1:2008 Extension Level 3 (as created by Acrobat 9.x)
        9      PDF 1.7 / ISO 32000-1:2008 Extension Level 5 (as created by Acrobat 9.1)
        10     PDF 1.7 / ISO 32000-1:2008 Extension Level 8 (as created by Acrobat X)
        11     PDF 1.7 / ISO 32000-1:2008 Extension Level 11 (as created by Acrobat XI)
        12     PDF 2.0 / ISO 32000-2:2017
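        Because values 7 to 11 all correspond to a "1.7" file header, it can help to split the value into a header version and an Adobe extension level. The hypothetical helper below is derived only from the table above; PdfVersions, headerVersion and extensionLevel are illustrative names, not part of this API.

```java
public class PdfVersions {
    // Hypothetical helper: the version written in the "%PDF-x.y" file header
    // for each value returned by PDF.getPDFVersion().
    static String headerVersion(int v) {
        if (v >= 3 && v <= 6) return "1." + v;   // 3..6 map directly to PDF 1.3..1.6
        if (v >= 7 && v <= 11) return "1.7";     // 8..11 are 1.7 plus an extension level
        if (v == 12) return "2.0";
        throw new IllegalArgumentException("Unknown PDF version value: " + v);
    }

    // Adobe extension level for values 8..11, or 0 when none applies.
    static int extensionLevel(int v) {
        switch (v) {
            case 8:  return 3;   // Acrobat 9.x
            case 9:  return 5;   // Acrobat 9.1
            case 10: return 8;   // Acrobat X
            case 11: return 11;  // Acrobat XI
            default: return 0;
        }
    }
}
```

Since the values increase with each release, a test such as pdf.getPDFVersion() >= 7 is enough to check for PDF 1.7 or later.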
        Returns:
        the version of the PDF document
        Since:
        2.0
        See Also:
        setOutputProfile(org.faceless.pdf2.OutputProfile)
      • setOutputProfile

        @Deprecated
        public void setOutputProfile​(OutputProfile targetprofile)
        Deprecated.
        since 2.18 the OutputProfiler class or PDF(OutputProfile) constructor should be used instead of calling PDF.setOutputProfile

        Set the Output Profile to use when rendering this PDF document. Since 2.18 this method will work as before, but is deprecated in favour of the new OutputProfiler class. Code calling this method like this:

         OutputProfile oldprofile = pdf.getFullOutputProfile();
         pdf.setOutputProfile(newprofile);
         
        should be updated to look like this:
         OutputProfiler profiler = new OutputProfiler(new PDFParser(pdf));
         OutputProfile oldprofile = profiler.getProfile();
         profiler.apply(newprofile);
         

        and for code that applies an OutputProfile to a new PDF, call the PDF(OutputProfile) constructor:

         OutputProfile profile = new OutputProfile(OutputProfile.PDFA3a, "sRGB", null, "http://www.color.org", null, icc);
         PDF pdf = new PDF(profile);
         

        or, finally, for simple cases where the features being applied are all part of the "basic" OutputProfile:

         new OutputProfiler(pdf).apply(newprofile);
         
        Throws:
        IllegalStateException - if the current profile doesn't match and can't be altered to match the specified profile, or if the current profile isn't known (because it's an existing PDF document that hasn't been scanned with getFullOutputProfile()).
        Since:
        2.0
        See Also:
        getPDFVersion(), getBasicOutputProfile(), getFullOutputProfile()
      • getBasicOutputProfile

        public OutputProfile getBasicOutputProfile()
        Return a basic OutputProfile for this PDF. The "Basic" profile consists of information that can be easily determined without having to traverse the PDF or parse the page streams. It takes no time to run, and as it doesn't parse the page content it requires only the basic License to run. See the OutputProfile.Feature class to see which features are returned in the basic profile.
        Since:
        2.6.1
        See Also:
        OutputProfiler
      • getFullOutputProfile

        @Deprecated
        public OutputProfile getFullOutputProfile()
        Deprecated.
        since 2.18 the OutputProfiler class gives more control and should be used instead of PDF.getFullOutputProfile

        Return a full OutputProfile for this PDF. This routine parses the entire document to determine its contents. This can be a very lengthy operation, so calling the getBasicOutputProfile() method is generally preferable unless the Feature you're querying is not tested by that method.

        This method cycles through every object in the PDF structure in a process very similar to rendering the entire PDF to a bitmap, and updates the OutputProfile returned by getBasicOutputProfile() with the complete list of features used in this PDF - which may cause an IllegalStateException if it's set to something other than OutputProfile.Default.

        This method requires an "Extended Edition plus Viewer" license to run.

        Since:
        2.6.1
        See Also:
        OutputProfiler
      • getNumberOfRevisions

        public int getNumberOfRevisions()

        Return the number of revisions made to the document. This will only be useful for documents read in using a PDFReader - all other PDFs will return zero. See the PDFReader class for more information on revisions.

        Note that in 2.7 the return value of this method was modified slightly so that the original version of a PDF is revision 1, not revision 0. New documents created with this library will still have a revision 0 before they're saved.

        Since:
        1.2.1
        See Also:
        PDFReader, FormSignature.getNumberOfRevisionsCovered()
      • newPage

        public PDFPage newPage​(String pagesize)

        Create a new page of the specified page size and add it to this PDF. The page size is specified as a string of the form "WxHU", where W is the width of the page, H is the height of the page, and U is an optional units specifier - it may be "mm", "cm" or "in", and if it's not specified it's assumed to be points. The resulting page size is rounded to the nearest integer unless the units are specified as points (eg. 595.5x842 - fractional sizes added in 2.2.3).

        For convenience we've defined several standard sizes that you can pass in, like PAGESIZE_A4, PAGESIZE_A4_LANDSCAPE, PAGESIZE_LETTER, PAGESIZE_LETTER_LANDSCAPE and so on.

        Since 2.2.3 you can also pass in a String containing the common name of the paper size, optionally with a "-landscape" suffix, eg "A4", "Letter", "A2-landscape", "DL" and so on. All ISO sizes and most US and JIS paper (and some envelope) sizes are recognised.

        Example values include "210x297mm", "595x842" or "A4", which all produce an A4 page, and "8.5x11in", "612x792" or "Letter", which all produce a US Letter page.

        This method is identical to calling:

           PDFPage page = new PDFPage(pagesize);
           pdf.getPages().add(page);
         
        Parameters:
        pagesize - the size of the page to create
        Throws:
        IllegalArgumentException - if the specified page size cannot be parsed
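        To illustrate the "WxHU" format, here is a hypothetical parser that converts a numeric page size string to points (1 in = 72 pt = 25.4 mm). It is a sketch based on the rules above, not the library's implementation: it does not handle named sizes such as "A4" or the "-landscape" suffix, and PageSizeParser and toPoints are illustrative names.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageSizeParser {
    // Width "x" height, then an optional unit: mm, cm or in (no unit means points).
    private static final Pattern SIZE =
        Pattern.compile("\\s*([0-9]*\\.?[0-9]+)x([0-9]*\\.?[0-9]+)(mm|cm|in)?\\s*");

    // Hypothetical helper: returns {width, height} in points.
    static double[] toPoints(String spec) {
        Matcher m = SIZE.matcher(spec.toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            throw new IllegalArgumentException("Cannot parse page size: " + spec);
        }
        double w = Double.parseDouble(m.group(1));
        double h = Double.parseDouble(m.group(2));
        String unit = m.group(3);
        if (unit == null) {
            return new double[] { w, h };        // points: fractional sizes are kept
        }
        double scale;
        switch (unit) {
            case "mm": scale = 72 / 25.4; break;
            case "cm": scale = 720 / 25.4; break;
            default:   scale = 72; break;        // "in"
        }
        // With explicit units the result is rounded to the nearest whole point.
        return new double[] { Math.round(w * scale), Math.round(h * scale) };
    }
}
```

For example, "210x297mm" becomes 595 x 842 points and "8.5x11in" becomes 612 x 792, matching the A4 and Letter sizes listed above.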
      • newPage

        public PDFPage newPage​(int w,
                               int h)
        Create a new PDFPage object of the specified size and add it to this PDF. The size is specified in points. This method is identical to calling:
           PDFPage page = new PDFPage(w, h);
           pdf.getPages().add(page);
         

        The arguments are integers for API compatibility reasons only. If required you can create pages sized to a fraction of a point using the newPage(String) method.

        Parameters:
        w - the width of the page, in points
        h - the height of the page, in points
        Returns:
        a new PDFPage object
        Since:
        1.0
        See Also:
        getPages()
      • newPage

        public PDFPage newPage​(PDFPage page)
        Create a new PDFPage object that is a clone of the specified page, and add it to this PDF. This method is identical to calling:
           PDFPage page = new PDFPage(originalpage);
           pdf.getPages().add(page);
         
        Parameters:
        page - the PDFPage object to clone
        Returns:
        a new PDFPage object which is a clone of the specified page
        Since:
        2.0
        See Also:
        getPages()
      • getPages

        public List<PDFPage> getPages()
        Returns a List of the document's pages, which may be manipulated to reorder, delete or append pages to the document. This is done using the standard List methods. For example, to reverse the pages in the document, you could do something like this:
           List<PDFPage> pages = pdf.getPages();
           List<PDFPage> temp = new ArrayList<>(pages);
           pages.clear();
           for (int i = temp.size() - 1; i >= 0; i--) {
              pages.add(temp.get(i));
           }
         
        or to move (not copy) all the pages from one PDF to another, try
           pdf1.getPages().addAll(pdf2.getPages());
         
        Note that each page can only be in this list once, and a page can't be in the page list of more than one PDF. Attempting to add a page from this list (or another PDF's page list) will remove it from that location automatically.
        Since:
        1.1.12
      • getNumberOfPages

        public int getNumberOfPages()
        Return the number of pages in this PDF. Simply calls pdf.getPages().size()
        Returns:
        the number of pages in the document
        Since:
        1.1
        See Also:
        getPages()
      • getPage

        public PDFPage getPage​(int pagenumber)
        Return the specified page. Identical to pdf.getPages().get(pagenumber)
        Parameters:
        pagenumber - the page number, between 0 and getNumberOfPages()-1
        Returns:
        the specified page
        Throws:
        ArrayIndexOutOfBoundsException - if the page number is not in range
        Since:
        1.1
        See Also:
        getPages()
      • getPage

        public PDFPage getPage​(String name)
        Get a "Named Page" from the PDF. If a Template with the specified name is found in the PDF it will be returned, and may be cloned via the PDFPage(PDFPage) constructor.
        Since:
        2.10.5
      • getLastPage

        public PDFPage getLastPage()
        Return the last page of this PDF. Identical to pdf.getPage(pdf.getNumberOfPages()-1)
        Since:
        1.0
        See Also:
        getPages()
      • setEncryptionHandler

        public void setEncryptionHandler​(EncryptionHandler encrypt)

        Set the EncryptionHandler to encrypt this document with. This allows you to limit access to the document, for example by requiring a password to open it or by preventing it from being printed.

        Changing encryption will destroy any digital signatures in the document, which is why Acrobat won't allow you to do it. Prior to version 2.4, this library didn't preserve previously applied signatures when writing a file, so this wasn't an issue - a warning was displayed and the signature was removed. Now, however, signatures can be preserved, and this method will throw an IllegalStateException if called on a previously signed document. This will also occur if the encryption settings (like password, permission flags etc.) are changed. If you want to re-encrypt a signed document, you have to delete any existing signatures first.

        Parameters:
        encrypt - the EncryptionHandler to be used to encrypt and limit access to the document
        Since:
        2.0
        See Also:
        EncryptionHandler, StandardEncryptionHandler
      • setInfo

        public void setInfo​(String key,
                            Object val)

        Set an item of PDF meta-information, such as author or title. Prior to version 2.6.2 this method only updated the original "Info" dictionary, used by PDFs since the early days to store metadata. Since 2.6.2 this method can now be used to update both the original Info dictionary and the XMP Metadata.

        Most people won't need to worry about the details. To set the metadata in the PDF, just specify an appropriate key to set the Title, Author etc. of the document. The list of known keys is below, although any key can be used - if it's not on the list it will appear in the "Custom" pane in Acrobat's "Document Properties" window.

        Title - The document's title. This value must be set for PDF/X documents.
        Author - The name of the person who created the document.
        Subject - The subject of the document.
        Keywords - A comma-separated list of keywords associated with the document.
        Creator - If the document was converted to PDF from another format, the name of the application that created the original document.
        Trapped - The document's trapping status. Must be "True", "False" or "Unknown", and must be "True" or "False" for PDF/X documents.

        Note that "CreationDate" and "ModDate" are set by the PDF Library internally and do not need to be set manually (although since 2.11.26 they can be overridden). "Producer" is set internally and cannot be changed.

        Since 2.6.2, updating one of the fields listed above will also update the XMP metadata to match. It's also possible to update fields in the XMP metadata that aren't listed above. This can be done by specifying the key name as xmp:ns:attribute, where ns is the recommended namespace prefix (as specified in the XMP specification) and attribute is the attribute to set. For instance, to set the "rights" attribute in the Dublin Core schema, you could call setInfo("xmp:dc:rights", "Copyright (C) Whoever");.

        No validation is done on these fields or the data, although fields listed as bags, sequences or language alternates in the XMP specification will be automatically wrapped in the appropriate structure. If more complex fields need to be set then the setMetaData(java.lang.String) method can be used to pass in an entire RDF object.

        The value parameter may be String, Date, Boolean or Float, or null to remove that item of meta-information.
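        To make the key convention concrete, this hypothetical helper splits a key such as "xmp:dc:rights" into the namespace prefix dc and the attribute rights, and returns null for plain Info dictionary keys such as "Author". InfoKeys and splitXmpKey are illustrative names; the sketch shows only the naming scheme, not how the library parses keys internally.

```java
public class InfoKeys {
    // Hypothetical helper: splits "xmp:ns:attribute" into {ns, attribute}.
    // Returns null for ordinary Info dictionary keys.
    static String[] splitXmpKey(String key) {
        if (!key.startsWith("xmp:")) {
            return null;                                  // e.g. "Author" or a custom key
        }
        String rest = key.substring("xmp:".length());     // e.g. "dc:rights"
        int colon = rest.indexOf(':');
        if (colon <= 0 || colon == rest.length() - 1) {
            throw new IllegalArgumentException("Expected xmp:ns:attribute but got " + key);
        }
        return new String[] { rest.substring(0, colon), rest.substring(colon + 1) };
    }
}
```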

        Parameters:
        key - the meta-information field to set
        val - the value to set it to - a String, Date, Boolean, Float or null
        Since:
        1.0
      • getInfo

        public String getInfo​(String key)

        Return document metadata, as set by setInfo(), as a String. If the key name begins with xmp: then the appropriate field will be extracted from the XMP metadata stream - see setInfo() for more information.

        For example, to get the author of the document from the PDF Info dictionary:

        String author = pdf.getInfo("Author");
        and to extract the "rights" attribute of the Dublin Core Schema from the XMP metadata:
        String copyright = pdf.getInfo("xmp:dc:rights");
        If the object requested from the XMP metadata cannot obviously be turned into a String, the value returned from this method is undefined.
        Parameters:
        key - the field to get
        Returns:
        the value of the specified field, or null if the field is not set
        Since:
        1.0
      • getInfo

        public Map<String,​Object> getInfo()

        Return the PDF meta information, as set by setInfo(). This is in the form of an unmodifiable Map, where the keys are String objects, and values may be String, Date, Boolean, Calendar or Float objects. If no meta information is available, returns an empty Map.

        Since version 2.1.2, any keys representing Dates (such as "ModDate" or "CreationDate") will also have an equivalent entry with a leading underscore, eg. "_ModDate". These give the same information but as a Calendar rather than a Date. This is to allow extraction of TimeZone information, sadly lacking from the Date class.

        Note this map doesn't include any of the XMP metadata - only data from the original Info dictionary.
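        The sketch below shows how the underscore entries might be used to recover the time zone of a date such as "ModDate". It assumes a map shaped like the one returned by getInfo(); InfoDates and zoneOf are hypothetical names.

```java
import java.util.Calendar;
import java.util.Map;
import java.util.TimeZone;

public class InfoDates {
    // Hypothetical helper: looks up the "_"-prefixed Calendar twin of a date
    // entry (e.g. "_ModDate" for "ModDate") and returns its time zone.
    // The plain Date entry carries no time zone at all.
    static TimeZone zoneOf(Map<String, Object> info, String dateKey) {
        Object value = info.get("_" + dateKey);
        if (value instanceof Calendar) {
            return ((Calendar) value).getTimeZone();
        }
        return null;   // key absent, or only the Date form is available
    }
}
```

With the library, the call would look like InfoDates.zoneOf(pdf.getInfo(), "ModDate").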

        Returns:
        an unmodifiable Map containing any meta information specified in the document.
        Since:
        1.1.12
      • setLocale

        public void setLocale​(Locale locale)
        Set the default locale for this document. This is mainly useful in right-to-left locales like Arabic, as it sets the default text alignment. The locale may be set and reset as many times as required. The locale in use when the document is rendered is considered to be the locale of the document as a whole.
        Since:
        1.1
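        If you need to know in advance whether a locale is written right to left, the JDK's java.awt.ComponentOrientation offers a reasonable check. This is a sketch under an assumption: the library may use its own list of right-to-left languages when choosing the default alignment, and LocaleDirection and isRightToLeft are hypothetical names.

```java
import java.awt.ComponentOrientation;
import java.util.Locale;

public class LocaleDirection {
    // Asks the JDK whether text in this locale is normally laid out right to left.
    // Assumption: the PDF library's own rule may differ from the JDK's list.
    static boolean isRightToLeft(Locale locale) {
        return !ComponentOrientation.getOrientation(locale).isLeftToRight();
    }
}
```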
      • getLocale

        public Locale getLocale()
        Return the PDF's Locale, as set by setLocale or (since 2.6.1) as loaded from the PDF's "Lang" tag. If no locale is specified this method returns null.
        Since:
        1.1
      • setAction

        public void setAction​(Event event,
                              PDFAction action)
        Specify an action to perform when the specified event occurs on the document. Valid events are Event.OPEN and Event.CLOSE, which occur in every version of Acrobat, and Event.PRE_SAVE, Event.POST_SAVE, Event.PRE_PRINT and Event.POST_PRINT, which only occur in Acrobat 5.0 or newer viewers.
        Parameters:
        event - the event on which to perform the action
        action - the action to perform, or null to remove any current action
        Since:
        2.0
      • getAction

        public PDFAction getAction​(Event event)
        Return the action that's performed when the specified event occurs on the document, as set by setAction. If no action is specified for that event, return null.
        Since:
        2.0
      • setJavaScript

        public void setJavaScript​(String javascript)
        Set the document-wide JavaScript. This JavaScript is executed when the document is first loaded - this is normally used to define functions and the like, in the same way as JavaScript defined in the <HEAD> of an HTML document.
        Parameters:
        javascript - the JavaScript to use for the entire document
        Since:
        1.1.23
        See Also:
        getJavaScript(), PDFAction.formJavaScript(java.lang.String)
      • getNamedActions

        public Map<String,​PDFAction> getNamedActions()

        Return a Map containing all the named actions in the PDF. Named actions (which must always be "GoTo" type actions) can be referenced from outside the PDF, which allows the document to be opened at a specific location. Here's how to do this:

        In the PDF, add the following code:
           pdf.getNamedActions().put("Myaction", PDFAction.goTo(somepage));
         
        Then in your HTML document, add the following code:
           <a href="http://www.mycompany.com/mypdf.pdf#Myaction">Open the PDF at Myaction</a>
         

        The Map returned from this method can be manipulated using the normal Map methods to add or delete actions. The only restriction is that keys must always be String objects and values must always be PDFAction objects that jump to a location in the document, like those returned from one of the PDFAction.goTo... methods.
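        When generating such links programmatically, a name containing spaces or other characters that are illegal in a URL fragment needs to be percent-encoded. A minimal sketch using java.net.URI, with NamedActionLink and linkTo as hypothetical names and the host and path taken from the example above:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class NamedActionLink {
    // Hypothetical helper: builds an http link that opens the PDF at the named
    // action. The multi-argument URI constructor percent-encodes characters
    // that are not legal in a fragment, such as spaces.
    static String linkTo(String host, String path, String actionName) {
        try {
            return new URI("http", host, path, actionName).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For instance, linkTo("www.mycompany.com", "/mypdf.pdf", "Myaction") yields the href used in the HTML example above.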

        Since:
        1.1.12
      • getEmbeddedFiles

        public Map<String,​EmbeddedFile> getEmbeddedFiles()

        Return a Map containing all the Embedded Files associated with this document. Note this method does not return files embedded by way of an AnnotationFile - they must be accessed via that class in the usual way.

        The Map returned from this method can be manipulated using the normal Map methods to add or delete files. The only restriction is that keys must always be String objects and values must always be EmbeddedFile objects. As with any map, the keys must be unique - we recommend adding files using their filenames as keys, like so:

          EmbeddedFile file = new EmbeddedFile(new File("Attachment.txt"));
          pdf.getEmbeddedFiles().put(file.getName(), file);
         

        Since 2.26, an EmbeddedFile can be a Folder or a File - although Folders will only exist in a "Portfolio" PDF. This is a new datamodel in PDF 2.0, so it's a slightly awkward fit with the existing API.

        If this PDF contains folders, the returned Map will contain only the Files, not the intermediate Folders. However Folders can be added to this Map, and if they are the Collection will be properly reconciled before the PDF is saved.
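        Because Map keys must be unique, two attachments with the same filename would overwrite each other if both were keyed by name. The hypothetical helper below (UniqueKeys and uniqueKey are illustrative names, not part of this API) picks a free key by appending a counter, e.g. "report (2).txt". The library call is shown only as a comment.

```java
import java.util.Map;

public class UniqueKeys {
    // Hypothetical helper: returns filename if it is unused as a key, otherwise
    // "stem (2).ext", "stem (3).ext", ... until a free key is found.
    // e.g. files.put(UniqueKeys.uniqueKey(files, file.getName()), file);
    static String uniqueKey(Map<String, ?> map, String filename) {
        if (!map.containsKey(filename)) {
            return filename;
        }
        int dot = filename.lastIndexOf('.');
        String stem = dot > 0 ? filename.substring(0, dot) : filename;
        String ext = dot > 0 ? filename.substring(dot) : "";
        for (int n = 2; ; n++) {
            String candidate = stem + " (" + n + ")" + ext;
            if (!map.containsKey(candidate)) {
                return candidate;
            }
        }
    }
}
```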

        Since:
        2.6
        See Also:
        getPortfolio(), OutputProfile.Feature.EmbeddedFileWithoutAF, OutputProfile.Feature.AssociatedFileNotEmbedded
      • getPortfolio

        public Portfolio getPortfolio()
        Return the PDF portfolio, creating it if necessary.
        Since:
        2.26
      • setMetaData

        public void setMetaData​(String xmldata)

        Set the XMP Metadata associated with this document. Since 2.26 this method calls getXMP().read(new StringReader(xmldata == null ? "" : xmldata)). We strongly recommend using the getXMP() method and modifying the XMP directly rather than using this method.

        Parameters:
        xmldata - the XML data to embed into the document, or null to remove it.
        Since:
        1.1.12
        See Also:
        getXMP()
      • getXMP

        public XMP getXMP()
        Return the XMP metadata as an XMP object. For properly-formatted XMP, this new (2020) approach is a considerable improvement over the getMetaData() method, which dates from 2001. If the PDF contains metadata which cannot be parsed as an XMP object (for example if it's not valid XML, or if the XML doesn't meet the basic requirements of XMP) then this method returns an XMP object which has XMP.isValid() == false (between 2.24.4 and 2.26 it returned null).
        Returns:
        the XMP, which may be empty or invalid but will never be null
        Since:
        2.24.4
      • getMetaData

        public Reader getMetaData()
                           throws IOException

        Return any XML metadata associated with the document. Since 2.26 this simply returns getXMP().isEmpty() ? null : new StringReader(getXMP().toString()). It is strongly recommended that any code migrates to using the getXMP() method.

        Since 2.24.3, the returned type is guaranteed to have a toString() method that will return the Metadata as a String.

        Returns:
        a Reader containing the source of the XML, with toString() guaranteed to be the value of the metadata as a string, or null if the XMP is empty or missing
        Throws:
        IOException
        Since:
        1.1.12
        See Also:
        getXMP()
      • makePortfolio

        public void makePortfolio​(boolean portfolio)
        Deprecated.
        call getPortfolio() instead

        Convert the PDF to (or from) a simple Portfolio PDF. The files to include should be added to the getEmbeddedFiles() Map, and content may optionally be written to the (single) page in this PDF, which will be displayed by any PDF viewer other than Acrobat. Note that most of the fancy layout options available in Acrobat for Portfolios are implemented with Flash, and are not supported here by the PDF API.

        In Acrobat X and later, files may be embedded into subfolders. We support this with the EmbeddedFile.setPortfolioFolder(java.lang.String) method, but as folders are implemented in a very awkward way in the PDF object, this must be set before the file is added to the EmbeddedFiles map. Attempting to modify the folder after the file is added will result in an exception.

        Here is an extremely simple example showing how to create a Portfolio with one file in a subfolder.

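         import java.io.*;
         import java.util.Map;
         import org.faceless.pdf2.*;

         PDF pdf = new PDF();
         PDFPage page = pdf.newPage("A4");
         pdf.makePortfolio(true);
         Map<String, EmbeddedFile> files = pdf.getEmbeddedFiles();
         EmbeddedFile ef = new EmbeddedFile(new File("file1.pdf"));
         ef.setPortfolioFolder("subfolder");   // must be set before adding to the map
         files.put("File 1", ef);
         try (OutputStream out = new FileOutputStream("portfolio.pdf")) {
             pdf.render(out);
         }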
         
        Parameters:
        portfolio - true to convert the PDF to a "portfolio" PDF, false to reverse this: the PDF will be a plain PDF with some attachments.
        Since:
        2.14.1
        See Also:
        getEmbeddedFiles(), EmbeddedFile.setPortfolioFolder(java.lang.String)
      • setOption

        public void setOption​(String key,
                              Object value)

        Set various options on the PDF, which largely (but not necessarily) follow the options available in the "Document Properties" dialog of Acrobat. The key is a case-insensitive String and the value is an object - it may be String, Boolean, Integer or some other type.

        Passing in an unrecognised key or an invalid value as a parameter will not throw an exception, but will simply have no effect. The list of currently supported options is below.

        view.fullscreen (boolean) - Open the document in full-screen mode.
        view.displayDocTitle (boolean) - The window's title bar should display the document title taken from the Title entry of the getInfo() map. If false, the title bar should display the filename instead (only works in Acrobat 5 and later).
        view.hideToolBar (boolean) - Hide the viewer application's tool bars when the document is active.
        view.hideMenuBar (boolean) - Hide the viewer application's menu bar when the document is active.
        view.hideWindowUI (boolean) - Hide user interface elements in the Document window (such as scroll bars and navigation controls), leaving only the document's contents displayed.
        view.fitWindow (boolean) - Resize the document's window to fit the size of the first displayed page. Note this resizes the window to fit the document, not the other way round.
        view.centerWindow (boolean) - Position the document's window in the center of the screen. Note this moves the whole viewer to the center of the screen, not the document to the center of the viewer.
        rtl (boolean) - Set the reading direction for this document: true sets it to right-to-left, false to the default of left-to-right. Note that setting the Locale will automatically set this value appropriately.
        pagelayout (string) - How pages are displayed in the main Acrobat window pane. Values are typically SinglePage (the default), OneColumn ("Single Page Continuous" in Acrobat 8), TwoColumnLeft ("Two-Up Continuous (Facing)" in Acrobat 8), TwoColumnRight ("Two-Up Continuous (Cover Page)" in Acrobat 8), TwoPageLeft ("Two-Up (Facing)" in Acrobat 8) or TwoPageRight ("Two-Up (Cover Page)" in Acrobat 8). Other values are possible but won't be recognised by Acrobat.
        pagemode (string) - What to display in the left-most pane of the Acrobat window. Values are typically UseNone (the default), which prevents the left pane from being displayed, UseOutlines to display Bookmarks, UseThumbs to display the Page Thumbnails, UseOC to show the Layers tab or UseAttachments to show the Attachments tab. The value "UseSignatures" can also be used to set the initial panel to the Signature panel in the BFO PDF Viewer, although this value has no effect in Acrobat.
        view.area (string) - Which page box to display when viewing the document on screen. One of CropBox (the default), MediaBox, TrimBox, BleedBox or ArtBox. Typically this setting is best left unchanged.
        view.clip (string) - Which page box to clip the page contents to when viewing the document on screen. One of CropBox (the default), MediaBox, TrimBox, BleedBox or ArtBox. Typically this setting is best left unchanged.
        print.area (string) - Which page box to display when printing the document. One of CropBox (the default), MediaBox, TrimBox, BleedBox or ArtBox. Typically this setting is best left unchanged.
        print.clip (string) - Which page box to clip the page contents to when printing the document. One of CropBox (the default), MediaBox, TrimBox, BleedBox or ArtBox. Typically this setting is best left unchanged.
        print.scaling (string) - How to scale the document when printed. One of AppDefault (which uses the application defaults) or None (for no scaling). Some non-standard values are also recognized by our viewer, including Fit (scale the page up or down to fit the printable area, preserving the aspect ratio), FitUnlocked (as before but don't preserve the aspect ratio), ShrinkToFit and ShrinkToFitUnlocked (as for Fit and FitUnlocked, but only scale down to fit on the page, not up).
        print.duplex (string) - What to set the print duplex settings to in the Acrobat Print Dialog. One of Simplex (the default), DuplexFlipLongEdge to duplex print on the long edge, or DuplexFlipShortEdge to duplex print on the short edge.
        print.matchtraysize (string) - Whether to attempt to match the paper source to the page size.
        print.numcopies (integer) - The number of copies to set in the print dialog, from 1 to 5.
        print.pagerange (List) - Which pages to set as the default pages to print in the Print dialog, specified as a java.util.List containing PDFPage objects.
        bfo.printasimage (boolean) - Force the PDF to be printed as an image when printing with the BFO API only. This option may rarely be needed to print some documents correctly on some JVMs. It will be ignored by non-BFO applications.
        marked (boolean) - Identify the PDF as containing marked content (since 2.16).
        Parameters:
        key - a case-insensitive key determining the option to set - may not be null
        value - the value to set that key to. The type depends on the key, but in general a value of null means the default.
        Since:
        2.7.6
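        Each option in the table above has a documented type, and a value of the wrong type is easy to pass by mistake. The sketch below is a hypothetical helper, not part of the BFO API, that checks a value against the documented type before it is handed to setOption(). It covers only a few keys, and assumes that "integer" corresponds to java.lang.Integer and "List" to java.util.List.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical pre-check, NOT part of the BFO API: verifies that a value has the
// Java type listed in the option table before it is passed to setOption().
final class OptionCheck {
    // A few documented keys and the Java type assumed for each table type
    // ("integer" is assumed to mean java.lang.Integer).
    private static final Map<String, Class<?>> TYPES = Map.of(
            "view.hidewindowui", Boolean.class,
            "rtl", Boolean.class,
            "pagelayout", String.class,
            "pagemode", String.class,
            "print.duplex", String.class,
            "print.numcopies", Integer.class,
            "print.pagerange", List.class,
            "marked", Boolean.class);

    static boolean isValid(String key, Object value) {
        if (key == null) {
            throw new IllegalArgumentException("key may not be null");
        }
        // Keys are case-insensitive, so normalise before the lookup.
        String k = key.toLowerCase(Locale.ROOT);
        Class<?> expected = TYPES.get(k);
        if (expected == null) {
            return false; // not one of the keys this sketch knows about
        }
        if (value == null) {
            return true; // null generally means "use the default"
        }
        if (!expected.isInstance(value)) {
            return false;
        }
        if (k.equals("print.numcopies")) {
            int copies = (Integer) value;
            return copies >= 1 && copies <= 5; // documented range
        }
        return true;
    }
}
```

        A caller might guard a call with OptionCheck.isValid("print.numcopies", 2) before calling pdf.setOption("print.numcopies", 2); how setOption() itself reacts to a bad value is not described here.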
      • getOption

        public Object getOption​(String key)
        Returns the current value of an option, as set by setOption(). Boolean values will return "true" or null.
        Parameters:
        key - a case-insensitive key determining the option to retrieve - may not be null
        Since:
        2.7.6
      • getBookmarks

        public List<PDFBookmark> getBookmarks()
        Return the List of bookmarks at the top level of the document. The List contains zero or more PDFBookmark objects, and can be altered using any of the standard List methods to order the document's bookmarks in any way you see fit. New documents start with an empty list.
        Returns:
        the List of bookmarks at the top level of the document
        Since:
        1.0
        See Also:
        PDFBookmark
      • getDocumentID

        public String getDocumentID​(boolean primary)

        Returns a String representing this document's unique ID. The PDF specification recommends (but does not require) that every document is given a unique ID when it's created, stored in two parts. The primary ID stays constant throughout the life of the document, while the secondary should be updated on every revision, although in the first revision of a document the two should be the same. So when comparing the IDs of two documents, if both the primary and secondary IDs match you've found the same document, and if only the primary ID matches you've found a different version of the same document.

        This method returns either the primary or the secondary ID, depending on whether the primary parameter is true or false. The ID is generally just random characters.

        Calling this method before the document has been rendered (i.e. when you've just created a new PDF but not yet called render()) will return null. It may also return null for PDFs that do not have an ID, although these are fairly rare these days.

        Although the IDs are stored internally as 16 bytes, we return a String of 32 hex characters to make them easier to display and compare.
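        The comparison rules and the hex format described above can be sketched in plain Java. The class below is hypothetical and not part of the BFO API. It assumes upper-case hex digits when formatting (the case returned by getDocumentID() is not stated here, so the comparison ignores case) and treats a null ID as "unknown".

```java
// Hypothetical helpers, NOT part of the BFO API, illustrating the document ID rules.
final class DocumentIds {
    enum Relation { SAME_DOCUMENT, DIFFERENT_VERSION, UNRELATED, UNKNOWN }

    // Formats a 16-byte ID as 32 hex characters (upper case is an assumption).
    static String toHex(byte[] id) {
        StringBuilder sb = new StringBuilder(id.length * 2);
        for (byte b : id) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Compares the primary and secondary IDs of two documents.
    static Relation compare(String primaryA, String secondaryA,
                            String primaryB, String secondaryB) {
        if (primaryA == null || primaryB == null) {
            return Relation.UNKNOWN; // e.g. not yet rendered, or no ID stored
        }
        if (!primaryA.equalsIgnoreCase(primaryB)) {
            return Relation.UNRELATED;
        }
        if (secondaryA != null && secondaryA.equalsIgnoreCase(secondaryB)) {
            return Relation.SAME_DOCUMENT;
        }
        return Relation.DIFFERENT_VERSION; // same primary, different revision
    }
}
```

        In practice the four arguments to compare() would be getDocumentID(true) and getDocumentID(false), called on each of the two documents.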

        Parameters:
        primary - whether to return the primary or secondary ID
        Returns:
        a 32-character String representing the ID, or null if no ID is set
        Since:
        2.1.2
      • getForm

        public Form getForm()
        Return the Interactive Form or "AcroForm" object which is part of each PDF document. Note that using interactive forms requires the "Extended Edition" of the library: although the classes are supplied with the package, an "Extended Edition" license must be purchased to activate this functionality.
        Returns:
        the document's AcroForm
        Since:
        1.1.13
      • importFDF

        public void importFDF​(FDF fdf)

        Import the contents of the specified FDF into the PDF document. Any form values specified in the FDF file will be used to set the corresponding form fields in the PDF, and since 2.2.2 any annotations in the FDF will be imported as well. If a field doesn't exist, a warning is printed and the field is ignored.

        Note that since 2.11.18 any JavaScript on the FDF will be imported as well, and this may involve executing JavaScript with the permissions of the PDF class. See the FDF.willExecuteJavaScript() method and the FDF.setJavaScript(java.lang.String, java.lang.String) method to disable this.

        Since:
        1.2.1
      • render

        public void render​(OutputStream out)
                    throws IOException

        This method renders the completed PDF to an OutputStream. The stream is left open on completion. A document may be rendered more than once.

        Rendering the document typically merges all the revisions of a document, so after rendering the getNumberOfRevisions() method will always return zero. The exceptions are documents containing an existing digital signature, and documents with an OutputProfile requiring the OutputProfile.Feature.MultipleRevisions feature, which covers a very specific technical case where keeping the revisions may be necessary; see the API docs on that class for more information.

        Parameters:
        out - the output stream to write the PDF to
        Throws:
        IOException - if the process could not be completed
        Since:
        1.0
      • getRenderProgress

        public float getRenderProgress()
        Get the progress of the render() method running in a different thread. The returned value will start at 0 and move towards 1 as the render progresses.
        Since:
        2.8
      • setCache

        public static void setCache​(Cache cache)
        Set the Cache to be used by the library. Note this is a static method, which means a single cache is used for all PDFs. This also means you do not need to call this method more than once; doing so is not only inefficient, it could theoretically cause problems in multi-threaded environments like servlet engines. To repeat: if you are going to call this method, do it once in an initialization routine before the first PDF is created.
        Since:
        2.2.2
      • setPageLabel

        public void setPageLabel​(int startpage,
                                 int displaystart,
                                 String prefix,
                                 char type)

        Set the "Page Label" for a range of pages in the PDF - the way the page number is presented. Calling the method will set the format for all pages from the specified startpage to the end of the document, so if multiple formats are required they should be set in ascending order.

        For example, to set the first 4 pages to i, ii, iii, iv and then number normally from 1, call:

         pdf.setPageLabel(0, 1, null, 'r');  // Number all pages in lower-case roman, starting from i
         pdf.setPageLabel(4, 1, null, 'D');  // From the 5th page (index 4), number in decimal starting from 1
         

        To reset the page labels to the default, call setPageLabel(0, 1, null, 'D'). This will number all pages as decimal numbers starting from 1.

        Parameters:
        startpage - the first page in the PDF to format with this label, starting from 0
        displaystart - the number to give the page specified by startpage - subsequent pages will be numbered sequentially from this value. Minimum value is 1
        type - one of 'D' for decimal, 'R' for upper-case roman, 'r' for lower-case roman, 'A' for upper-case letters, 'a' for lower-case letters or 'x' for no numbering - in this case just the prefix will be used.
        prefix - the prefix to give to the page labels, or null for no prefix
        Since:
        2.11.19
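        The numbering rules above can be modelled in a few lines of Java, which may help when predicting what getPageLabel() should report. This is a hypothetical model, not BFO's implementation. The letter sequence (A to Z, then AA to ZZ and so on) follows the PDF specification's page label rules, and the model assumes that a call at an earlier startpage discards any ranges after it, as the "to the end of the document" wording suggests.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, NOT part of the BFO API, of the setPageLabel() rules.
final class PageLabels {
    private static final class Range {
        final int displayStart;
        final String prefix;
        final char type;

        Range(int displayStart, String prefix, char type) {
            this.displayStart = displayStart;
            this.prefix = prefix == null ? "" : prefix;
            this.type = type;
        }
    }

    // Ranges keyed by their 0-based start page.
    private final TreeMap<Integer, Range> ranges = new TreeMap<>();

    // A call applies from startpage to the end of the document, replacing any
    // ranges that started at or after startpage.
    void set(int startpage, int displaystart, String prefix, char type) {
        ranges.tailMap(startpage, true).clear();
        ranges.put(startpage, new Range(displaystart, prefix, type));
    }

    // Returns the label for a 0-based page index, or null if none is specified.
    String label(int page) {
        Map.Entry<Integer, Range> entry = ranges.floorEntry(page);
        if (entry == null) {
            return null;
        }
        Range r = entry.getValue();
        int n = r.displayStart + (page - entry.getKey());
        switch (r.type) {
            case 'D': return r.prefix + n;
            case 'R': return r.prefix + roman(n);
            case 'r': return r.prefix + roman(n).toLowerCase(Locale.ROOT);
            case 'A': return r.prefix + letters(n);
            case 'a': return r.prefix + letters(n).toLowerCase(Locale.ROOT);
            default:  return r.prefix; // 'x': prefix only, no number
        }
    }

    private static String roman(int n) {
        int[] values = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
        String[] digits = {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            while (n >= values[i]) {
                sb.append(digits[i]);
                n -= values[i];
            }
        }
        return sb.toString();
    }

    // Letter numbering: A..Z, then AA..ZZ, then AAA..ZZZ, and so on.
    private static String letters(int n) {
        char c = (char) ('A' + (n - 1) % 26);
        return String.valueOf(c).repeat((n - 1) / 26 + 1);
    }
}
```

        Replaying the example above, set(0, 1, null, 'r') followed by set(4, 1, null, 'D') labels pages 0 to 3 as i to iv and page 4 as 1.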
      • getPageLabel

        public String getPageLabel​(int num)
        Get the "Page Label" for the specified page number, or null if none is specified.
        Parameters:
        num - the page number to get the label for, starting with 0
        Since:
        2.11.19
        See Also:
        setPageLabel(int, int, java.lang.String, char)
      • setLicenseKey

        public static void setLicenseKey​(String key)

        Set the license key for the library. When the library is purchased, BFO supplies a key which removes the "DEMO" stamp on each of the documents.

        Please note this method is static - it should be called BEFORE the first PDF is created, like so:

          PDF.setLicenseKey(.....);
          PDF pdf = new PDF();
         
        Parameters:
        key - the license key
      • getLicensedProperty

        public static Object getLicensedProperty​(String key)
        Retrieve a property from the PDF License.
        Parameters:
        key - the property
        Since:
        2.26.3
      • useAWTEventModel

        public static void useAWTEventModel​(boolean awtevent)
        Set the PDF Library to work with the AWT event model. Without this flag set (the default) any PropertyChangeEvent objects fired by classes in this package will be fired immediately. If this flag is set to true, they will be batched up and fired from the AWT EventQueue at some point in the future. If the PDF Library is being used in an AWT application, especially one that may have background threads performing tasks, this value should be set to true.
        Since:
        2.12
      • getLoadState

        public LoadState getLoadState​(int index)
        For linearized documents that are being loaded from a URL via PDFReader.setSource(URL), this method reports the current load state of the specified page. If the page is fully loaded this method returns null; otherwise it returns a LoadState which can be used to monitor the progress of the load.
        Parameters:
        index - the number of the page to query (0-indexed). A value of -1 will check all pages, returning null only if they are all loaded.
        Returns:
        a LoadState describing the progress of the load, or null if the page is fully loaded or the PDF is not linearized.
        Since:
        2.14
      • getStructureTree

        public Document getStructureTree()

        Returns the Structure Tree for the entire document as a W3C DOM. This is a representation of the logical structure of the PDF, which is typically used to enable accessibility on the PDF.

        The returned Document is live, and changes made to it will be reflected in the PDF. By default the tree will not contain any text content. Populating the tree with text content is a relatively time-consuming operation for large documents, so is not done by default. The tree will contain <bfo:content> elements marking where the text-content will go. Those nodes will be populated if the extract-text DOM config parameter is set to true; see below.

        The special nodes in the bfo namespace have a fixed set of attributes which identify the item's page, marked-content ID and/or index into the page's annotation list; these attributes are live and will update as pages are reordered or removed.

        Changes made indirectly to this Document (either by moving pages in and out of the document, or by calls to beginTag on PDFPage, PDFCanvas or LayoutBox) may not be reflected in the tree until the Document.normalizeDocument() method is called.

        The returned Document can be modified, although it is not possible to modify text or <bfo:content> elements, or to create new ones. Modification is useful when pages from multiple PDFs have been merged together, to rationalize the structure.

        There are various parameters that can be set on the Document before the Document.normalizeDocument() method is called, to control how the tree is modified. With the exception of role-map, roles, lexicons, class-map, and trim-empty, all values are Boolean and are set and retrieved like so:

         document.getDomConfig().setParameter("extract-text", true);
         Object o = document.getDomConfig().getParameter("extract-text");
         
        extract-text This value can be set to a Boolean; when true, the next normalization of the Document will extract any text that has not yet been extracted, and populate the <bfo:content> elements in the tree with text and <bfo:blob> elements, which are (currently) placeholders for images or other graphical operations. Note that if the Document is retrieved from PDFParser.getStructureTree(), you will get the same object, but with this parameter set to true by default.
        fix-invalid-xml The Document is a representation of an internal structure in the PDF, not an actual XML Document. As such it may contain content which is not valid in XML, such as element or attribute names with spaces or other illegal characters. This isn't a problem unless you are trying to import a copy of this Document into a regular XML document. If that's the case, setting this value to true will replace any invalid characters in the tree with underscores.
        fix-structure This setting defaults to true. If the tree breaks any restriction in the OutputProfile that would cause rendering to fail and this flag is true, an attempt will be made to repair the tree. For example, in PDF/UA-1, weak headings (e.g. H1, H2, H3 elements) are required to descend consecutively: H3 must follow H2, not H1. If the Document fails to meet this requirement and fix-structure is set to true, the headings will be renumbered to meet it. Note that since 2.28.5 this value is a String: as well as true or false, it can be a space-separated list of things to repair. Specific values currently include table list ruby warichu inline caption heading math root unknown bubble alt list-numbering for different classes of repair, mostly to the hierarchy; for example, table means create table elements as required to fix the hierarchy, and bubble means move block/group elements up in the tree until they have a valid parent.
        trim-empty Documents that have had pages removed tend to accumulate empty elements, if the content within those elements was on the removed pages. Setting this property to the String "always" or Boolean true will delete elements with no content descendants that are considered "safe"; this is most elements except those that denote structure, like <td>. Setting this value to the String "move" (the default) will move empty elements along with their siblings if pages are moved to or from a PDF. Setting this value to "none" or Boolean false will leave empty elements unmodified (which was the default behaviour up to 2.24.4).
        role-map In PDF, it is possible to "map" one type of element name to another. This allows custom elements to be created without breaking the validation rules; for example, if <Foo> is mapped to <Td> then the structure <Table><Tr><Foo>... is perfectly valid. The mappings are specified by a Map<String,String> which is retrieved from the role-map parameter; unlike the other parameters this cannot be set, although the returned map can be modified. For the previous example you would call rolemap.put("Foo", "Td"). From 2.24.1 it is possible to include namespaces in both the keys and values of this map, by setting the name to uri + "\n" + localname. Names with no prefix are considered to be in the default namespace used by PDF 1.x. For example, in PDF 2.x the above example should be rolemap.put(NS + "\nFoo", NS + "\nTd"), where NS is the PDF 2.0 namespace, http://iso.org/pdf2/ssn.
        roles New in 2.28.2, the roles user parameter is a list of namespaces to prioritise. As described for role-map, in PDF it is possible to map one type of element to another in a different namespace. These maps are transitive (an element can be mapped to several namespaces at once), which can get confusing. The roles list determines which view of the tags you want to take. It is empty by default, but if namespaces are added, the element name and namespace will be role-mapped to the first matching namespace in the list. For example:
         List<String> roles = (List<String>)document.getDomConfig().getParameter("roles");
         roles.add("http://iso.org/pdf2/ssn");
         roles.add("http://iso.org/pdf/ssn");
         
        will ensure that if any elements in the tree are role-mapped to the PDF2 or PDF1 namespace, the role-mapped element names are returned instead. All other element names/namespaces are returned as normal
        class-map Each element in the Structure Tree may belong to one or more "classes". Belonging to a class means the element inherits the attributes defined on that class, although this feature seems to be rarely used. Since 2.24.1 this map of class attributes can be retrieved with the class-map parameter - the returned value is a Map<String,NamedNodeMap>.
        lexicons The Structure Tree may include one or more pronunciation dictionaries stored as PLS (Pronunciation Lexicon Specification 1.0) files. Since 2.24.4 a List<EmbeddedFile> can be retrieved with the lexicons parameter, and altered to add new lexicons if required.
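        The role-map and roles parameters interact, so a small model may help. The following is a hypothetical sketch, not BFO's implementation: it follows the role map from an element name and returns the first name whose namespace appears in the roles list, checking namespaces in priority order. It uses the documented uri + "\n" + localname key form and the standard structure namespaces defined in ISO 32000-2; http://example.com/custom is an invented namespace for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, NOT the BFO implementation, of how "role-map" and
// "roles" fit together. A name without "\n" is in the PDF 1.x default namespace.
final class RoleMapping {
    // Standard structure namespaces as defined in ISO 32000-2.
    static final String PDF1 = "http://iso.org/pdf/ssn";
    static final String PDF2 = "http://iso.org/pdf2/ssn";

    static String qualify(String uri, String localName) {
        return uri + "\n" + localName;
    }

    static String namespaceOf(String name) {
        int i = name.indexOf('\n');
        return i < 0 ? PDF1 : name.substring(0, i);
    }

    // Follows the role map from name (stopping on cycles), then returns the
    // first name in the chain whose namespace appears in roles, checking the
    // namespaces in priority order. Returns name unchanged if none matches.
    static String resolve(Map<String, String> roleMap, List<String> roles, String name) {
        List<String> chain = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String n = name; n != null && seen.add(n); n = roleMap.get(n)) {
            chain.add(n);
        }
        for (String ns : roles) {
            for (String n : chain) {
                if (namespaceOf(n).equals(ns)) {
                    return n;
                }
            }
        }
        return name;
    }
}
```

        With Box in the custom namespace mapped to the PDF 2.0 Aside, and Aside mapped to the PDF 1.x Div, resolving Box with roles [PDF2, PDF1] yields the qualified Aside name, with roles [PDF1] yields Div, and with an empty roles list yields Box unchanged.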

        Since 2.26, normal elements and <bfo:content> elements can have XMP metadata and/or a set of EmbeddedFile objects associated with them, which may be set or retrieved by calling Element.getUserData("metadata") or Element.getUserData("files") respectively. The "metadata" value is set as an XMP, String or Reader and retrieved as an XMP. The "files" property is set as an EmbeddedFile or a collection of the same, and retrieved as a Collection<EmbeddedFile>. In both cases, the returned objects are live and changes to them will be reflected when the PDF is written out.

        The presence of these structures is indicated by two special attributes, bfo:metadata and bfo:files. If these attributes exist on an element, the corresponding structure is present in its user data.

        Since 2.28.4, every DOM node has a special read-only "placement" userdata which can be retrieved. This is a Map<PDFPage,Shape> which gives the physical position of this node on the page(s). This is always set for PDFs that have been read in, but not guaranteed to be set for trees that are in the process of being constructed.

        Populating the Document with text content requires an Extended Edition plus Viewer license.
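        Because extract-text and the other parameters are set through the standard W3C DOMConfiguration interface, code that handles both the structure tree and ordinary DOM documents can check support before setting a parameter. The helper below is hypothetical and uses only the standard org.w3c.dom API; with a plain JDK Document, the BFO-specific parameter names are expected to be reported as unsupported.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;

// Hypothetical helper, NOT part of the BFO API, using only the standard W3C DOM
// DOMConfiguration interface. It sets a parameter only if the Document reports
// that it supports the name/value pair, instead of risking a DOMException.
final class DomParams {
    static boolean trySet(Document doc, String name, Object value) {
        DOMConfiguration config = doc.getDomConfig();
        if (!config.canSetParameter(name, value)) {
            return false; // unknown parameter or unsupported value
        }
        config.setParameter(name, value);
        return true;
    }

    // Creates an ordinary JDK DOM Document, for comparison with the structure tree.
    static Document newPlainDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

        On the tree returned by getStructureTree(), a call such as DomParams.trySet(tree, "extract-text", Boolean.TRUE) followed by tree.normalizeDocument() would be expected to populate the <bfo:content> elements, subject to the license requirement above.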

        Since:
        2.24
        See Also:
        PDFCanvas.beginTag(java.lang.String, java.util.Map<java.lang.String, java.lang.Object>), rebuildStructureTree(), PDFParser.getStructureTree(), OutputProfile.Feature.TaggedPDF
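        As an illustration of the kind of heading repair that fix-structure describes, the sketch below renumbers a sequence of heading levels so that each heading is at most one level deeper than the one before it, and the first is at most H1. This is a hypothetical algorithm working on plain integers, not BFO's actual repair, which operates on the tree and may choose different levels.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, NOT the BFO algorithm, of renumbering headings so
// they descend consecutively: each level is clamped to at most one deeper than
// the previous (renumbered) heading. Levels are assumed to be 1 or greater.
final class HeadingFix {
    static List<Integer> renumber(List<Integer> levels) {
        List<Integer> out = new ArrayList<>();
        int previous = 0; // no heading yet, so the first may be at most H1
        for (int level : levels) {
            int fixed = Math.min(level, previous + 1);
            out.add(fixed);
            previous = fixed;
        }
        return out;
    }
}
```

        For example, the heading sequence H1, H3, H4, H2 would become H1, H2, H3, H2 under this rule.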
      • getOptionalContentLayers

        public List<OptionalContentLayer> getOptionalContentLayers()

        Return the list of OptionalContentLayer objects defined in the PDF. This list will be empty for a freshly created PDF, and any layers created by the user must be added in the order they're required. When an existing PDF has been loaded via a PDFReader, the first call to this method will populate the list with the current state from within the PDF. The list is live, and any changes made to it will be saved when the PDF is saved.

        Items may be added to the list more than once, but later occurrences will be ignored. Clearing the list will remove all optional content from the PDF.
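        The duplicate rule above means the effective layer order is the order of first occurrence. A hypothetical one-method sketch of that rule, using strings in place of OptionalContentLayer objects and assuming duplicates are detected with equals() and hashCode():

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration, NOT part of the BFO API: keeps only the first
// occurrence of each item, mirroring the rule that later duplicates are ignored.
final class LayerOrder {
    static <T> List<T> effectiveOrder(List<T> layers) {
        // LinkedHashSet preserves insertion order and drops later duplicates.
        return new ArrayList<>(new LinkedHashSet<>(layers));
    }
}
```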

        Returns:
        the Optional Content list
        Since:
        2.23.5
      • putUserData

        public void putUserData​(String key,
                                Object value)
        Set a custom property on the PDF. The property will be saved with the file with the "BFOO_" prefix.
        Parameters:
        value - a CharSequence, Number, Date, Calendar, Boolean, byte[], or a List/Map of those values, or null to remove the property
        Since:
        2.24.2
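        A hypothetical pre-check, not part of the BFO API, for the value types that putUserData() accepts. Its assumptions are stated in the comment: nested Lists and Maps are allowed, Map keys must be character sequences, and null is only accepted as the top-level value that removes the property.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.List;
import java.util.Map;

// Hypothetical pre-check, NOT part of the BFO API, for the documented value types:
// CharSequence, Number, Date, Calendar, Boolean, byte[], or a List/Map of those.
// Assumptions: Lists and Maps may be nested, Map keys must be CharSequences,
// and null is accepted only as the top-level "remove the property" value.
final class UserDataCheck {
    static boolean isStorable(Object value) {
        return value == null || isStorableValue(value);
    }

    private static boolean isStorableValue(Object value) {
        if (value instanceof CharSequence || value instanceof Number
                || value instanceof Date || value instanceof Calendar
                || value instanceof Boolean || value instanceof byte[]) {
            return true;
        }
        if (value instanceof List) {
            for (Object item : (List<?>) value) {
                if (item == null || !isStorableValue(item)) {
                    return false;
                }
            }
            return true;
        }
        if (value instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!(e.getKey() instanceof CharSequence)
                        || e.getValue() == null || !isStorableValue(e.getValue())) {
                    return false;
                }
            }
            return true;
        }
        return false;
    }
}
```

        Note that the types can change on the way back: getUserData() is documented to return String and Calendar where putUserData() accepts any CharSequence or a Date.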
      • getUserData

        public Object getUserData​(String key)
        Return a property previously set on the PDF with the putUserData() method
        Returns:
        a String, Boolean, Number, Calendar, byte[] or a Map/List of those values if found, or null if no such property exists.
        Since:
        2.24.2
      • getEmbeddedFileSource

        public EmbeddedFile getEmbeddedFileSource()
        When a PDF is loaded from EmbeddedFile.getPDF(), this method returns the EmbeddedFile that contains it. Otherwise it returns null.
        Since:
        2.26
      • getDocumentPart

        public DocumentPart getDocumentPart()
        Return the root DocumentPart, which will never be null but which will be empty unless this file uses DocumentParts
        Since:
        2.28.3
      • toString

        public String toString()
      • putLiteral

        public void putLiteral​(String key,
                               String tokens)
        Put a literal token sequence. For debugging.
        Parameters:
        key - the key
        tokens - the token sequence, eg "true" or "/foo" or "[/Foo/Bar]". No refs, just direct objects.