Class PDFReader


  • public final class PDFReader
    extends java.lang.Object

    The PDFReader class adds the ability to load an existing PDF to the library. Note that this class is part of the "Extended Edition" of the library: although it's supplied with the package, an "Extended Edition" license must be purchased to activate it.

    There are a number of constructors on this class which can be used to load a PDF from File and InputStream objects, but - while valid - these are all now wrappers around a more generalized approach. These constructors will not be deprecated and existing code has no need to change, but for more flexibility the new approach is recommended:

     PDFReader reader = new PDFReader();
     reader.setSource(source);
     reader.addEncryptionHandler(handler); // optional
     reader.load();
     PDF pdf = new PDF(reader);
     

    Revisions

    A PDF file may sometimes contain different versions of itself in the one file. These "revisions" show how the state of the document has changed over time. For most purposes this information isn't terribly useful - prior to version 1.2.1 only the latest revision was used - but they do play an important role when using Digital Signatures.

    When a signature is applied to the file, the current revision is locked and any further changes to the file result in a new revision being made. With Adobe Acrobat several signatures can be applied, each one of which will result in a new revision.

    It's important to remember that changes can be made to a document after it's been signed, and provided that they're made in a new revision, the signature won't be invalidated - but the signature won't cover the whole of the document either. When validating a signed document this needs to be taken into account.

    Another interesting feature of revisions is that with a document with multiple revisions, it's possible to "roll back" to a previous version. This is done by passing in a specific revision number to the PDF(PDFReader,int) constructor - the PDF will be created as it was at the specified revision.

    Parallel Operation note: Since 2.10 this class will use multiple threads in parallel where possible. The number of threads defaults to the available processors but can be controlled by setting the Threads property (typically by setting the org.faceless.pdf2.Threads System property) to the number of threads required. Each thread requires only minimal heap so it's safe to run as many as you like.
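    As a minimal sketch of the Threads setting described above, the following sets the org.faceless.pdf2.Threads System property from code. The class and method names (PdfThreadConfig, setThreads) are hypothetical, and the sketch assumes the property is read when the library is first used, so it is set before any PDFReader is created:

```java
final class PdfThreadConfig {
    // Property name taken from the class documentation above.
    static final String THREADS_PROPERTY = "org.faceless.pdf2.Threads";

    /**
     * Sets the System property that controls how many threads the reader may use.
     * Assumption: this is called before the first PDFReader is created.
     */
    static int setThreads(int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be at least 1");
        }
        System.setProperty(THREADS_PROPERTY, Integer.toString(threads));
        return threads;
    }

    public static void main(String[] args) {
        setThreads(2);
        System.out.println(THREADS_PROPERTY + "=" + System.getProperty(THREADS_PROPERTY));
    }
}
```

    The same value can be supplied on the command line with the standard JVM option -Dorg.faceless.pdf2.Threads=2, which avoids any ordering concerns in code.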

    Since:
    1.1.12
    • Constructor Summary

      Constructors 
      Constructor Description
      PDFReader()
      Create a new PDFReader.
      PDFReader​(java.io.File in)
      Read an unencrypted PDF from the specified file.
      PDFReader​(java.io.File in, java.lang.String password)
      Read an encrypted PDF from the specified File.
      PDFReader​(java.io.File in, EncryptionHandler encrypt)
      Read an encrypted PDF from the specified File.
      PDFReader​(java.io.File in, EncryptionHandler[] encryptlist, float[] progress)
      Read a PDF from the specified File, and report on progress.
      PDFReader​(java.io.File in, EncryptionHandler encrypt, float[] progress)
      Read a PDF from the specified File, and report on progress.
      PDFReader​(java.io.InputStream in)
      Read an unencrypted PDF from the specified InputStream.
      PDFReader​(java.io.InputStream in, java.lang.String password)
      Read an encrypted PDF from the specified InputStream.
      PDFReader​(java.io.InputStream in, EncryptionHandler encrypt)
      Read an encrypted PDF from the specified InputStream.
      PDFReader​(java.io.InputStream in, EncryptionHandler[] encryptlist, float[] progress)
      Read a PDF from the specified InputStream, and report on progress.
      PDFReader​(java.io.InputStream in, EncryptionHandler encrypt, float[] progress)
      Read a PDF from the specified InputStream, and report on progress.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      void addEncryptionHandler​(EncryptionHandler handler)
      Add an EncryptionHandler to be tried when loading the PDF.
      int getNumberOfRevisions()
      Return the number of revisions that have been made to this file.
      int getPDFVersion()
      Return the PDF version number declared in the file header.
      float getProgress()
      Return the progress of the load, from 0 to 1.
      void load()
      Load the PDF from the specified source (set by setSource(java.io.File), which must be called before this method).
      void setLinearizedLoader​(boolean linearizer)
      Set whether to use the Linearization tables (if they exist) in the PDF to load it on demand.
      void setSource​(java.io.File file)
      Set the source for this PDFReader to the specified File.
      void setSource​(java.io.InputStream in)
      Set the source for this PDFReader to the specified InputStream.
      void setSource​(java.net.URL url)
      Set the source for this PDFReader to the specified URL.
      void setSource​(java.net.URLConnection con)
      Set the source for this PDFReader to the specified URLConnection.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • PDFReader

        public PDFReader()
        Create a new PDFReader. Unlike the other constructors this method does not initiate the load immediately: the setSource(java.io.File) method and then the load() method must be called before this object is passed into the PDF(PDFReader) constructor.
        Since:
        2.14
      • PDFReader

        public PDFReader​(java.io.File in)
                  throws java.io.IOException
        Read an unencrypted PDF from the specified file.

        This legacy constructor is equivalent to the following code - see the API documentation for those methods for details:

          PDFReader reader = new PDFReader();
          reader.setSource(in);
          reader.load();
         
        Parameters:
        in - the File to read from
        Throws:
        java.io.IOException
      • PDFReader

        public PDFReader​(java.io.File in,
                         java.lang.String password)
                  throws java.io.IOException
        Read an encrypted PDF from the specified File. The PDF is decrypted using the StandardEncryptionHandler and the specified password.

        This legacy constructor is equivalent to the following code - see the API documentation for those methods for details:

          PDFReader reader = new PDFReader();
          reader.setSource(in);
          reader.addEncryptionHandler(new StandardEncryptionHandler(password));
          reader.load();
         
        Parameters:
        in - the File to read from
        password - the password needed to open the file, or null for no password
        Throws:
        java.io.IOException
      • PDFReader

        public PDFReader​(java.io.InputStream in)
                  throws java.io.IOException
        Read an unencrypted PDF from the specified InputStream.

        This legacy constructor is equivalent to the following code - see the API documentation for those methods for details:

          PDFReader reader = new PDFReader();
          reader.setSource(in);
          reader.load();
         
        Parameters:
        in - the InputStream to read from
        Throws:
        java.io.IOException
      • PDFReader

        public PDFReader​(java.io.InputStream in,
                         java.lang.String password)
                  throws java.io.IOException
        Read an encrypted PDF from the specified InputStream. The PDF is decrypted using the StandardEncryptionHandler and the specified password.

        This legacy constructor is equivalent to the following code - see the API documentation for those methods for details:

          PDFReader reader = new PDFReader();
          reader.setSource(in);
          reader.addEncryptionHandler(new StandardEncryptionHandler(password));
          reader.load();
         
        Parameters:
        in - the InputStream to read from
        password - the password needed to open the file, or null for no password
        Throws:
        java.io.IOException
      • PDFReader

        public PDFReader​(java.io.InputStream in,
                         EncryptionHandler encrypt)
                  throws java.io.IOException
        Read an encrypted PDF from the specified InputStream.

        This legacy constructor is equivalent to the following code - see the API documentation for those methods for details:

          PDFReader reader = new PDFReader();
          reader.setSource(in);
          reader.addEncryptionHandler(encrypt);
          reader.load();
         
        Parameters:
        in - the InputStream to read from
        encrypt - the EncryptionHandler to decrypt the PDF, or null for no encryption
        Throws:
        java.io.IOException
        Since:
        2.0
      • PDFReader

        public PDFReader​(java.io.File in,
                         EncryptionHandler encrypt)
                  throws java.io.IOException
        Read an encrypted PDF from the specified File.

        This legacy constructor is equivalent to the following code - see the API documentation for those methods for details:

          PDFReader reader = new PDFReader();
          reader.setSource(in);
          reader.addEncryptionHandler(encrypt);
          reader.load();
         
        Parameters:
        in - the File to read from
        encrypt - the EncryptionHandler to decrypt the PDF, or null for no encryption
        Throws:
        java.io.IOException
        Since:
        2.2.5
      • PDFReader

        public PDFReader​(java.io.InputStream in,
                         EncryptionHandler encrypt,
                         float[] progress)
                  throws java.io.IOException
        Read a PDF from the specified InputStream, and report on progress. The progress field is updated throughout the read with values from 0 to 1 indicating how much of the PDF has been read.

        This legacy constructor is equivalent to the following code - see the API documentation for those methods for details:

          PDFReader reader = new PDFReader();
          reader.setSource(in);
          reader.addEncryptionHandler(encrypt);
          reader.load();
         
        Parameters:
        in - the stream to read from
        encrypt - the EncryptionHandler to decrypt the PDF, or null for no encryption
        progress - an optional array one item long, the first element of which will be updated throughout the read
        Throws:
        java.io.IOException
        Since:
        2.8
      • PDFReader

        public PDFReader​(java.io.File in,
                         EncryptionHandler encrypt,
                         float[] progress)
                  throws java.io.IOException
        Read a PDF from the specified File, and report on progress. The progress field is updated throughout the read with values from 0 to 1 indicating how much of the PDF has been read.

        This legacy constructor is equivalent to the following code - see the API documentation for those methods for details:

          PDFReader reader = new PDFReader();
          reader.setSource(in);
          reader.addEncryptionHandler(encrypt);
          reader.load();
         
        Parameters:
        in - the File to read from
        encrypt - the EncryptionHandler to decrypt the PDF, or null for no encryption
        progress - an optional array one item long, the first element of which will be updated throughout the read
        Throws:
        java.io.IOException
        Since:
        2.8
      • PDFReader

        public PDFReader​(java.io.InputStream in,
                         EncryptionHandler[] encryptlist,
                         float[] progress)
                  throws java.io.IOException
        Read a PDF from the specified InputStream, and report on progress. The progress field is updated throughout the read with values from 0 to 1 indicating how much of the PDF has been read.

        This legacy constructor is equivalent to the following code - see the API documentation for those methods for details:

          PDFReader reader = new PDFReader();
          reader.setSource(in);
          reader.addEncryptionHandler(...encryptlist entries...);
          reader.load();
         
        Parameters:
        in - the InputStream to read from
        encryptlist - the list of possible EncryptionHandlers to decrypt the PDF, or null for no encryption
        progress - an optional array one item long, the first element of which will be updated throughout the read
        Throws:
        java.io.IOException
        Since:
        2.8.2
      • PDFReader

        public PDFReader​(java.io.File in,
                         EncryptionHandler[] encryptlist,
                         float[] progress)
                  throws java.io.IOException
        Read a PDF from the specified File, and report on progress. The progress field is updated throughout the read with values from 0 to 1 indicating how much of the PDF has been read.

        This legacy constructor is equivalent to the following code - see the API documentation for those methods for details:

          PDFReader reader = new PDFReader();
          reader.setSource(in);
          reader.addEncryptionHandler(...encryptlist entries...);
          reader.load();
         
        Parameters:
        in - the File to read from
        encryptlist - the list of possible EncryptionHandlers to decrypt the PDF, or null for no encryption
        progress - an optional array one item long, the first element of which will be updated throughout the read
        Throws:
        java.io.IOException
        Since:
        2.8.2
    • Method Detail

      • setSource

        public void setSource​(java.io.File file)
                       throws java.io.IOException
        Set the source for this PDFReader to the specified File. The PDF will not be loaded into memory, so the file will be used as a backing store and must remain in existence and unchanged for the life of the PDF created from this PDFReader.
        Throws:
        java.io.IOException
        Since:
        2.14
        See Also:
        setSource(InputStream), setSource(URL), setSource(URLConnection)
      • setSource

        public void setSource​(java.io.InputStream in)
                       throws java.io.IOException
        Set the source for this PDFReader to the specified InputStream. The stream will be fully loaded into memory, and closed automatically when this is complete.
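        Because an InputStream source is loaded fully into memory while a File source is used as a backing store, one option for a large PDF that arrives as a stream is to copy it to a temporary file first and pass that file to setSource(File). The following is a sketch of that idea using only the JDK; the PdfSpool class and spoolToTempFile method are hypothetical helpers and not part of this library:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class PdfSpool {
    /**
     * Copies the stream to a new temporary file and closes the stream.
     * The caller owns the returned file and should delete it only once the
     * PDF created from it is no longer in use.
     */
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("pdfreader-", ".pdf");
        try (InputStream src = in) {
            Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // do not leave a partial file behind
            throw e;
        }
        return tmp;
    }

    // Possible use with this class (not compiled here):
    //   Path tmp = PdfSpool.spoolToTempFile(in);
    //   reader.setSource(tmp.toFile());
}
```

        The trade-off is extra disk I/O in exchange for a smaller heap footprint, and the temporary file must remain unchanged for the life of the PDF, as described for setSource(File).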
        Throws:
        java.io.IOException
        Since:
        2.14
        See Also:
        setSource(File), setSource(URL), setSource(URLConnection)
      • setSource

        public void setSource​(java.net.URL url)
                       throws java.io.IOException
        Set the source for this PDFReader to the specified URL. If the URL uses the http or https scheme, the webserver supports the "Range" header and the PDF at that URL is Linearized, there will be an initial load, with the rest continuing in the background or on demand. The URL will be opened and closed throughout this process, with possibly multiple requests at once. Consequently it should refer to a static resource which must remain in existence and unchanged for the life of the PDF created from this PDFReader.
        Throws:
        java.io.IOException
        Since:
        2.14
        See Also:
        setSource(File), setSource(InputStream), setSource(URLConnection), setLinearizedLoader(boolean)
      • setSource

        public void setSource​(java.net.URLConnection con)
                       throws java.io.IOException
        Set the source for this PDFReader to the specified URLConnection. If the URL uses the http or https scheme, the server supports the "Range" header and the PDF at that URL is Linearized, there will be an initial load, with the rest continuing in the background or on demand. The URL will be opened and closed throughout this process, with possibly multiple requests at once. Consequently it should refer to a static resource which must remain in existence and unchanged for the life of the PDF created from this PDFReader.
        Throws:
        java.io.IOException
        Since:
        2.21
        See Also:
        setSource(File), setSource(InputStream), setSource(URL), setLinearizedLoader(boolean)
      • setLinearizedLoader

        public void setLinearizedLoader​(boolean linearizer)
        Set whether to use the Linearization tables (if they exist) in the PDF to load it on demand. This defaults to true if the PDF source is set to a URL, false otherwise.
        Since:
        2.14
        See Also:
        setSource(URL)
      • addEncryptionHandler

        public void addEncryptionHandler​(EncryptionHandler handler)
        Add an EncryptionHandler to be tried when loading the PDF. Any items added here will be tried in order until one succeeds.
        Since:
        2.14
      • getProgress

        public float getProgress()
        Return the progress of the load, from 0 to 1.
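        One way to use this value is to poll it from a second thread while load() runs. The sketch below uses only the JDK and takes the progress source as a Supplier, so the ProgressPoller class and its toPercent method are hypothetical helpers; it also assumes getProgress() may safely be called from a thread other than the one running load(), which this documentation does not state explicitly:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class ProgressPoller implements AutoCloseable {
    // Daemon thread so an abandoned poller never keeps the JVM alive.
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "pdf-progress");
            t.setDaemon(true);
            return t;
        });

    /** Reports the progress as a percentage every periodMillis until closed. */
    ProgressPoller(Supplier<Float> progress, Consumer<Integer> onPercent, long periodMillis) {
        timer.scheduleAtFixedRate(
            () -> onPercent.accept(toPercent(progress.get())),
            0, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Converts a 0..1 progress value to a whole percentage, clamping out-of-range input. */
    static int toPercent(float progress) {
        float clamped = Math.max(0f, Math.min(1f, progress));
        return Math.round(clamped * 100f);
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }

    // Possible use with this class (not compiled here):
    //   try (ProgressPoller p = new ProgressPoller(reader::getProgress,
    //            pct -> System.out.println(pct + "%"), 250)) {
    //       reader.load();
    //   }
}
```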
        Since:
        2.14
      • load

        public void load()
                  throws java.io.IOException
        Load the PDF from the specified source (set by setSource(java.io.File), which must be called before this method). The load method should be called before the PDFReader is passed to the PDF constructor. If it hasn't been, it will be called automatically at that point, but as the PDF constructor does not throw an IOException (for historical reasons), any IOException thrown at that point will be wrapped in an IllegalStateException.
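        Code that relies on the automatic load may want the original IOException back. The following sketch shows one way to recover it using only the JDK. The LoadFailures class and unwrapIOException method are hypothetical helpers, and the sketch assumes the IOException is attached as the cause of the IllegalStateException:

```java
import java.io.IOException;
import java.util.function.Supplier;

final class LoadFailures {
    /**
     * Runs the action and, if it fails with an IllegalStateException whose
     * cause is an IOException, rethrows that IOException instead.
     * Any other IllegalStateException is rethrown unchanged.
     */
    static <T> T unwrapIOException(Supplier<T> action) throws IOException {
        try {
            return action.get();
        } catch (IllegalStateException e) {
            if (e.getCause() instanceof IOException) {
                throw (IOException) e.getCause();
            }
            throw e;
        }
    }

    // Possible use with this class (not compiled here):
    //   PDF pdf = LoadFailures.unwrapIOException(() -> new PDF(reader));
}
```

        Calling load() explicitly before constructing the PDF remains the simpler option, since the IOException is then thrown directly.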
        Throws:
        java.io.IOException
        Since:
        2.14
      • getNumberOfRevisions

        public int getNumberOfRevisions()

        Return the number of revisions that have been made to this file. Earlier revisions of a PDF file can be loaded by passing a revision number less than this value to the appropriate PDF(PDFReader,int) constructor.

        Returns:
        the number of revisions, starting at 1 for a file with no subsequent changes
        Since:
        1.2.1
      • getPDFVersion

        public int getPDFVersion()
        Return the PDF version number declared in the file header.
        Since:
        2.28