Class OutputProfiler

  • All Implemented Interfaces:
    Runnable

    public class OutputProfiler
    extends Object
    implements Runnable

    An OutputProfiler is used to create an OutputProfile for a PDF or to attempt to apply a new OutputProfile, modifying the PDF in the process. This can be a basic OutputProfile, which is very quick to create, or a full OutputProfile which involves scanning the entire PDF, and takes much longer.

    This class now underlies the PDF.getBasicOutputProfile() and PDF.getFullOutputProfile() methods, and brings several advantages: you can re-run the profile when you know the PDF has changed, you can create the profile in one thread and monitor its progress in another, and you can make structural changes to the PDF (such as substituting fonts or colors) that weren't possible with the previous API.

    To create a new profile with the same information as PDF.getBasicOutputProfile():

     OutputProfiler profiler = new OutputProfiler(pdf);
     OutputProfile profile = profiler.getProfile();
     
    and to duplicate the functionality of the PDF.getFullOutputProfile() method:
     OutputProfiler profiler = new OutputProfiler(new PDFParser(pdf));
     OutputProfile profile = profiler.getProfile();
     
    To duplicate the functionality of the PDF.setOutputProfile(org.faceless.pdf2.OutputProfile) method, you would call apply(org.faceless.pdf2.OutputProfile). For example, to retrieve the full profile of the PDF, check if it's compatible with a "target" profile and attempt to convert the PDF to that profile if not:
     OutputProfiler profiler = new OutputProfiler(new PDFParser(pdf));
     OutputProfile profile = profiler.getProfile();
     OutputProfile.Feature[] list = profile.isCompatibleWith(target);
     if (list != null) {
         profiler.apply(target);
     }
     

    This is an oversimplified example, as typically converting a PDF to a profile (known as "preflighting") requires more information. The OutputProfiler class allows you to specify various actions to perform on the PDF when converting - if specified these will involve a rebuild of the entire document, which can be time-consuming.

    As an example, assume a PDF has an unembedded font in it - this is not allowed in PDF/A. To try to convert the PDF to PDF/A-1b, you could run the following code:

     PDF pdf = new PDF(new PDFReader(new File("unembeddedfont.pdf")));
     OutputProfiler profiler = new OutputProfiler(new PDFParser(pdf));
     OutputProfile profile = profiler.getProfile();
     ColorSpace srgb = ColorSpace.getInstance(ColorSpace.CS_sRGB);
     OutputProfile target = new OutputProfile(OutputProfile.PDFA1b_2005);
     target.getOutputIntents().add(new OutputIntent("GTS_PDFA1", null, srgb));
     OutputProfile.Feature[] list = profile.isCompatibleWith(target);
     if (list != null) {
         profiler.apply(target);  // This line will fail
     }
     

    This will fail with an IllegalStateException ("Denied Feature 'Unembedded TrueType Font' is set"). To fix this you need to set an action on the OutputProfiler before you apply the new profile. This will cause the PDF to be rebuilt internally. Here's how the above example could be modified to replace some or all unembedded fonts with an embedded font from the OS:

     PDF pdf = new PDF(new PDFReader(new File("unembeddedfont.pdf")));
     OutputProfiler profiler = new OutputProfiler(new PDFParser(pdf));
     OutputProfile profile = profiler.getProfile();
     ColorSpace srgb = ColorSpace.getInstance(ColorSpace.CS_sRGB);
     OutputProfile target = new OutputProfile(OutputProfile.PDFA1b_2005);
     target.getOutputIntents().add(new OutputIntent("GTS_PDFA1", null, srgb));
     OutputProfile.Feature[] list = profile.isCompatibleWith(target);
     if (list != null) {
         OutputProfiler.AutoEmbeddingFontAction fontaction = new OutputProfiler.AutoEmbeddingFontAction();
         fontaction.add(new OpenTypeFont(new FileInputStream("C:\\Windows\\Fonts\\arial.ttf"), 2));
         profiler.setFontAction(fontaction);
         profiler.apply(target);
     }
     
    We recommend you check our Blog for more on this topic.
    Since:
    2.18
    See Also:
    OutputProfile
    • Constructor Detail

      • OutputProfiler

        public OutputProfiler()
        Create a new OutputProfiler
      • OutputProfiler

        public OutputProfiler​(PDF pdf)
        Create a new OutputProfiler and call setPDF()
        Parameters:
        pdf - the PDF
      • OutputProfiler

        public OutputProfiler​(PDFParser parser)
        Create a new OutputProfiler and call setParser()
        Parameters:
        parser - the PDFParser
    • Method Detail

      • setPDF

        public void setPDF​(PDF pdf)
        Set the PDF to create the OutputProfile from. Setting just a PDF will allow only basic OutputProfile features to be extracted. Once set it cannot be changed.
        Parameters:
        pdf - the PDF to scan for features
        See Also:
        setParser(org.faceless.pdf2.PDFParser), setFull(boolean)
      • setParser

        public void setParser​(PDFParser parser)
        Set the PDFParser to create the OutputProfile from. Setting a PDFParser will allow both basic and full OutputProfile features to be extracted. Once set, it cannot be changed to a different parser, but it can be reset by passing in null.
        Parameters:
        parser - the PDFParser containing the PDF to scan for features
        See Also:
        setPDF(org.faceless.pdf2.PDF), setFull(boolean)
      • setFull

        public void setFull​(boolean full)
        Sets whether the OutputProfiler will create a full OutputProfile when it is run. This method simply creates a new PDFParser and calls setParser(org.faceless.pdf2.PDFParser)
        Parameters:
        full - whether to extract a full profile from the PDF.
      • setJustNoticeableDifference

        public void setJustNoticeableDifference​(float threshold,
                                                String methodHint)

        Set the threshold level at which two colors are considered "different", which is a criterion that is tested at various points throughout the apply(org.faceless.pdf2.OutputProfile) method. In particular, when two different Separations are found, they will be merged if the maximum ΔE (delta-E) value for the two separations is less than this value. If greater than this value, the page will probably have to be rasterized.

        The methodHint can also be set to try to adjust the algorithm for determining delta-E. Supported values are currently "CIEDE2000" and "CIE94", or null for no change.

        The default values if not set are equivalent to setJustNoticeableDifference(2.5, "CIEDE2000"). Note that although the theoretically correct value for the JND threshold is 1, the alternative is rasterization, so a little tolerance here is probably justified.

        Parameters:
        threshold - the value to use for "just noticeable difference" - two colors with a difference above this value are considered to be different colors
        methodHint - the method to use for deltaE calculation.
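
        To make the threshold test concrete, here is a minimal sketch of a CIE94 delta-E calculation on two CIE L*a*b* colors, followed by the "less than threshold" comparison described above. It is an illustration only and is not the library's implementation: the class and method names are hypothetical, and the weighting constants (kL = 1, K1 = 0.045, K2 = 0.015) are the commonly used graphic-arts values, chosen here as an assumption.

```java
/** Illustrative only: a plain CIE94 delta-E, not the library's internal code. */
final class DeltaE94Sketch {
    private DeltaE94Sketch() {}

    /**
     * CIE94 colour difference between two L*a*b* colours, each given as {L, a, b}.
     * Uses the graphic-arts weights kL=1, K1=0.045, K2=0.015 (an assumption).
     */
    static double deltaE94(double[] lab1, double[] lab2) {
        double dL = lab1[0] - lab2[0];
        double c1 = Math.hypot(lab1[1], lab1[2]);   // chroma of the reference colour
        double c2 = Math.hypot(lab2[1], lab2[2]);
        double dC = c1 - c2;
        double da = lab1[1] - lab2[1];
        double db = lab1[2] - lab2[2];
        // Hue difference squared; clamp tiny negative values caused by rounding
        double dH2 = Math.max(0.0, da * da + db * db - dC * dC);
        double sC = 1.0 + 0.045 * c1;
        double sH = 1.0 + 0.015 * c1;
        double termC = dC / sC;
        double termH2 = dH2 / (sH * sH);
        return Math.sqrt(dL * dL + termC * termC + termH2);
    }

    /** True if the two colours differ by less than the given JND threshold. */
    static boolean withinJnd(double[] lab1, double[] lab2, double threshold) {
        return deltaE94(lab1, lab2) < threshold;
    }
}
```

        With a threshold of 2.5, two neutral grays that differ by 2 units of lightness would count as the same color under this sketch, while a difference of 3 units would not.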
      • run

        public void run()

        Analyze the PDF and generate its profile. Whether this method calculates a "basic" or "full" profile depends on whether a PDFParser was specified on this class, either in the constructor or by calling setParser(org.faceless.pdf2.PDFParser). If available a full profile will be run, which can take some time. If not, a basic profile is generated which is essentially instantaneous.

        The process reads, but does not write to, the structures of the PDF, so it can safely be run in parallel with other operations that read the PDF, such as signature validation or rendering to bitmap.

        Specified by:
        run in interface Runnable
        See Also:
        isRunning(), getProfile(), apply(org.faceless.pdf2.OutputProfile)
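
        Because this class implements Runnable and reports a float from getProgress(), a full profile can be run on a worker thread while the calling thread polls for progress. The following is a minimal sketch of that pattern. To keep it compilable without the library it is written against plain Runnable and Supplier types; the ProgressMonitor class and its runAndMonitor method are hypothetical helpers, not part of this API.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper: runs a task on a worker thread and reports progress while waiting. */
final class ProgressMonitor {
    private ProgressMonitor() {}

    /**
     * Runs the task on a single worker thread and blocks until it finishes.
     * Every pollMillis the current progress value is passed to the listener.
     */
    static void runAndMonitor(Runnable task, Supplier<Float> progress,
                              long pollMillis, Consumer<Float> listener) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<?> future = worker.submit(task);
            while (true) {
                try {
                    future.get(pollMillis, TimeUnit.MILLISECONDS);
                    return;                                  // task has completed
                } catch (TimeoutException stillRunning) {
                    listener.accept(progress.get());         // report and keep waiting
                } catch (ExecutionException failed) {
                    throw new IllegalStateException("Task failed", failed.getCause());
                } catch (InterruptedException interrupted) {
                    future.cancel(true);
                    Thread.currentThread().interrupt();      // preserve interrupt status
                    return;
                }
            }
        } finally {
            worker.shutdown();
        }
    }
}
```

        With an OutputProfiler named profiler, the call might look like ProgressMonitor.runAndMonitor(profiler, profiler::getProgress, 500, p -> System.out.println(p)), followed by profiler.getProfile() to collect the result once the run has completed.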
      • cancel

        public void cancel()
        Cancel this OutputProfiler's operation - if it is being run in another thread, that thread should terminate safely shortly after this method is called. Once this object is cancelled, it cannot be restarted.
        See Also:
        isCancelled()
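
        One common use of cancel() is to put an upper bound on how long a profile is allowed to take. The sketch below shows a generic watchdog that runs a task on the calling thread and invokes a cancel hook if the task is still running after a deadline. The Watchdog class is a hypothetical helper written against plain Runnable types so that it compiles without the library.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: cancels a long-running task once a deadline passes. */
final class Watchdog {
    private Watchdog() {}

    /**
     * Runs the task on the calling thread. If it has not returned after
     * timeoutMillis, cancelHook is invoked once from a timer thread.
     *
     * @return true if the cancel hook was fired, false if the task finished in time
     */
    static boolean runWithDeadline(Runnable task, Runnable cancelHook, long timeoutMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean fired = new AtomicBoolean(false);
        ScheduledFuture<?> pending = timer.schedule(() -> {
            fired.set(true);
            cancelHook.run();
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            task.run();
        } finally {
            pending.cancel(false);   // no effect if the hook has already run
            timer.shutdown();
        }
        return fired.get();
    }
}
```

        With an OutputProfiler named profiler, this might be called as Watchdog.runWithDeadline(profiler, profiler::cancel, 30000). Since a cancelled profiler cannot be restarted, a new instance would be needed for any retry.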
      • isCancelled

        public boolean isCancelled()
        Return true if the cancel() method has been called.
        See Also:
        isRunning()
      • getProfile

        public OutputProfile getProfile()
        Return the OutputProfile calculated by the run() method. If run() has not been called already, it will be called by this method. If it has already completed, it will return the result (or null if it failed). If it is currently running in another thread, this method will return null immediately.
        See Also:
        isRunning()
      • waitForProfile

        public OutputProfile waitForProfile()
        Wait for the profiling operation running in this (or another) thread to finish, and return the profile when done. This method will also wait if the profiling has not yet started.
        See Also:
        isRunning()
      • getProgress

        public float getProgress()
        Return the progress of the run() or apply(org.faceless.pdf2.OutputProfile) operation, or 0 if this is not being run, has completed or has been cancelled.
        Returns:
        the progress of the operation, from 0 to 1
        See Also:
        isRunning()
      • setHairlineWidth

        public void setHairlineWidth​(float width)
        If Hairlines or zero-width lines are denied when a new profile is applied, they will be changed to be lines of at least this width. This will rebuild the PDF. If no hairlines are present in the PDF when this method is called, no rebuild will be performed.
        Parameters:
        width - the width (in pts) to use to replace any hairlines. Must be > 0. The default is 0.2
      • setFontAction

        public void setFontAction​(OutputProfiler.FontAction action)
        Set the OutputProfiler.FontAction to run on the PDF. This can be used to replace fonts in the PDF with new fonts. If this value is not null, the PDF will be rebuilt in apply().
        Parameters:
        action - the FontAction
      • setImageAction

        public void setImageAction​(OutputProfiler.ImageAction action)
        Set the OutputProfiler.ImageAction to run on the PDF. This can be used to resample or recompress images in the PDF. If this value is not null, the PDF will be rebuilt in apply().
        Parameters:
        action - the ImageAction
        Since:
        2.22.2
      • setRasterizingActionExecutorService

        public void setRasterizingActionExecutorService​(ExecutorService service)
        Set the ExecutorService to be used for rasterizing pages with an OutputProfiler.RasterizingAction. A value of null means they are rasterized one at a time on the current thread (the default). Be aware that rasterizing is a memory-intensive task, so too many threads will cause memory pressure.
        Since:
        2.26.1
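
        Since too many rasterizing threads can exhaust memory, one approach is to size the pool from both the CPU count and the available heap. The following is a minimal sketch of that idea. The RasterPoolSketch class is hypothetical, and the per-page memory figure is an assumption you would need to estimate for your own page sizes and resolutions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical helper: builds a fixed pool bounded by CPU count and heap size. */
final class RasterPoolSketch {
    private RasterPoolSketch() {}

    /**
     * Number of threads to use: limited by the core count and by how many pages
     * fit in the heap at the assumed per-page cost, but never less than one.
     */
    static int poolSize(int cores, long maxHeapBytes, long assumedBytesPerPage) {
        if (assumedBytesPerPage <= 0) {
            throw new IllegalArgumentException("assumedBytesPerPage must be > 0");
        }
        long byMemory = maxHeapBytes / assumedBytesPerPage;
        return (int) Math.max(1L, Math.min((long) cores, byMemory));
    }

    /** Creates a fixed-size pool sized for the current JVM. */
    static ExecutorService newRasterPool(long assumedBytesPerPage) {
        Runtime rt = Runtime.getRuntime();
        int threads = poolSize(rt.availableProcessors(), rt.maxMemory(), assumedBytesPerPage);
        return Executors.newFixedThreadPool(threads);
    }
}
```

        The resulting pool would be passed to setRasterizingActionExecutorService before calling apply(), and shut down by the caller afterwards.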
      • getHairlineWidth

        public float getHairlineWidth()
        Return the hairline repair width, as set by setHairlineWidth(float).
        Since:
        2.26.1
      • setMaxImageDPI

        @Deprecated
        public void setMaxImageDPI​(OutputProfiler.ImageType imagetype,
                                   float threshold,
                                   float target)
        Set the maximum image resolution to be used in the PDF. If the PDF contains an image of the specified type, and every use of that image in the PDF is at or above the specified threshold resolution, it will be resampled to the target resolution and replaced. Calling this method will cause the PDF to be rebuilt in apply().
        Parameters:
        imagetype - the ImageType whether this applies to one-bit, gray or color images
        threshold - the resolution to test the image against - all copies of the image embedded in the PDF must be this resolution or higher for it to be resampled.
        target - the resolution to resample the image to.
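
        The threshold and target parameters can be illustrated with a small calculation. The sketch below computes the effective resolution of an image from its pixel width and the width it is drawn at on the page (in points, 72 per inch), then applies the rule described above: resample only if every use is at or above the threshold. It is one reading of this documentation and not the library's implementation; the class name and the choice to scale so that the lowest-resolution use lands on the target are assumptions.

```java
/** Illustrative only: resolution arithmetic for the threshold/target rule. */
final class ImageDpiSketch {
    private ImageDpiSketch() {}

    /** Effective resolution of an image drawn at the given width in points. */
    static double effectiveDpi(int pixelWidth, double placedWidthPoints) {
        return pixelWidth / (placedWidthPoints / 72.0);
    }

    /**
     * Returns the pixel width the image would have after resampling, or the
     * original width if any use of the image falls below the threshold.
     */
    static int resampledWidth(int pixelWidth, double[] placedWidthsPoints,
                              float threshold, float target) {
        if (placedWidthsPoints.length == 0) {
            return pixelWidth;                       // image is never drawn
        }
        double lowest = Double.POSITIVE_INFINITY;
        for (double width : placedWidthsPoints) {
            lowest = Math.min(lowest, effectiveDpi(pixelWidth, width));
        }
        if (lowest < threshold) {
            return pixelWidth;                       // at least one use needs the pixels
        }
        // Assumption: scale so the lowest-resolution use ends up at the target
        return (int) Math.round(pixelWidth * target / lowest);
    }
}
```

        For example, a 3000-pixel-wide image drawn once at 360pt (5 inches) has an effective resolution of 600dpi; with a threshold of 400 and a target of 300 the sketch would reduce it to 1500 pixels. If the same image were also drawn at 720pt (300dpi), it would be left unchanged.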
      • setStrategy

        public void setStrategy​(OutputProfiler.Strategy... strategy)
        Set the strategy that will be used to resolve problems encountered during apply(org.faceless.pdf2.OutputProfile). By default, the strategy is OutputProfiler.Strategy.Default, but multiple items can be passed into this method to define the set of strategies that will be tried when the apply() method is called.
        Parameters:
        strategy - a list of strategies to apply
        Since:
        2.26
      • getStrategy

        public List<OutputProfiler.Strategy> getStrategy()
        Return a copy of the list of all strategies currently being applied.
        Since:
        2.26.3
      • apply

        public void apply​(OutputProfile targetprofile)

        Set the specified OutputProfile on the PDF. The supplied "target" profile will have a number of features denied and required, and this method will attempt to modify the PDF to match those requirements. If it's not possible then an IllegalStateException will be thrown.

        If the supplied profile references any features that require a full scan and the PDF has been loaded in (rather than created from scratch), then a full profile of the existing PDF must be run() to determine which features are currently set. If this is already in progress in another thread, this method will wait for it to complete. If it hasn't yet been started, it will be started on this thread by calling getProfile(). If no PDFParser has been set (in the constructor or through the setParser method) then a full profile cannot be created, and an IllegalStateException will be thrown.

        If an OutputProfiler.FontAction, OutputProfiler.ColorAction, OutputProfiler.ImageAction or OutputProfiler.RasterizingAction has been set on this class, an extra stage will be run which rebuilds the PDF content. It is also run if the full profile shows up any hairlines and the setHairlineWidth method was called with a non-zero value.

        After this stage, or if no actions or hairline-replacement are specified, then the method will attempt to modify the PDF to add or remove required or denied features, as specified in the target profile. If that completes successfully, the OutputIntent on the target profile will be applied to the PDF and this method will complete.

        While this method is running the isRunning() method will return true, and the progress value returned from getProgress() will be updated, although the returned value is approximate at best: the amount of work required to modify a PDF to meet a target profile cannot realistically be predicted in advance. The cancel() method can be used to request that the apply() method be interrupted. The PDF should be left in a consistent state if this happens, but that state will necessarily be somewhere between how the PDF was originally and how it was going to be after modification. There is no way to revert the PDF to its original state other than reloading. When this method finishes the isDone() method will return true, and the isCancelled() method will return false if the method completed successfully or threw an exception, and true if it was cancelled.

        Note that this method modifies the PDF extensively, so (unlike the retrieval of the OutputProfile from the run() method) any threads that read from the PDF must be paused while this method is running. The functionality to manage the progress of this method was added in 2.26.1.

        Parameters:
        targetprofile - the OutputProfile that this PDF should be converted to match.
      • getArlingtonModelIssues

        public List<ArlingtonModelIssue> getArlingtonModelIssues()
        Traverse the PDF and generate a list of issues based on the Arlington PDF validation model. The list is recreated each time this method is called.
        Since:
        2.27.2
        See Also:
        ArlingtonModelIssue