Interface ArlingtonModelIssue


  • public interface ArlingtonModelIssue

    This interface represents an "issue" reported by comparing a PDF against the Arlington Model, a formal description of the PDF file format published at https://github.com/pdf-association/arlington-pdf-model.

    Arlington Model validation is a bit like "spell checking" a PDF. It compares each PDF object against the list of requirements in the specification and reports where the file deviates from the specification. In the majority of cases the deviations found will be fairly inconsequential, as errors important enough to cause a noticeable problem tend to get fixed.

    By contrast, the OutputProfile class and the OutputProfiler.apply(org.faceless.pdf2.OutputProfile) method can also be used to repair damage to a PDF, but they tend to focus on the bigger problems: damaged fonts, damaged structures and so on. This is the kind of damage that tends to be harder to diagnose and repair, and more likely to cause visible problems.

    For this reason "Arlington Model" issues and "OutputProfile" issues are entirely independent, and both can be used to validate and repair a PDF. Most PDF products create files that fail to match the Arlington Model in some way, including this API prior to release 2.27.2. Repairing these issues where possible is a good idea, and is very lightweight compared to the OutputProfiler approach.

    Note: the Arlington Model is still under active development, and some false positives will be reported (although none of these will be repairable).

    Usage

    To run the Arlington Model against a PDF, we'd expect code like the following to be fairly typical.

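      // "pdf" is an existing PDF object that you want to check
      OutputProfiler profiler = new OutputProfiler(pdf);
      List<ArlingtonModelIssue> list = profiler.getArlingtonModelIssues();
      for (ArlingtonModelIssue issue : list) {
        if (issue.getRepairType() != null) {
          // null asks for the default repair; repair() returns false if it fails
          if (!issue.repair(null)) {
            System.out.println("Repair failed: " + issue);
          }
        } else {
          System.out.println("Can't repair: " + issue);
        }
      }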
     
    Since:
    2.27.2
    See Also:
    OutputProfiler.getArlingtonModelIssues()
    • Method Summary

      Modifier and Type Method Description
      Object getChildValue()
      Return the "child" object of this issue.
      int getIndex()
      Return the index of the entry within the parent object that was being processed when the warning occurred, if the parent object is an array.
      String getKey()
      Return the key of the entry within the parent object that was being processed when the warning occurred, if the parent object is a dictionary or stream.
      String getMessage()
      Return the message associated with this issue.
      Object getParentValue()
      Return the "parent" object of this issue.
      String getPath()
      Return the "PDF path" to the object that caused the warning.
      PDF getPDF()
      Return the PDF this issue applies to.
      String getRepairType()
      Return a brief description of the Repair that will be made if this issue is repaired by calling repair(java.lang.Object), or null if no repair is possible.
      String getRepairWarning()
      If there is anything to consider before applying a Repair, this method will return a textual description of the implications.
      String getTable()
      Return the name of the Table in the Model that caused the warning, or null if no Table applied.
      String getVersionSuggestion()
      If the error could potentially be fixed by increasing the version number of the PDF, return the minimum version that would be required.
      boolean isDeprecation()
      Return true if this issue is because a property is used in the PDF that has been deprecated in the version specified in the PDF.
      boolean isError()
      Return true if this issue is an "Error" - a condition is described in the model, and this item fails to meet that condition.
      boolean isFromLaterVersion()
      Return true if this issue is because a property is used in the PDF that is first described in a later version of the PDF specification.
      boolean repair(Object o)
      Attempt to repair the issue.
    • Method Detail

      • getPDF

        PDF getPDF()
        Return the PDF this issue applies to.
      • getPath

        String getPath()
        Return the "PDF path" to the object that caused the warning. This is always the "parent" object: for example, if a value in a dictionary is incorrect, it's the path to the dictionary, not the value. It may not be the shortest path, and the same object may well be reported by two different issues, with two different paths and models.
        Returns:
        the PDF path to the object this issue was identified on
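        Because the same object can be reported by several issues under different paths, grouping issues by getPath() can split one underlying problem across several entries. A minimal sketch of the alternative, assuming you want one group per affected object: it keys an IdentityHashMap on getParentValue(). Issue is a hypothetical one-method stand-in for ArlingtonModelIssue so the example compiles on its own, and it is an assumption, not something documented here, that the real getParentValue() returns the same instance each time the same object is reported.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the one ArlingtonModelIssue getter this sketch uses.
interface Issue {
    Object getParentValue();
}

final class IssueGrouping {
    // Groups issues by the identity of their parent object rather than by path,
    // ASSUMING the same PDF object is always returned as the same parent instance.
    static Map<Object, List<Issue>> byParent(List<? extends Issue> issues) {
        Map<Object, List<Issue>> groups = new IdentityHashMap<>();
        for (Issue issue : issues) {
            groups.computeIfAbsent(issue.getParentValue(), k -> new ArrayList<>()).add(issue);
        }
        return groups;
    }
}
```

        Identity rather than equals() is used so that two distinct objects that happen to compare equal are not merged.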
      • getTable

        String getTable()
        Return the name of the Table in the Model that caused the warning, or null if no Table applied.
        Returns:
        the Arlington Model Table name
      • getKey

        String getKey()
        Return the key of the entry within the parent object that was being processed when the warning occurred, if the parent object is a dictionary or stream. Otherwise returns null.
        Returns:
        the key in the parent object, or null
      • getIndex

        int getIndex()
        Return the index of the entry within the parent object that was being processed when the warning occurred, if the parent object is an array. Otherwise returns -1.
        Returns:
        the index in the parent object, or -1
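        getPath(), getKey() and getIndex() together pinpoint where an issue was found. The sketch below, a hypothetical helper called IssueLocation, folds them into one display string following the documented contract (a non-null key for dictionaries and streams, an index of 0 or more for arrays). The "key=" and "index=" notation is an arbitrary choice for illustration, not a PDF path syntax.

```java
final class IssueLocation {
    // Combines the three documented getters into one display string:
    // getKey() is non-null when the parent is a dictionary or stream,
    // getIndex() is >= 0 when the parent is an array.
    // The " key=" and " index=" suffixes are an arbitrary display format.
    static String describe(String path, String key, int index) {
        if (key != null) {
            return path + " key=" + key;
        }
        if (index >= 0) {
            return path + " index=" + index;
        }
        return path; // neither applies: the issue concerns the parent itself
    }
}
```

        With a real issue, the call would be IssueLocation.describe(issue.getPath(), issue.getKey(), issue.getIndex()).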
      • getMessage

        String getMessage()
        Return the message associated with this issue.
        Returns:
        the message
      • getVersionSuggestion

        String getVersionSuggestion()
        If the error could potentially be fixed by increasing the version number of the PDF, return the minimum version that would be required. Returned values are usually of the form "1.4", "1.7", "2.0" but may also be of the form "1.7e3", "1.7e11". Other return values may be used in the future to indicate extensions to PDF.
        Returns:
        the version suggestion, or null if not applicable
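        Since the version strings include forms like "1.7e11", comparing them as plain strings gives the wrong order ("1.7e11" sorts before "1.7e3"). Because repairing a version issue raises the PDF's version, knowing the highest suggestion up front shows how far the version would need to go. A minimal sketch, assuming "eN" is an extension level that ranks above the plain version with the same number and below the next version. VersionSuggestions is a hypothetical helper; null suggestions and any unrecognised form are skipped, since other forms may be used in future.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VersionSuggestions {
    // Matches "1.7" or "1.7e3"; the "eN" suffix is read as an extension level (assumption).
    private static final Pattern FORM = Pattern.compile("(\\d+)\\.(\\d+)(?:e(\\d+))?");

    // Returns {major, minor, extension}, or null if the string is not in a recognised form.
    static int[] parse(String v) {
        if (v == null) return null;
        Matcher m = FORM.matcher(v);
        if (!m.matches()) return null;
        int ext = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), ext };
    }

    // Returns the highest suggestion, or null if none could be parsed.
    static String highest(List<String> suggestions) {
        String best = null;
        int[] bestParts = null;
        for (String s : suggestions) {
            int[] p = parse(s);
            if (p == null) continue; // no suggestion, or an unknown future form
            if (bestParts == null || compare(p, bestParts) > 0) {
                best = s;
                bestParts = p;
            }
        }
        return best;
    }

    // Compares major, then minor, then extension level, numerically.
    private static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return 0;
    }
}
```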
      • isError

        boolean isError()
        Return true if this issue is an "Error" - a condition is described in the model, and this item fails to meet that condition. If the error is that a property uses a value that is only allowed in a later version, the recommended version number will be returned from getVersionSuggestion() and repairing this issue will simply result in the version of the PDF being increased.
        Returns:
        true if this issue is an error
      • isFromLaterVersion

        boolean isFromLaterVersion()
        Return true if this issue is because a property is used in the PDF that is first described in a later version of the PDF specification. It's not an error condition, because undefined fields are always allowed. However, for the field to be semantically correct, the PDF version number should be increased. Issues of this type can always be repaired; doing so will increase the version number of the PDF.
        Returns:
        true if this issue is due to a property from a later PDF version
      • isDeprecation

        boolean isDeprecation()
        Return true if this issue is because a property is used in the PDF that has been deprecated in the version specified in the PDF. Deprecated fields can always be safely removed; repairing an issue of this type will remove them. However, there's no real need to do so.
        Returns:
        true if this issue is due to a deprecated property
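        The three flags above suggest a simple triage order. A minimal sketch, assuming you pass in the results of isError(), isFromLaterVersion() and isDeprecation() for one issue: the documentation does not say whether more than one flag can be true at once, so the hypothetical IssueTriage.classify checks them in order of seriousness and returns the first match. The Severity names are illustrative only.

```java
// Illustrative buckets, most serious first; these names are not part of the API.
enum Severity { ERROR, LATER_VERSION, DEPRECATION, OTHER }

final class IssueTriage {
    // Pass issue.isError(), issue.isFromLaterVersion() and issue.isDeprecation().
    // Whether the flags are mutually exclusive is not documented, so the first match wins.
    static Severity classify(boolean error, boolean fromLaterVersion, boolean deprecation) {
        if (error) return Severity.ERROR;                    // fails a condition in the model
        if (fromLaterVersion) return Severity.LATER_VERSION; // repairing raises the PDF version
        if (deprecation) return Severity.DEPRECATION;        // repairing removes the property
        return Severity.OTHER;                               // none of the three flags is set
    }
}
```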
      • getParentValue

        Object getParentValue()
        Return the "parent" object of this issue. For example, if the issue was a value of incorrect type, the "parent" object is the dictionary containing that value, and the "child" object is the value itself. The type of the returned object is most likely one of the internal PDF model types, so this method is of limited public use.
        Returns:
        the object the issue was found in
      • getChildValue

        Object getChildValue()
        Return the "child" object of this issue. For example, if the issue was a value of incorrect type, the "parent" object is the dictionary containing that value, and the "child" object is the value itself. If the issue is that the value is missing, this value will be null. The type of the returned object is most likely one of the internal PDF model types, so this method is of limited public use.
        Returns:
        the child of the parent value relating to this issue
      • getRepairType

        String getRepairType()
        Return a brief description of the Repair that will be made if this issue is repaired by calling repair(java.lang.Object), or null if no repair is possible. The returned value can act as an identifier for this particular repair type.
        Returns:
        a brief description of the repair, or null if it's not repairable
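        Because getRepairType() can act as an identifier, counting issues per repair type gives a compact overview before anything is changed. A minimal sketch: Issue is a hypothetical one-method stand-in for ArlingtonModelIssue, RepairSummary is a hypothetical helper, and unrepairable issues (a null repair type) are counted under a label supplied by the caller.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the one ArlingtonModelIssue getter this sketch uses.
interface Issue {
    String getRepairType();
}

final class RepairSummary {
    // Counts issues per repair type, sorted by type name.
    // Issues with a null repair type are counted under unrepairableLabel.
    static Map<String, Integer> countByRepairType(List<? extends Issue> issues, String unrepairableLabel) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Issue issue : issues) {
            String type = issue.getRepairType();
            counts.merge(type != null ? type : unrepairableLabel, 1, Integer::sum);
        }
        return counts;
    }
}
```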
      • getRepairWarning

        String getRepairWarning()
        If there is anything to consider before applying a Repair, this method will return a textual description of the implications. If the Repair is entirely straightforward, with no side effects, this method returns null.
        Returns:
        a warning relating to the repair, or null if no warning is required
      • repair

        boolean repair(Object o)

        Attempt to repair the issue. If getRepairType() is null, or the repair fails for any reason, this method returns false; otherwise it returns true to indicate that the PDF has been repaired.

        Some types of repair may accept a suggested value or operation; if so, the text returned by getRepairWarning() will describe the value to pass to this method. In most cases, however, the value should be null.

        Parameters:
        o - an optional object that may be used to influence the repair, or null to use the defaults.
        Returns:
        true if the repair succeeded
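        Taken together, getRepairType(), getRepairWarning() and repair(Object) support a cautious repair policy: apply only the repairs that come with no warning and leave the rest for a person to review. The sketch below is one such policy, not a recommended or documented procedure. Issue is a hypothetical stand-in declaring just those three methods, SafeRepair is a hypothetical helper, and the count includes only repairs whose repair(null) call returned true.

```java
import java.util.List;

// Hypothetical stand-in for the three ArlingtonModelIssue methods this sketch uses.
interface Issue {
    String getRepairType();
    String getRepairWarning();
    boolean repair(Object o);
}

final class SafeRepair {
    // Repairs only issues whose repair has no documented side effects
    // (getRepairWarning() returns null), passing null to use the default repair.
    // Returns the number of repairs that reported success.
    static int repairWithoutWarnings(List<? extends Issue> issues) {
        int repaired = 0;
        for (Issue issue : issues) {
            if (issue.getRepairType() == null) continue;    // not repairable
            if (issue.getRepairWarning() != null) continue; // needs a human decision
            if (issue.repair(null)) repaired++;
        }
        return repaired;
    }
}
```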