Class Redactor


  • public class Redactor
    extends java.lang.Object
    The Redactor can be used to redact (completely remove) text and images from a PDF. Add one or more Area objects for each PDFPage to be redacted, then call redact(). The following example removes every occurrence of the word "secret" from a PDF.
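     import java.awt.geom.Area;
     import java.awt.geom.GeneralPath;
     import java.io.File;
     import java.io.FileOutputStream;
     import java.io.OutputStream;
     import java.util.Collection;
     import java.util.Iterator;
     // PDF, PDFReader, PDFParser, PageExtractor and Redactor come from this library

     PDF pdf = new PDF(new PDFReader(new File("private.pdf")));
     PDFParser parser = new PDFParser(pdf);
     Redactor redactor = new Redactor();
     for (int i=0;i<pdf.getNumberOfPages();i++) {
       PageExtractor extractor = parser.getPageExtractor(i);
       // Find every occurrence of the word on this page
       Collection all = extractor.getMatchingText("secret");
       for (Iterator j = all.iterator();j.hasNext();) {
         PageExtractor.Text text = (PageExtractor.Text)j.next();
         // Outline the matched text as a closed four-cornered path
         float[] corners = text.getCorners();
         GeneralPath path = new GeneralPath();
         path.moveTo(corners[0], corners[1]);
         path.lineTo(corners[2], corners[3]);
         path.lineTo(corners[4], corners[5]);
         path.lineTo(corners[6], corners[7]);
         path.closePath();
         Area area = new Area(path);
         redactor.addArea(text.getPage(), area);
       }
     }
     // Remove the content, then write the redacted PDF and close the stream
     redactor.redact();
     try (OutputStream out = new FileOutputStream("public.pdf")) {
       pdf.render(out);
     }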
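
    The corner-to-Area step in the example above depends only on java.awt.geom, so it can be tried without a PDF. The following is a minimal sketch rather than part of the Redactor API: the class name CornersToArea and the helper toArea are hypothetical, and the corner values are made up for illustration.

```java
import java.awt.geom.Area;
import java.awt.geom.Path2D;

class CornersToArea {
    // Hypothetical helper: builds a closed quadrilateral from eight
    // coordinates laid out as [x1,y1,x2,y2,x3,y3,x4,y4].
    static Area toArea(float[] c) {
        if (c.length != 8) {
            throw new IllegalArgumentException("expected 8 coordinates");
        }
        Path2D.Float path = new Path2D.Float();
        path.moveTo(c[0], c[1]);
        path.lineTo(c[2], c[3]);
        path.lineTo(c[4], c[5]);
        path.lineTo(c[6], c[7]);
        path.closePath();
        return new Area(path);
    }

    public static void main(String[] args) {
        // Made-up corners for a word 100 units wide and 10 units high.
        float[] word = {0, 0, 0, 10, 100, 10, 100, 0};
        Area area = toArea(word);
        System.out.println(area.contains(50, 5));   // a point inside the word
        System.out.println(area.contains(200, 5));  // a point well outside it
    }
}
```

    An Area built this way is the kind of object that would be passed to addArea() together with the page it belongs to.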
     

    Redaction may have to rebuild the PDF structure, so it locks on the PDF object itself before running, and on each page before processing it.

    A digitally signed PDF cannot be redacted: the signature would cause the old version of the content to remain in the file, where it could easily be recovered. Consequently the redact() method will throw an IllegalStateException if it is called on a signed PDF. The signatures can be deleted manually, or you can do the following to remove them:

      OutputProfile p = new OutputProfile(OutputProfile.Default);
      p.setDenied(OutputProfile.Feature.DigitallySigned);
      pdf.setOutputProfile(p);
     

    Annotations are not removed as part of the redaction process: they are not really part of the page but live "above" it. While they cannot be partially redacted, the findAnnotations() method can be used to retrieve a list of all annotations partially covered by the redaction area, which can then be deleted. For example:

     List<PDFAnnotation> list = redactor.findAnnotations();
     for (int i=0;i<list.size();i++) {
         list.get(i).setPage(null);
     }
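
    The idea of an annotation being "covered by the redaction area" can be pictured as an overlap test between a rectangle and an Area. The following is a minimal sketch of that test using java.awt.geom only; it is not how findAnnotations() is implemented. The class AnnotationOverlapSketch and its overlapping helper are hypothetical, plain Rectangle2D objects stand in for annotation rectangles, and treating any overlap (partial or complete) as a match is an assumption of the sketch.

```java
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

class AnnotationOverlapSketch {
    // Hypothetical helper: returns the rectangles whose interior overlaps
    // the redaction area at all (an assumption of this sketch).
    static List<Rectangle2D> overlapping(Area redaction, List<Rectangle2D> rects) {
        List<Rectangle2D> hits = new ArrayList<>();
        for (Rectangle2D r : rects) {
            if (redaction.intersects(r)) {
                hits.add(r);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Area redaction = new Area(new Rectangle2D.Float(0, 0, 100, 10));
        List<Rectangle2D> rects = new ArrayList<>();
        rects.add(new Rectangle2D.Float(90, 5, 40, 20));    // straddles the edge
        rects.add(new Rectangle2D.Float(300, 300, 20, 20)); // nowhere near
        System.out.println(overlapping(redaction, rects).size());
    }
}
```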
     

    Finally, note that redaction only deals with the page content - there may be sensitive information in annotations (which can have a Text Value), form fields (which may have multiple annotations to represent them in the page, or even no annotations at all), metadata and so on.

    Since:
    2.11.8
    • Constructor Detail

      • Redactor

        public Redactor()
        Creates a new Redactor.
    • Method Detail

      • setRedactionMode

        public void setRedactionMode(int mode)
        Sets the redaction mode. This is a bitwise-or of one or more of the various REDACT_ flags. The default is REDACT_ALL, which will redact all content in the specified area. A value that does not set any valid mode will cause an IllegalArgumentException to be thrown.
        Parameters:
        mode - a bitwise-or of one or more of the various REDACT_ bitmasks
        Since:
        2.18.3
        See Also:
        REDACT_ALL
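
        The bitwise-or pattern and the "no valid mode" check described above can be illustrated in isolation. This is a minimal sketch with invented stand-in flags: FLAG_A, FLAG_B, FLAG_C and FLAG_ALL are hypothetical and are not the library's REDACT_ constants, and silently ignoring unknown bits is an assumption of the sketch, not documented behaviour.

```java
class ModeMaskSketch {
    // Invented stand-in flags; these are NOT the library's REDACT_ constants.
    static final int FLAG_A = 1;
    static final int FLAG_B = 2;
    static final int FLAG_C = 4;
    static final int FLAG_ALL = FLAG_A | FLAG_B | FLAG_C;

    // Default to everything, mirroring the documented REDACT_ALL default.
    private int mode = FLAG_ALL;

    void setMode(int mode) {
        // Reject a value that sets none of the known flags.
        if ((mode & FLAG_ALL) == 0) {
            throw new IllegalArgumentException("no valid mode set: " + mode);
        }
        // Assumption of this sketch: unknown bits are dropped.
        this.mode = mode & FLAG_ALL;
    }

    boolean has(int flag) {
        return (mode & flag) != 0;
    }

    public static void main(String[] args) {
        ModeMaskSketch m = new ModeMaskSketch();
        m.setMode(FLAG_A | FLAG_C);
        System.out.println(m.has(FLAG_A));
        System.out.println(m.has(FLAG_B));
    }
}
```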
      • addArea

        public void addArea(PDFPage page,
                            java.awt.geom.Area area)
        Adds an area to redact out of the document.
        Parameters:
        page - the page to redact the area from
        area - the area to redact (in PDF page coordinates)
      • addStructuredElementByID

        public void addStructuredElementByID(PDFPage page,
                                             java.lang.String id)
        Adds a marked content ID to redact out of the document.
        Parameters:
        page - the page to redact from
        id - the ID of the marked content to redact
      • setRedactionColor

        public void setRedactionColor(java.awt.Paint color)
        Sets the Paint to use to fill in any redacted areas. The paint can be a solid color; it can be new Color(0, true) (a transparent color) to erase the content so the background can be seen; or it can be a PDFPattern to redact with a "stamp".
        Parameters:
        color - the color to redact with
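
        The difference between new Color(0, true) and an ordinary solid color comes down to the alpha channel, which can be checked with the JDK alone. This is a minimal sketch: the class RedactionPaintSketch and the helper isFullyTransparent are hypothetical, and the sketch says nothing about how the Redactor itself interprets the Paint.

```java
import java.awt.Color;

class RedactionPaintSketch {
    // Hypothetical helper: true when the color has an alpha of zero.
    static boolean isFullyTransparent(Color c) {
        return c.getAlpha() == 0;
    }

    public static void main(String[] args) {
        Color clear = new Color(0, true); // alpha taken from the top 8 bits: 0
        Color black = new Color(0);       // no alpha supplied: fully opaque
        System.out.println(isFullyTransparent(clear));
        System.out.println(isFullyTransparent(black));
    }
}
```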
      • redact

        public void redact()
                    throws java.io.IOException
        Performs the redaction. Page content streams are normalised as part of this operation.
        Throws:
        java.io.IOException
      • contractAreaAlongBaseline

        public static void contractAreaAlongBaseline(float[] corners,
                                                     float amount)
        When redacting individual words in the middle of a line of kerned text, an additional character on either side may also be redacted. This is due to kerning - the neighbouring characters partially overlap the requested area and so are removed as well. One solution is to "contract" the area for redaction slightly along the baseline of the text: the original characters will fall partially inside this reduced area so will still be redacted, but the neighbouring characters will fall outside. This method is a convenience method to adjust the corners returned from PageExtractor.Text.getCorners() by an amount in this way, to allow for this effect.
        Parameters:
        corners - an array of n*8 coordinates of the form [x1,y1,x2,y2,x3,y3,x4,y4], specifying the clockwise outline of the text where the baseline of the text runs from (x1,y1) to (x4,y4). This array will be modified.
        amount - what proportion of the height of the line to trim from each end of the line. Typically this value would be about 0.1 to 0.2.
        Since:
        2.13
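
        The geometry described above can be sketched in a few lines. This is a hypothetical re-creation, not the library's implementation: the class ContractSketch and its contract method are invented for illustration, the line height is assumed to be the distance from (x1,y1) to (x2,y2), and leaving very short outlines unchanged is a choice made by the sketch.

```java
class ContractSketch {
    // Hypothetical re-creation of the idea. Each group of eight values is
    // [x1,y1,x2,y2,x3,y3,x4,y4], with the baseline running (x1,y1) -> (x4,y4).
    static void contract(float[] c, float amount) {
        for (int i = 0; i + 7 < c.length; i += 8) {
            float bx = c[i + 6] - c[i];      // baseline vector
            float by = c[i + 7] - c[i + 1];
            float len = (float) Math.hypot(bx, by);
            // Assumed line height: distance from (x1,y1) to (x2,y2).
            float height = (float) Math.hypot(c[i + 2] - c[i], c[i + 3] - c[i + 1]);
            float shift = height * amount;
            if (len == 0 || 2 * shift >= len) {
                continue; // sketch choice: too short to trim, leave unchanged
            }
            float dx = bx / len * shift;
            float dy = by / len * shift;
            c[i] += dx;     c[i + 1] += dy;  // baseline start moves inward
            c[i + 2] += dx; c[i + 3] += dy;  // corner above the start
            c[i + 4] -= dx; c[i + 5] -= dy;  // corner above the end
            c[i + 6] -= dx; c[i + 7] -= dy;  // baseline end moves inward
        }
    }

    public static void main(String[] args) {
        // Made-up outline: 100 units long, 10 units high, trimmed by 10% of the height.
        float[] corners = {0, 0, 0, 10, 100, 10, 100, 0};
        contract(corners, 0.1f);
        System.out.println(java.util.Arrays.toString(corners));
    }
}
```

        With the made-up values above, each end of the outline moves inward by one unit (10% of a height of 10), while the height of the outline is unchanged.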
      • findAnnotations

        public java.util.List<PDFAnnotation> findAnnotations()
        Returns a List of PDFAnnotation objects that fall partially inside the area being redacted. Annotations are not redacted as part of the main process, but deleting every annotation in this list removes those in the redacted area as well. For example:
         List<PDFAnnotation> list = redactor.findAnnotations();
         for (int i=0;i<list.size();i++) {
             list.get(i).setPage(null);
         }
         
        Since:
        2.16