Class Redactor
- java.lang.Object
  - org.faceless.pdf2.Redactor
public class Redactor extends Object
The Redactor can be used to redact (completely remove) text and images from a PDF. To use it, add a list of Area objects for each PDFPage, then call redact(). Here's an example showing how to remove any copies of the word "secret" from a PDF.

    PDF pdf = new PDF(new PDFReader(new File("private.pdf")));
    PDFParser parser = new PDFParser(pdf);
    Redactor redactor = new Redactor();
    for (int i=0;i<pdf.getNumberOfPages();i++) {
        PageExtractor extractor = parser.getPageExtractor(i);
        Collection all = extractor.getMatchingText("secret");
        for (Iterator j = all.iterator();j.hasNext();) {
            PageExtractor.Text text = (PageExtractor.Text)j.next();
            // Build an Area from the four corners of the matched text
            float[] corners = text.getCorners();
            GeneralPath path = new GeneralPath();
            path.moveTo(corners[0], corners[1]);
            path.lineTo(corners[2], corners[3]);
            path.lineTo(corners[4], corners[5]);
            path.lineTo(corners[6], corners[7]);
            Area area = new Area(path);
            redactor.addArea(text.getPage(), area);
        }
    }
    // Nothing is removed until redact() is called
    redactor.redact();
    pdf.render(new FileOutputStream("public.pdf"));
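The corner-to-Area step in the example above uses only java.awt.geom classes, so it can be tried on its own. The following is a minimal sketch, assuming the corner array holds one or more quadrilaterals of eight values each (the n*8 layout described under contractAreaAlongBaseline). The class and method names (CornersToArea, toArea) are hypothetical helpers and are not part of the library.

```java
import java.awt.geom.Area;
import java.awt.geom.Path2D;

// Hypothetical helper, not part of org.faceless.pdf2.
class CornersToArea {

    // Builds one Area from an array of n*8 coordinates, where each group of
    // eight values is [x1,y1,x2,y2,x3,y3,x4,y4] for a single quadrilateral.
    static Area toArea(float[] corners) {
        if (corners.length == 0 || corners.length % 8 != 0) {
            throw new IllegalArgumentException("expected a multiple of 8 coordinates");
        }
        Area area = new Area();
        for (int i = 0; i < corners.length; i += 8) {
            Path2D.Float path = new Path2D.Float();
            path.moveTo(corners[i], corners[i + 1]);
            path.lineTo(corners[i + 2], corners[i + 3]);
            path.lineTo(corners[i + 4], corners[i + 5]);
            path.lineTo(corners[i + 6], corners[i + 7]);
            path.closePath();           // make the outline explicitly closed
            area.add(new Area(path));   // union with the quads added so far
        }
        return area;
    }

    public static void main(String[] args) {
        float[] corners = { 0, 0, 0, 10, 100, 10, 100, 0 };
        Area area = toArea(corners);
        System.out.println(area.getBounds2D());
        System.out.println(area.contains(50, 5));
    }
}
```

The resulting Area would then be passed to addArea(page, area), as in the example above.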
Redaction potentially has to rebuild the PDF structure, so will lock on the PDF object itself before running and on each page before processing it.
A digitally signed PDF cannot be redacted - the signature will cause the old version of the content to remain in the file for easy recovery. Consequently the redact() method will throw an IllegalStateException if this is attempted. The signatures can be deleted manually, or you can do the following to remove them:

    OutputProfile p = new OutputProfile(OutputProfile.Default);
    p.setDenied(OutputProfile.Feature.DigitallySigned);
    pdf.setOutputProfile(p);
Annotations will not be removed as part of the redaction process either - annotations are not really part of the page but live "above" it. While they cannot be partially redacted, the findAnnotations() method can be used to retrieve a list of all annotations partially covered by the redaction area so they can be deleted - for example:

    List<PDFAnnotation> list = redactor.findAnnotations();
    for (int i=0;i<list.size();i++) {
        list.get(i).setPage(null);
    }
Finally, note that redaction only deals with the page content - there may be sensitive information in annotations (which can have a Text Value), form fields (which may have multiple annotations to represent them in the page, or even no annotations at all), metadata and so on.
- Since:
- 2.11.8
-
-
Field Summary
Modifier and Type  Field            Description
static int         REDACT_ALL       A flag that can be passed to setRedactionMode(int) to redact everything.
static int         REDACT_BITMAP    A flag that can be passed to setRedactionMode(int) to redact bitmap image content.
static int         REDACT_GRAPHICS  A flag that can be passed to setRedactionMode(int) to redact all non-text content.
static int         REDACT_TEXT      A flag that can be passed to setRedactionMode(int) to redact text content.
static int         REDACT_VECTOR    A flag that can be passed to setRedactionMode(int) to redact vector drawing content.
-
Constructor Summary
Constructor  Description
Redactor()   Creates a new Redactor.
-
Method Summary
Modifier and Type    Method                                                    Description
void                 addArea(PDFPage page, Area area)                          Adds an area to redact out of the document.
void                 addStructuredElementByID(PDFPage page, String id)         Adds a marked content ID to redact out of the document.
static void          contractAreaAlongBaseline(float[] corners, float amount)  When redacting individual words in the middle of a line of kerned text, an additional character on either side may also be redacted.
List<PDFAnnotation>  findAnnotations()                                         Return a List of PDFAnnotation objects that fall partially inside the area being redacted.
int                  getRedactionMode()                                        Returns the redaction mode, as set by setRedactionMode(int).
void                 redact()                                                  Performs the redaction.
void                 setRedactionColor(Paint color)                            Sets the Paint to use to fill in any redacted areas.
void                 setRedactionMode(int mode)                                Sets the redaction mode.
-
-
-
Field Detail
-
REDACT_TEXT
public static final int REDACT_TEXT
A flag that can be passed to setRedactionMode(int) to redact text content.
- Since:
- 2.18.3
- See Also:
- Constant Field Values
-
REDACT_VECTOR
public static final int REDACT_VECTOR
A flag that can be passed to setRedactionMode(int) to redact vector drawing content.
- Since:
- 2.18.3
- See Also:
- Constant Field Values
-
REDACT_BITMAP
public static final int REDACT_BITMAP
A flag that can be passed to setRedactionMode(int) to redact bitmap image content.
- Since:
- 2.18.3
- See Also:
- Constant Field Values
-
REDACT_GRAPHICS
public static final int REDACT_GRAPHICS
A flag that can be passed to setRedactionMode(int) to redact all non-text content. This is equal to REDACT_VECTOR | REDACT_BITMAP.
- Since:
- 2.18.3
- See Also:
- Constant Field Values
-
REDACT_ALL
public static final int REDACT_ALL
A flag that can be passed to setRedactionMode(int) to redact everything. Equal to REDACT_TEXT | REDACT_VECTOR | REDACT_BITMAP. This is the default.
- Since:
- 2.18.3
- See Also:
- Constant Field Values
-
-
Method Detail
-
getRedactionMode
public int getRedactionMode()
Returns the redaction mode, as set by setRedactionMode(int).
- Since:
- 2.18.3
- See Also:
setRedactionMode(int)
-
setRedactionMode
public void setRedactionMode(int mode)
Sets the redaction mode. This is a bitwise-or of one or more of the various REDACT_ flags. The default is REDACT_ALL, which will redact all content in the specified area. A value which does not set any valid modes will throw an IllegalArgumentException.
- Parameters:
mode - a bitwise-or of one or more of the various REDACT_ bitmasks
- Since:
- 2.18.3
- See Also:
REDACT_ALL
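The way the REDACT_ flags combine can be illustrated without the library. The following is a minimal sketch, assuming placeholder numeric values: the real constants are those listed under Constant Field Values and are not reproduced here, and the class and method names (RedactionModeSketch, checkMode, includes) are hypothetical. How the library treats bits outside the known flags is not stated in this page, so the sketch only models the documented rule that a mode with no valid flag is rejected.

```java
// Sketch of how bit flags such as the REDACT_ constants combine.
// The numeric values below are placeholders chosen for illustration only.
class RedactionModeSketch {
    static final int TEXT = 1;                       // placeholder value
    static final int VECTOR = 2;                     // placeholder value
    static final int BITMAP = 4;                     // placeholder value
    static final int GRAPHICS = VECTOR | BITMAP;     // all non-text content
    static final int ALL = TEXT | VECTOR | BITMAP;   // everything

    // Models the documented rule: a mode with no valid flag set is rejected.
    static int checkMode(int mode) {
        if ((mode & ALL) == 0) {
            throw new IllegalArgumentException("no valid redaction flag in mode " + mode);
        }
        return mode;
    }

    // True if every bit of 'flag' is set in 'mode'.
    static boolean includes(int mode, int flag) {
        return (mode & flag) == flag;
    }

    public static void main(String[] args) {
        int mode = checkMode(TEXT | BITMAP);
        System.out.println(includes(mode, TEXT));
        System.out.println(includes(mode, VECTOR));
        System.out.println(includes(ALL, GRAPHICS));
    }
}
```

With the library itself, the equivalent call would presumably be redactor.setRedactionMode(Redactor.REDACT_TEXT | Redactor.REDACT_BITMAP), which redacts text and bitmap images while leaving vector drawing in place.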
-
addArea
public void addArea(PDFPage page, Area area)
Adds an area to redact out of the document.
- Parameters:
page - the page to redact the area from
area - the area to redact (in PDF page coordinates)
-
addStructuredElementByID
public void addStructuredElementByID(PDFPage page, String id)
Adds a marked content ID to redact out of the document.
- Parameters:
page - the page to redact from
id - the ID of the marked content to redact
-
setRedactionColor
public void setRedactionColor(Paint color)
Sets the Paint to use to fill in any redacted areas. The paint can be a solid color, it can be new Color(0, true) (a transparent color) to erase the content so the background can be seen, or it could be a PDFPattern to redact with a "stamp".
- Parameters:
color - the color to redact with
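The second argument in new Color(0, true) matters: without it the same value yields opaque black rather than a transparent color. A minimal sketch using only java.awt.Color shows the difference; the class and method names (RedactionColorSketch, transparent, opaqueBlack) are hypothetical.

```java
import java.awt.Color;

// Hypothetical helper, not part of org.faceless.pdf2.
class RedactionColorSketch {

    // Fully transparent: the second argument tells Color to read the alpha
    // bits from the top byte of the value, and here they are all zero.
    static Color transparent() {
        return new Color(0, true);
    }

    // Opaque black: the single-argument constructor always uses alpha 255.
    static Color opaqueBlack() {
        return new Color(0);
    }

    public static void main(String[] args) {
        System.out.println(transparent().getAlpha());
        System.out.println(opaqueBlack().getAlpha());
    }
}
```

Either value is a Paint, so either could be passed to setRedactionColor(Paint); the first erases the content, the second covers it with a black box.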
-
redact
public void redact() throws IOException
Performs the redaction. Page content streams are normalised as part of this operation.
- Throws:
IOException
-
contractAreaAlongBaseline
public static void contractAreaAlongBaseline(float[] corners, float amount)
When redacting individual words in the middle of a line of kerned text, an additional character on either side may also be redacted. This is due to kerning - the neighbouring characters partially overlap the requested area and so are removed as well. One solution is to "contract" the area for redaction slightly along the baseline of the text: the original characters will fall partially inside this reduced area so will still be redacted, but the neighbouring characters will fall outside. This is a convenience method that adjusts the corners returned from PageExtractor.Text.getCorners() by an amount in this way, to allow for this effect.
- Parameters:
corners - an array of n*8 coordinates of the form [x1,y1,x2,y2,x3,y3,x4,y4], specifying the clockwise outline of the text where the baseline of the text runs from (x1,y1) to (x4,y4). This array will be modified.
amount - what proportion of the height of the line to trim from each end of the line. Typically this value would be about 0.1 to 0.2.
- Since:
- 2.13
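The geometry described above can be sketched as follows. This is a hypothetical re-implementation for illustration, not the library's code, and it may differ from the real method in detail. It assumes (x2,y2) and (x3,y3) are the top corners above the start and end of the baseline, takes the line height as the distance from (x1,y1) to (x2,y2), and does not guard against quads so short that the contraction would invert them.

```java
// Hypothetical re-implementation for illustration only; the library's own
// contractAreaAlongBaseline may differ in detail.
class ContractSketch {

    // corners: n*8 values [x1,y1,x2,y2,x3,y3,x4,y4] per quad, with the
    // baseline running from (x1,y1) to (x4,y4). Modified in place.
    // amount: proportion of the line height to trim from each end.
    static void contract(float[] corners, float amount) {
        for (int i = 0; i + 7 < corners.length; i += 8) {
            float bx = corners[i + 6] - corners[i];      // baseline vector
            float by = corners[i + 7] - corners[i + 1];
            float baseLen = (float) Math.hypot(bx, by);
            float height = (float) Math.hypot(corners[i + 2] - corners[i],
                                              corners[i + 3] - corners[i + 1]);
            if (baseLen == 0f) {
                continue;                                 // degenerate quad
            }
            float dx = bx / baseLen * height * amount;    // shift along baseline
            float dy = by / baseLen * height * amount;
            corners[i] += dx;      corners[i + 1] += dy;  // start of baseline
            corners[i + 2] += dx;  corners[i + 3] += dy;  // start of top edge
            corners[i + 4] -= dx;  corners[i + 5] -= dy;  // end of top edge
            corners[i + 6] -= dx;  corners[i + 7] -= dy;  // end of baseline
        }
    }
}
```

For a horizontal line 100 units long and 10 units high, an amount of 0.1 moves each end inward by about one unit, which is typically enough to exclude a kerned neighbour while still overlapping the first and last characters of the word.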
-
findAnnotations
public List<PDFAnnotation> findAnnotations()
Return a List of PDFAnnotation objects that fall partially inside the area being redacted. Annotations are not redacted as part of the main process, but by deleting every annotation in this list any that are in that area will be removed as well - for example:

    List<PDFAnnotation> list = redactor.findAnnotations();
    for (int i=0;i<list.size();i++) {
        list.get(i).setPage(null);
    }
- Since:
- 2.16
-
-