public class Redactor extends Object

To redact content from a PDF, add one or more Area objects on each PDFPage with addArea(), then call redact(). Here's an example showing how to remove any copies of the word "secret" from a PDF:
    PDF pdf = new PDF(new PDFReader(new File("private.pdf")));
    PDFParser parser = new PDFParser(pdf);
    Redactor redactor = new Redactor();
    for (int i=0;i<pdf.getNumberOfPages();i++) {
        PageExtractor extractor = parser.getPageExtractor(i);
        Collection all = extractor.getMatchingText("secret");
        for (Iterator j = all.iterator();j.hasNext();) {
            PageExtractor.Text text = (PageExtractor.Text)j.next();
            float[] corners = text.getCorners();
            GeneralPath path = new GeneralPath();
            path.moveTo(corners[0], corners[1]);
            path.lineTo(corners[2], corners[3]);
            path.lineTo(corners[4], corners[5]);
            path.lineTo(corners[6], corners[7]);
            Area area = new Area(path);
            redactor.addArea(text.getPage(), area);
        }
    }
    redactor.redact();
    pdf.render(new FileOutputStream("public.pdf"));
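The corner-to-path step in the example can be factored into a small helper. The following is a minimal sketch using only java.awt.geom; the class name CornerAreas and the method fromCorners are hypothetical and not part of the library. Accepting more than one group of eight coordinates is an assumption, based on the n*8 array form described for contractAreaAlongBaseline.

```java
import java.awt.geom.Area;
import java.awt.geom.Path2D;

// Hypothetical helper, not part of the library: turns corner coordinates
// (four x,y points per quadrilateral) into a java.awt.geom.Area.
final class CornerAreas {

    static Area fromCorners(float[] corners) {
        if (corners == null || corners.length == 0 || corners.length % 8 != 0) {
            throw new IllegalArgumentException("expected a non-empty multiple of 8 coordinates");
        }
        Area area = new Area();
        for (int i = 0; i < corners.length; i += 8) {
            // Trace one quadrilateral through its four corners and close it.
            Path2D.Float path = new Path2D.Float();
            path.moveTo(corners[i], corners[i + 1]);
            path.lineTo(corners[i + 2], corners[i + 3]);
            path.lineTo(corners[i + 4], corners[i + 5]);
            path.lineTo(corners[i + 6], corners[i + 7]);
            path.closePath();
            // Union this quadrilateral into the combined area.
            area.add(new Area(path));
        }
        return area;
    }

    public static void main(String[] args) {
        float[] corners = {10, 10, 10, 22, 60, 22, 60, 10};
        Area area = fromCorners(corners);
        System.out.println(area.getBounds2D());
    }
}
```

With such a helper, the inner loop of the example would reduce to a single call along the lines of redactor.addArea(text.getPage(), CornerAreas.fromCorners(text.getCorners())).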
Redaction may have to rebuild the PDF structure, so it locks on the PDF object itself before running and on each page before processing it.
A digitally signed PDF cannot be redacted - the signature will cause the old version of the content to remain in the file for easy recovery. Consequently the redact() method will throw an IllegalStateException if this is attempted. The signatures can be deleted manually, or you can do the following to remove them:
    OutputProfile p = new OutputProfile(OutputProfile.Default);
    p.setDenied(OutputProfile.Feature.DigitallySigned);
    pdf.setOutputProfile(p);
Annotations will not be removed as part of the redaction process either - annotations are not really part of the page but live "above" it. While they cannot be partially redacted, the findAnnotations() method can be used to retrieve a list of all annotations partially covered by the redaction area and delete them - for example:
    List<PDFAnnotation> list = redactor.findAnnotations();
    for (int i=0;i<list.size();i++) {
        list.get(i).setPage(null);
    }
Finally, note that redaction only deals with the page content - there may be sensitive information in annotations (which can have a Text Value), form fields (which may have multiple annotations to represent them in the page, or even no annotations at all), metadata and so on.
Modifier and Type | Field and Description |
---|---|
static int | REDACT_ALL - A flag that can be passed to setRedactionMode(int) to redact everything. |
static int | REDACT_BITMAP - A flag that can be passed to setRedactionMode(int) to redact bitmap image content. |
static int | REDACT_GRAPHICS - A flag that can be passed to setRedactionMode(int) to redact all non-text content. |
static int | REDACT_TEXT - A flag that can be passed to setRedactionMode(int) to redact text content. |
static int | REDACT_VECTOR - A flag that can be passed to setRedactionMode(int) to redact vector drawing content. |
Constructor and Description |
---|
Redactor() - Creates a new Redactor |
Modifier and Type | Method and Description |
---|---|
void | addArea(PDFPage page, Area area) - Adds an area to redact out of the document. |
void | addStructuredElementByID(PDFPage page, String id) - Adds a marked content ID to redact out of the document. |
static void | contractAreaAlongBaseline(float[] corners, float amount) - When redacting individual words in the middle of a line of kerned text, an additional character on either side may also be redacted. |
List<PDFAnnotation> | findAnnotations() - Return a List of PDFAnnotation objects that fall partially inside the area being redacted. |
int | getRedactionMode() - Returns the redaction mode, as set by setRedactionMode(int) |
void | redact() - Performs the redaction. |
void | setRedactionColor(Paint color) - Sets the Paint to use to fill in any redacted areas. |
void | setRedactionMode(int mode) - Sets the redaction mode. |
public static final int REDACT_TEXT
A flag that can be passed to setRedactionMode(int) to redact text content.

public static final int REDACT_VECTOR
A flag that can be passed to setRedactionMode(int) to redact vector drawing content.

public static final int REDACT_BITMAP
A flag that can be passed to setRedactionMode(int) to redact bitmap image content.

public static final int REDACT_GRAPHICS
A flag that can be passed to setRedactionMode(int) to redact all non-text content. This is equal to REDACT_VECTOR | REDACT_BITMAP.

public static final int REDACT_ALL
A flag that can be passed to setRedactionMode(int) to redact everything. Equal to REDACT_TEXT | REDACT_VECTOR | REDACT_BITMAP. This is the default.

public int getRedactionMode()
Returns the redaction mode, as set by setRedactionMode(int).
See Also:
setRedactionMode(int)

public void setRedactionMode(int mode)
Sets the redaction mode, a bitwise-or of one or more of the REDACT_ flags. The default is REDACT_ALL, which will redact all content in the specified area. A value which does not set any valid modes will throw an IllegalArgumentException.
Parameters:
mode - a bitwise-or of one or more of the various REDACT_ bitmasks
See Also:
REDACT_ALL
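
The way the REDACT_ flags combine can be illustrated with a small stand-alone sketch. The numeric values below are hypothetical (this documentation does not state the real ones), and the class RedactionModeSketch with its redacts() helper is an illustration only, not the library's implementation. It mirrors the relationships stated above: REDACT_GRAPHICS is REDACT_VECTOR | REDACT_BITMAP, REDACT_ALL is all three content flags and is the default, and a mode with no valid flag is rejected.

```java
// Illustration only: hypothetical flag values and a stand-in class,
// not the library's Redactor.
final class RedactionModeSketch {
    static final int REDACT_TEXT = 1;      // hypothetical value
    static final int REDACT_VECTOR = 2;    // hypothetical value
    static final int REDACT_BITMAP = 4;    // hypothetical value
    static final int REDACT_GRAPHICS = REDACT_VECTOR | REDACT_BITMAP;
    static final int REDACT_ALL = REDACT_TEXT | REDACT_VECTOR | REDACT_BITMAP;

    private int mode = REDACT_ALL;         // documented default

    void setRedactionMode(int mode) {
        // Reject a value that sets none of the valid flags.
        if ((mode & REDACT_ALL) == 0) {
            throw new IllegalArgumentException("mode sets no valid REDACT_ flag: " + mode);
        }
        this.mode = mode;
    }

    int getRedactionMode() {
        return mode;
    }

    // Hypothetical helper: true if every bit of the given flag is enabled.
    boolean redacts(int flag) {
        return (mode & flag) == flag;
    }

    public static void main(String[] args) {
        RedactionModeSketch sketch = new RedactionModeSketch();
        sketch.setRedactionMode(REDACT_TEXT | REDACT_BITMAP);
        System.out.println("text:   " + sketch.redacts(REDACT_TEXT));
        System.out.println("vector: " + sketch.redacts(REDACT_VECTOR));
        System.out.println("bitmap: " + sketch.redacts(REDACT_BITMAP));
    }
}
```

Against the real class, the equivalent call would be redactor.setRedactionMode(Redactor.REDACT_TEXT | Redactor.REDACT_BITMAP), which leaves vector drawing content in place.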

public void addArea(PDFPage page, Area area)
Adds an area to redact out of the document.
Parameters:
page - the page to redact the area from
area - the area to redact (in PDF page coordinates)

public void addStructuredElementByID(PDFPage page, String id)
Adds a marked content ID to redact out of the document.
Parameters:
page - the page to redact from
id - the ID of the marked content to redact

public void setRedactionColor(Paint color)
Sets the Paint to use to fill in any redacted areas. This could be new Color(0, true) (a transparent color) to erase the content so the background can be seen, or it could be a PDFPattern to redact with a "stamp".
Parameters:
color - the color to redact with

public void redact() throws IOException
Performs the redaction.
Throws:
IOException

public static void contractAreaAlongBaseline(float[] corners, float amount)
When redacting individual words in the middle of a line of kerned text, an additional character on either side may also be redacted. This method contracts the corners returned from PageExtractor.Text.getCorners() by an amount, to allow for this effect.
Parameters:
corners - an array of n*8 coordinates of the form [x1,y1,x2,y2,x3,y3,x4,y4], specifying the clockwise outline of the text where the baseline of the text runs from (x1,y1) to (x4,y4). This array will be modified.
amount - what proportion of the height of the line to trim from each end of the line. Typically this value would be about 0.1 to 0.2.

public List<PDFAnnotation> findAnnotations()
Return a List of PDFAnnotation objects that fall partially inside the area being redacted. Annotations are not redacted as part of the main process, but by deleting every annotation in this list any that are in that area will be removed as well - for example:

    List<PDFAnnotation> list = redactor.findAnnotations();
    for (int i=0;i<list.size();i++) {
        list.get(i).setPage(null);
    }
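
The geometry behind contractAreaAlongBaseline, described earlier, can be pictured with the following sketch. It is an illustration of the documented behaviour (trim a proportion of the line height from each end, along the baseline), not the library's actual implementation. The class and method names are hypothetical, and capping the trim at half the baseline length is an added assumption to keep the quadrilateral from inverting.

```java
// Illustration only, not the library's code: shrinks each quadrilateral
// along its baseline by (amount * line height) at both ends, in place.
final class BaselineContractionSketch {

    static void contractAlongBaseline(float[] corners, float amount) {
        if (corners == null || corners.length % 8 != 0) {
            throw new IllegalArgumentException("expected a multiple of 8 coordinates");
        }
        for (int i = 0; i < corners.length; i += 8) {
            // Baseline vector from (x1,y1) to (x4,y4).
            double bx = corners[i + 6] - corners[i];
            double by = corners[i + 7] - corners[i + 1];
            double baseLength = Math.hypot(bx, by);
            if (baseLength == 0) {
                continue; // degenerate quad, nothing to trim
            }
            // Line height taken as the distance from (x1,y1) to (x2,y2).
            double height = Math.hypot(corners[i + 2] - corners[i],
                                       corners[i + 3] - corners[i + 1]);
            // Assumption: never trim more than half the baseline from each end.
            double shift = Math.min(amount * height, baseLength / 2);
            double dx = bx / baseLength * shift;
            double dy = by / baseLength * shift;
            // Move the starting edge (points 1 and 2) forward along the baseline.
            corners[i] += dx;
            corners[i + 1] += dy;
            corners[i + 2] += dx;
            corners[i + 3] += dy;
            // Move the ending edge (points 3 and 4) backward along the baseline.
            corners[i + 4] -= dx;
            corners[i + 5] -= dy;
            corners[i + 6] -= dx;
            corners[i + 7] -= dy;
        }
    }

    public static void main(String[] args) {
        float[] corners = {0, 0, 0, 10, 100, 10, 100, 0};
        contractAlongBaseline(corners, 0.1f);
        System.out.println(java.util.Arrays.toString(corners));
    }
}
```

In the word-redaction example, the real method would be called as Redactor.contractAreaAlongBaseline(corners, 0.1f) immediately after text.getCorners() and before the path is built.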
Copyright © 2001-2017 Big Faceless Organization