Text Highlighting with the PDF Viewer

The PDF Library ships with a PDF Viewer that can programmatically highlight words or sentences that match a filter.

You can highlight pieces of information in a PDF embedded online, or hide something from the Viewer (in that case, do not forget to also prevent text selection and the copying and pasting of text from your document). We are going to walk quickly through both cases with an example: a product catalog file.

Working with the viewer

First things first, let's see how to create a basic text highlighter and how to apply it to the PDF viewer. To load a document and display it in the Viewer, use the following snippet:

	     final List<ViewerFeature> features = new ArrayList<>(ViewerFeature.getAllEnabledFeatures()); // 1
	     // Here will go our TextHighlighter related code
	     SwingUtilities.invokeLater(new Runnable() {
	         public void run() {
	             PDFViewer viewer = PDFViewer.newPDFViewer(features); // 2
	             viewer.loadPDF(new File(pathToPDF)); // 3
	         }
	     });
	 

In this short piece of code, we first fetch all the Viewer features that are typically enabled (highlighting, annotation, etc.) (1) and create a new Viewer with them (2). We then feed our PDF document to the Viewer via .loadPDF (3).

Adding a first text highlighter

There are two ways of using the TextHighlighter: you can either supply a list of words to highlight, or work with a regular expression and let a pattern decide what matches. In either case the basis is the same:

	     TextHighlighter wordHighlighter = new TextHighlighter();
	     //configure the text highlighter here
	     ...
	     ...
	     features.add(wordHighlighter);
	 

You can repeat this pattern to have several highlighters at once. Now, let us take a look at how we can configure a TextHighlighter. The most basic way is the method .addWord(String word).

	     wordHighlighter.addWord("US");
	     wordHighlighter.addWord("AU");
	 

With this code, all the occurrences of US or AU in our list will be highlighted with a yellow rectangle. However, as you can see by running the example, words that contain US or AU (e.g. AU01) are also highlighted.

We can use a regular expression instead, replacing the addWord("AU") call with .setPattern(Pattern pattern).

	     TextHighlighter wordHighlighter = new TextHighlighter();
	     String patternStr = "\\bAU\\b";
	     Pattern pattern = Pattern.compile(patternStr);
	     wordHighlighter.setPattern(pattern);
	 

But keep in mind that setPattern cannot be combined with addWord: if you want to catch several words with setPattern, you need to handle that in your regular expression!
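
As an illustration of handling several words in one regular expression, here is a minimal sketch that relies only on java.util.regex. The class name WholeWordPattern and the helper forWords are hypothetical names introduced for this example and are not part of the PDF Library. How the Viewer applies the pattern to the page text is not shown here; the sketch only checks which sample strings the pattern matches.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WholeWordPattern {
    // Hypothetical helper: builds \b(?:w1|w2|...)\b from a list of words.
    // Pattern.quote makes each word match literally, even if it contains regex metacharacters.
    static Pattern forWords(List<String> words) {
        String alternatives = words.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternatives + ")\\b");
    }

    public static void main(String[] args) {
        Pattern pattern = forWords(List.of("US", "AU"));
        // Whole words match; words that merely contain US or AU do not.
        for (String sample : List.of("US", "AU", "AU01", "BUS")) {
            System.out.println(sample + " -> " + pattern.matcher(sample).find());
        }
    }
}
```

The resulting Pattern could then be handed to setPattern in place of the single-word pattern above, assuming the highlighter accepts any java.util.regex.Pattern.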

Highlighting customization

The PDF Library also allows you to customize the highlighting applied by a TextHighlighter. For instance:

	     // Semi-transparent blue block over FR and SE
	     TextHighlighter frSeHighlighter = new TextHighlighter();
	     frSeHighlighter.addWord("FR");
	     frSeHighlighter.addWord("SE");
	     frSeHighlighter.setHighlightType(TextTool.TYPE_BLOCK, new Color(0xaa0000ff, true), new BasicStroke(), 1);
	     features.add(frSeHighlighter);
    
	     // Red outline around AU
	     TextHighlighter auHighlighter = new TextHighlighter();
	     auHighlighter.addWord("AU");
	     auHighlighter.setHighlightType(TextTool.TYPE_OUTLINE, new Color(0xff0000), new BasicStroke(), 0.2f);
	     features.add(auHighlighter);
	 

With this setup, FR and SE occurrences will be highlighted with a semi-transparent blue rectangle, while AU will be surrounded by a red outline.
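
The two Color constructors used above behave differently with respect to transparency. The following sketch uses only java.awt.Color from the JDK to show how each value is decoded. The class name HighlightColors and its two helper methods are hypothetical names chosen for this example.

```java
import java.awt.Color;

public class HighlightColors {
    // Same call as the FR/SE highlighter: with hasAlpha = true the int is read as 0xAARRGGBB.
    static Color translucentBlue() {
        return new Color(0xaa0000ff, true);
    }

    // Same call as the AU highlighter: without the alpha flag the int is 0xRRGGBB and the color is opaque.
    static Color opaqueRed() {
        return new Color(0xff0000);
    }

    static String describe(Color c) {
        return "r=" + c.getRed() + " g=" + c.getGreen()
                + " b=" + c.getBlue() + " a=" + c.getAlpha();
    }

    public static void main(String[] args) {
        System.out.println("blue: " + describe(translucentBlue()));
        System.out.println("red:  " + describe(opaqueRed()));
    }
}
```

The blue has an alpha of 170 out of 255, so the text underneath stays visible, while the red is fully opaque (alpha 255).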

To go further, we could use this mechanism to preserve the anonymity of our producers (note that we are only preventing the text from appearing in the Viewer; for permanent removal it would be better to redact it). We simply need to remove the TextTool feature from the original list, to make sure no one can select and copy text from our document:

	     final ArrayList<ViewerFeature> features = new ArrayList<>(ViewerFeature.getAllEnabledFeatures());
	     Iterator<ViewerFeature> iter = features.iterator(); 
	     while (iter.hasNext()) {
	         ViewerFeature viewerFeature = iter.next();
	         if ( viewerFeature.getName().equals("TextTool") ) { 
	             iter.remove();
	             break;
	         }
	     }
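
On Java 8 and later, the same removal can be written more compactly with Collection.removeIf. The sketch below is only an illustration of that idiom: NamedFeature is a hypothetical stand-in for the library's ViewerFeature (only getName() is modeled), and the names FeatureA and FeatureB are placeholders. Unlike the loop above, removeIf removes every matching entry rather than stopping at the first.

```java
import java.util.ArrayList;
import java.util.List;

public class RemoveFeatureByName {
    // Hypothetical stand-in for the library's ViewerFeature; only getName() is modeled.
    interface NamedFeature {
        String getName();
    }

    // Removes every feature with the given name; returns true if anything was removed.
    static boolean removeByName(List<? extends NamedFeature> features, String name) {
        return features.removeIf(feature -> feature.getName().equals(name));
    }

    // Builds stand-in features from names, removes one name, and returns the names that remain.
    static List<String> namesAfterRemoval(List<String> names, String toRemove) {
        List<NamedFeature> features = new ArrayList<>();
        for (String n : names) {
            features.add(() -> n);
        }
        removeByName(features, toRemove);
        List<String> remaining = new ArrayList<>();
        for (NamedFeature feature : features) {
            remaining.add(feature.getName());
        }
        return remaining;
    }

    public static void main(String[] args) {
        System.out.println(namesAfterRemoval(List.of("FeatureA", "TextTool", "FeatureB"), "TextTool"));
    }
}
```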
	 

Then we can highlight names with a black block:

	     TextHighlighter wordHighlighter = new TextHighlighter();
	     wordHighlighter.addWord("Pierre Dubois");
	     wordHighlighter.addWord("John Watson");
	     // Opaque black block hides the matched names in the Viewer
	     wordHighlighter.setHighlightType(TextTool.TYPE_BLOCK, Color.BLACK, new BasicStroke(), 1);
	     features.add(wordHighlighter);
	 

And we are done!