Merging and Concatenating PDFs

The BFO PDF Library provides some very easy ways to access pages and other properties of a PDF document by using standard Java Collections. It allows you to leverage your general Java skills to manipulate PDF documents and concatenate/merge them as you wish! In this post, we will see what needs to be done to do just that.

The basic idea: work with collections

In the Library, you can access a PDF's pages as a List<PDFPage>, which means that a basic concatenation amounts to appending one Java list to another. It is as easy as that:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import org.faceless.pdf2.PDF;
import org.faceless.pdf2.PDFPage;
import org.faceless.pdf2.PDFReader;

public static PDF simplestConcat(String... filenames) throws IOException {
    PDF out = new PDF();
    //take every given filename
    for (String filename : filenames) {
        PDFReader reader;
        //open the file and feed it to a PDF reader;
        //try-with-resources closes the stream even if reading fails
        try (InputStream in = new FileInputStream(filename)) {
            reader = new PDFReader(in);
        }
        //create a PDF from it
        PDF pdf = new PDF(reader);
        //get all the pages of this PDF
        List<PDFPage> pages = pdf.getPages();
        //get a handle on the pages of the output document
        List<PDFPage> outputPages = out.getPages();
        //add the new ones
        outputPages.addAll(pages);
        //and that's it!
    }
    return out;
}
   

If your PDFs are annotated, do not worry: annotations are carried over to the merged PDF free of charge!
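
Because the page collection is an ordinary java.util.List, the usual list operations apply to it. The sketch below is only an illustration of that idea: plain strings stand in for PDFPage objects so that it runs without the Library, and concatRanges is a hypothetical helper name, not part of the BFO API. It shows how subList and addAll could be combined to keep only the first few pages of each source. Library-specific behaviour (for instance what happens to the source document when its pages are added elsewhere) is not modelled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: strings stand in for PDFPage objects.
final class PageListSketch {

    // Appends at most maxPages leading items of each source list to a new list.
    static <T> List<T> concatRanges(int maxPages, List<List<T>> sources) {
        List<T> out = new ArrayList<>();
        for (List<T> pages : sources) {
            // clamp the range so short (or empty) sources are handled safely
            int end = Math.max(0, Math.min(maxPages, pages.size()));
            // subList is a view; addAll copies the references into out
            out.addAll(pages.subList(0, end));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a1", "a2", "a3");
        List<String> b = Arrays.asList("b1");
        // prints [a1, a2, b1]
        System.out.println(concatRanges(2, Arrays.asList(a, b)));
    }
}
```

With real documents, the same two calls would be applied to the lists returned by getPages().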

What about forms?

Forms are stored in another Java collection: a Map<String, FormElement>. We can concatenate files that contain forms by adding the following lines to our previous code (right after outputPages.addAll(pages);):

//In order to preserve the form when concatenating:
//get the PDF's Form object
Form form = pdf.getForm();
Form outputForm = out.getForm();
//get the elements of the output and the file to concatenate
Map<String, FormElement> formElements = form.getElements();
Map<String, FormElement> outputFormElements = outputForm.getElements();

//add the new elements to the existing ones
outputFormElements.putAll(formElements);
   

However, what if two FormElement objects have the same name (e.g. two forms that each contain a "Firstname" element)? This short snippet would then fail with an error like this:

Exception in thread "main" java.lang.IllegalArgumentException: Form already contains element 
     "First_Last", cannot insert key "First_Last"
   

One quick and simple way to get around it is to prefix every FormElement key (using, for instance, the original filename) like this:

//In order to preserve the form when concatenating
//(requires java.util.HashMap and java.util.Map):
//get the PDF's Form object
Form form = pdf.getForm();
Form outputForm = out.getForm();
//get the elements of the file to concatenate
Map<String, FormElement> formElements = form.getElements();
//rebuild the map with every key prefixed by the filename
Map<String, FormElement> newFormElements = new HashMap<String, FormElement>();
for (Map.Entry<String, FormElement> entry : formElements.entrySet()) {
    newFormElements.put(filename + "-" + entry.getKey(), entry.getValue());
}
Map<String, FormElement> outputFormElements = outputForm.getElements();

//add the renamed elements to the existing ones
outputFormElements.putAll(newFormElements);
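
One caveat with a filename prefix is that the same file passed twice would still produce colliding keys. A possible variation, sketched below with plain Java maps, is to prefix keys with the position of the document in the input list instead. This is only an illustration under stated assumptions: strings stand in for FormElement objects so that it runs without the Library, and mergeWithPrefix and the "doc<index>-" naming scheme are hypothetical choices, not part of the BFO API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: strings stand in for FormElement objects.
final class PrefixMergeSketch {

    // Merges the maps into one, prefixing each key with "doc<index>-".
    static <V> Map<String, V> mergeWithPrefix(List<Map<String, V>> forms) {
        Map<String, V> out = new LinkedHashMap<>();
        for (int i = 0; i < forms.size(); i++) {
            // a 1-based position is unique even if a file is given twice
            String prefix = "doc" + (i + 1) + "-";
            for (Map.Entry<String, V> entry : forms.get(i).entrySet()) {
                out.put(prefix + entry.getKey(), entry.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("First_Last", "Ada Lovelace");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("First_Last", "Alan Turing");
        // both entries survive under distinct keys
        System.out.println(mergeWithPrefix(Arrays.asList(first, second)));
    }
}
```

In the concatenation loop, the equivalent change would be to build the prefix from a loop counter rather than from filename.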
   

What about Bookmarks?

With the basic version, bookmarks are not preserved. However, there is a simple fix: we can access (and copy) the bookmarks as a List<PDFBookmark> via getBookmarks:

List<PDFBookmark> bookmarks = pdf.getBookmarks();
List<PDFBookmark> outputBookmarks = out.getBookmarks();
outputBookmarks.addAll(bookmarks);
   

Unlike form elements, bookmarks do not have to be unique, so there is nothing to check before adding them.

How about merging instead of concatenating?

What if you need to insert PDF pages within a document instead of adding them at the end? No problem, you just need to handle the resulting page list differently. For instance, this example method inserts one page after every three pages of the original, for as long as there are pages left to insert:

public static PDF insertEveryThirdPage(String mainDocument, String... strings) throws FileNotFoundException, IOException {
       //get an iterator over the parameters
       Iterator<String> filenameIterator = Arrays.asList(strings).iterator();
       PDF out = new PDF();
       List<PDFPage> outputPages = out.getPages();

       FileInputStream in = new FileInputStream(mainDocument);
       // feed the file to a PDF reader
       PDFReader reader = new PDFReader(in);
       PDF main = new PDF(reader);
       in.close();
       List<PDFPage> mainPages = main.getPages();
       System.out.println(" -> " + mainPages.size());
       //copy all the pages to the new pdf: 
       outputPages.addAll(mainPages);

       System.out.println("we now have " + outputPages.size() + " pages ");

       //now get a list iterator: it allows to insert elements in the list. 
       ListIterator<PDFPage> outputIterator = outputPages.listIterator();

       //open the first file containing pages to insert
       if (!filenameIterator.hasNext()) {
           throw new RuntimeException("We need at least 1 file with pages to insert!");
       }
       in = new FileInputStream(filenameIterator.next());
       reader = new PDFReader(in);
       in.close();
       PDF addition = new PDF(reader); 
       Iterator<PDFPage> insertIterator = addition.getPages().iterator();      

       int pagesCount = 0, insertions = 0;
       boolean moreMerging = true;
       while (outputIterator.hasNext() && moreMerging) {
           outputIterator.next();
           pagesCount++;
           System.out.println("pagesCount -> " + pagesCount);
           //check if it is time to insert a page
           if (pagesCount % 3 == 0) {
               //let's do it:
               //first make sure that we have a page to insert
               while (!insertIterator.hasNext() && moreMerging) {
                   if (filenameIterator.hasNext()) {
                       String filename = filenameIterator.next();
                       in = new FileInputStream(filename);
                       reader = new PDFReader(in);
                       //close this stream too, as for the other files
                       in.close();
                       addition = new PDF(reader);
                       insertIterator = addition.getPages().iterator();
                   } else {
                       moreMerging = false;
                   }
               }
               if (insertIterator.hasNext() && moreMerging) {
                   PDFPage toInsertPage = insertIterator.next();
                   //add() places the new element just before the iterator's current position
                   outputIterator.add(toInsertPage);
                   insertions ++;
                   System.out.println("insertion #" + insertions);
               }
           }
       }
       return out;    
   }
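
The heart of this method is the ListIterator cursor, which is easier to see once the file handling is stripped away. The following minimal sketch isolates that step: strings stand in for PDFPage objects so that it runs without the Library, a single list of extras replaces the sequence of files, and insertEvery is a hypothetical helper name, not part of the BFO API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;

// Hypothetical illustration: strings stand in for PDFPage objects.
final class InterleaveSketch {

    // Returns a copy of main with one item of extras inserted after every
    // `every` original items, stopping when either list runs out.
    static <T> List<T> insertEvery(int every, List<T> main, List<T> extras) {
        if (every < 1) {
            throw new IllegalArgumentException("every must be >= 1");
        }
        List<T> out = new ArrayList<>(main);
        ListIterator<T> cursor = out.listIterator();
        Iterator<T> toInsert = extras.iterator();
        int seen = 0; // counts original items only, not inserted ones
        while (cursor.hasNext() && toInsert.hasNext()) {
            cursor.next();
            seen++;
            if (seen % every == 0) {
                // add() inserts before the element next() would return,
                // and leaves the cursor just after the inserted item
                cursor.add(toInsert.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> main = Arrays.asList("p1", "p2", "p3", "p4", "p5", "p6", "p7");
        List<String> extras = Arrays.asList("x1", "x2", "x3");
        // prints [p1, p2, p3, x1, p4, p5, p6, x2, p7]
        System.out.println(insertEvery(3, main, extras));
    }
}
```

Because the cursor sits after each inserted item, inserted pages are never counted towards the next group of three, which matches the counting done by pagesCount above.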
   
Leo Jeusset
Freelance developer and BFO guest blogger
https://twitter.com/leojpod