How to Build an Image Extraction Webapp using Java

In this post, we will take the code from our previous Image Extraction post and use it to build a complete web service/web-app for uploading PDFs and downloading the extracted images.

A few modifications to the Java program

If you compare the Java code in this post with the code from the first post, you will spot a couple of subtle differences. First, the “extraction” code has been wrapped in its own class. The rendering part has been moved into an enum, and the main method has been polished to offer some command-line options, such as specifying the source PDF, the output path and which bitmap format(s) to use.

I also made the program use a Java Logger throughout, which is always good practice. It keeps the console output clean, which makes it easy to parse from the code that calls the Java program. Another important point is that our program should call System.exit, which reports how the program ended through its exit code. I am sure some of you remember the good old Windows 98 days and error messages such as “system error code 8206”. By convention, an exit code of 0 means success, and any other value signals an error. In our case, I defined two error codes:

  1. The program was called incorrectly: some parameters were missing
  2. The program could not read the given PDF
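On the Node.js side, these exit codes are the main signal of what went wrong. As a minimal sketch, assuming the codes listed above, a hypothetical describeExitCode helper could turn them into something the web-app can send back to the browser; the helper's name and the HTTP status codes are my own choices, not part of the original program:

```javascript
// Hypothetical helper: maps the Java program's exit codes to an HTTP status
// and a message the web-app can show to the user.
function describeExitCode(code) {
  switch (code) {
    case 0:
      return { status: 200, message: "Images extracted successfully" };
    case 1:
      // Missing parameters: the server called the program incorrectly
      return { status: 500, message: "The extractor was called incorrectly" };
    case 2:
      // The uploaded file could not be read as a PDF: the user's input is at fault
      return { status: 422, message: "The uploaded PDF could not be read" };
    default:
      // Any other code, or null when the process was killed by a signal
      return { status: 500, message: `Unexpected exit code: ${code}` };
  }
}
```

Keeping code 2 separate from code 1 lets the front end tell a bad upload apart from a bug in how the server calls the program.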

Using our program to build a web-app

I went for a JavaScript stack for this web-app, with Node.js/Express on the back end, Vue.js on the front end and Material Design Lite for the UI. Let us save the debate about which tech stack is best for another time! Whichever technologies you use, the steps will be the same:

  • The user uploads a PDF to your server
  • The server passes it along to the Java program
  • BFO does its work
  • The server offers the user a way to download the extracted images
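For the last step, the server has to know which files the Java program produced. A minimal sketch, assuming each upload gets its own output folder and that the images are written with common bitmap file extensions (both the extension list and the listExtractedImages name are hypothetical), could collect them so the front end can offer one download link per image:

```javascript
const fs = require("fs/promises");
const path = require("path");

// Hypothetical list of bitmap formats the Java program may write
const IMAGE_EXTENSIONS = new Set([".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff"]);

// Returns the names of the image files found in outputDir, sorted alphabetically.
async function listExtractedImages(outputDir) {
  const entries = await fs.readdir(outputDir, { withFileTypes: true });
  return entries
    .filter((entry) => entry.isFile())
    .map((entry) => entry.name)
    // compare extensions case-insensitively so "PAGE1.PNG" is picked up too
    .filter((name) => IMAGE_EXTENSIONS.has(path.extname(name).toLowerCase()))
    .sort();
}
```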

Key points with Express

Before pressing ahead, we need to help Express handle PDF file uploads, and for that we will use the well-known multer module.

   // require the modules
   const express = require("express");
   const multer = require("multer");

   const app = express();
   // create a middleware helper that stores uploaded files in the "pdfs/" folder
   const upload = multer({ dest: "pdfs/" });
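Since only PDFs make sense here, it can be worth rejecting other files before they ever reach the Java program. multer accepts a fileFilter option and a limits option for this; the filter below is a minimal sketch, where the pdfOnly name and the 20 MB cap are assumptions:

```javascript
// Hypothetical multer fileFilter: accepts an upload only if it looks like a PDF.
// multer calls it as fileFilter(req, file, cb); cb(null, true) accepts the file
// and cb(null, false) skips it.
function pdfOnly(req, file, cb) {
  const isPdf =
    file.mimetype === "application/pdf" ||
    file.originalname.toLowerCase().endsWith(".pdf");
  cb(null, isPdf);
}

// Usage with multer (left as a comment so this snippet runs on its own):
// const upload = multer({
//   dest: "pdfs/",
//   fileFilter: pdfOnly,
//   limits: { fileSize: 20 * 1024 * 1024 }, // assumed 20 MB cap
// });
```

The MIME type and file name are reported by the browser, so this is only a first line of defence; the Java program's exit code 2 still covers files that claim to be PDFs but cannot be read.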

Once initialized, multer gives us middleware such as upload.single("pdfFile"), which saves the file sent in the "pdfFile" form field. We can plug it into an Express route (here on the app object) to handle the upload:

    app.post("/extract", upload.single("pdfFile"), function (req, res) {
        // now we can access the uploaded file like that
        let pdfFilePath = req.file.path;
        console.log(`the file was uploaded to ${pdfFilePath}`);
        // ... call the Java program and send a response
    });

Another useful module is fs-extra, which makes creating and deleting files and folders easier than Node.js's built-in fs module. With it, we can clear entire directories once users have finished with their images.
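fs-extra offers calls such as ensureDir, emptyDir and remove for this kind of housekeeping. If you would rather avoid the extra dependency, a minimal sketch with Node's built-in fs/promises module (fs.rm needs Node 14.14 or later) could create one folder per upload and delete it afterwards; createJobDir and removeJobDir are hypothetical names:

```javascript
const fs = require("fs/promises");
const os = require("os");
const path = require("path");

// Creates a fresh, uniquely named folder for one upload's extracted images.
async function createJobDir(baseDir = os.tmpdir()) {
  return fs.mkdtemp(path.join(baseDir, "extract-"));
}

// Deletes the folder and everything in it once the user is done with the images.
// force: true means a folder that is already gone is not treated as an error.
async function removeJobDir(dir) {
  await fs.rm(dir, { recursive: true, force: true });
}
```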

However, the most important part of the server-side work is calling the Java extraction program and harvesting its results. For that, we will use the spawn method from Node's child_process module, which lets us launch an external program (here, the java executable) from Node:

    const { spawn } = require("child_process");
    let child = spawn("java", [
         // the program's classpath/main class, then the rest of the parameters (pdf file path, type of picture to create, output path to use)
    ]);
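Because spawn takes the arguments as an array and does not go through a shell unless its shell option is set, an odd uploaded file name cannot smuggle in extra commands. A minimal sketch of assembling that array follows; the jar path, main class and flag names are placeholders I made up, so the real program's options may be spelled differently:

```javascript
// Hypothetical: builds the argument list for spawn("java", args).
// The jar path, main class and flag names are placeholders, not the
// actual options of the extraction program.
function buildExtractorArgs({ jarPath, mainClass, pdfPath, outputDir, formats }) {
  // fail early in Node instead of relying on the program's exit code 1
  if (!pdfPath || !outputDir || !formats || formats.length === 0) {
    throw new Error("pdfPath, outputDir and at least one format are required");
  }
  return [
    "-cp", jarPath,
    mainClass,
    "--pdf", pdfPath,
    "--out", outputDir,
    "--formats", formats.join(","),
  ];
}
```

Validating the inputs in Node first means exit code 1 (missing parameters) should only show up if the two sides disagree about the command line.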

Once the process has started, we can monitor it by attaching listeners:

     child.stdout.on('data', (data) => {
       // the called process wrote some text to standard output (e.g. System.out.println())
       console.log(`ok data => '${data}'`);
     });

     child.stderr.on('data', (data) => {
       // the process wrote some text to standard error (e.g. System.err.println())
       console.log(`err data => '${data}'`);
     });

     child.on('close', (code) => {
       // the process finished; code is its exit code (the value passed to System.exit())
       if (code === 0) {
         // it worked!
       } else {
         // there was some error (1: missing parameters, 2: unreadable PDF)
       }
     });
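Wrapping these listeners in a Promise makes the Java call easier to use from an Express route. The sketch below (runExtractor is a hypothetical name) buffers the output until the process closes, because a single 'data' event is not guaranteed to hold exactly one complete line, and it also listens for 'error', which fires when the process cannot be started at all:

```javascript
const { spawn } = require("child_process");

// Runs a command and resolves with its exit code and captured output once it closes.
// It never rejects on a non-zero exit code: the caller decides what each code means.
function runExtractor(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let stdout = "";
    let stderr = "";

    // decode as UTF-8 so multi-byte characters split across chunks stay intact
    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (data) => { stdout += data; });
    child.stderr.on("data", (data) => { stderr += data; });

    // 'error' fires if the process could not be started (e.g. java not on the PATH)
    child.on("error", reject);

    child.on("close", (code) => {
      resolve({
        code,
        // one entry per non-empty output line, e.g. one logger message per line
        lines: stdout.split(/\r?\n/).filter((line) => line.length > 0),
        stderr,
      });
    });
  });
}
```

Inside an async route handler, const result = await runExtractor("java", args) would then let you branch on result.code just like the close listener above.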

You can check the complete code in api.js to see how these pieces fit together into the full service. We have also created a file repository for reference.

Leo Jeusset
Freelance developer and BFO guest blogger