In this post, we will take the code from our previous Image Extraction post and build a complete web service/web-app for uploading PDFs and downloading the extracted images.
A few modifications to the Java program
If you compare the Java code from this blog post to the first post, you will spot a couple of subtle differences. Firstly, the “extraction” code has been shaped into an object. Secondly, the rendering part has been placed into an enum, and the main method has been polished to offer some command line options, such as specifying the source PDF, the output path and which bitmap format(s) should be used.
I also enforced the use of a Java Logger throughout the program, which is always good practice. It keeps the console output clean, making it easy to parse from the code that calls the Java program. Another important point is that our program should use the System.exit method to signal how it completed. I am sure some of you remember the good old Windows 98 days and error messages such as “system error code 8206”. An exit code of 0 means success; any other value signals an error. In our case, I defined two error codes:
- The program was called incorrectly: some parameters were missing
- The program could not read the given PDF
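On the Node.js side, it helps to turn these exit codes into readable messages. The post does not say which numbers the two error cases use, so the values 1 and 2 below are hypothetical placeholders (as is the helper name describeExitCode); match them to whatever your Java program passes to System.exit. A minimal sketch:

```javascript
// HYPOTHETICAL exit codes: 1 and 2 are placeholders, not values taken from
// the Java program. Only "0 means success" is a fixed convention.
const EXIT_CODES = Object.freeze({
  0: "Extraction succeeded",
  1: "The program was called incorrectly (missing parameters)",
  2: "The program could not read the given PDF",
});

// Translate the exit code reported by the child process into a message.
// Node.js reports `null` as the code when the process was killed by a signal.
function describeExitCode(code) {
  if (code === null) {
    return "The process was terminated by a signal";
  }
  return EXIT_CODES[code] ?? `Unknown error (exit code ${code})`;
}

console.log(describeExitCode(0)); // Extraction succeeded
console.log(describeExitCode(2)); // The program could not read the given PDF
console.log(describeExitCode(42)); // Unknown error (exit code 42)
```

The resulting message can then be sent back to the front-end once the Java process has finished.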
Using the program we built to create a web-app
I went for a JS stack for this web-app, with Node.js/Express in the back-end, Vue.js for the front-end and Material Design Lite for the UI. Let us keep the debate about which tech stack is best for another time! No matter which technologies you use, the steps are the same:
- The user uploads a PDF to your server
- The server passes it along to the Java program
- BFO does its work and extracts the images
- The user is given an option to download the extracted images
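For the last step, the server has to find out which images were produced before it can offer them for download. A minimal sketch, assuming each upload gets its own output directory and that the extensions below cover the bitmap formats you request (both the extension list and the helper name listExtractedImages are illustrative assumptions):

```javascript
const fs = require("fs");
const path = require("path");

// ASSUMED set of bitmap extensions; adjust it to the formats you ask the
// Java program to produce.
const IMAGE_EXTENSIONS = new Set([".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff"]);

// List the image files found directly inside `outputDir` (sub-folders are
// ignored), sorted by name so the front-end gets a stable order.
async function listExtractedImages(outputDir) {
  const entries = await fs.promises.readdir(outputDir, { withFileTypes: true });
  return entries
    .filter((entry) => entry.isFile())
    .filter((entry) => IMAGE_EXTENSIONS.has(path.extname(entry.name).toLowerCase()))
    .map((entry) => entry.name)
    .sort();
}
```

The route could send this list to the Vue.js front-end, which then turns each file name into a download link.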
Key Points with Express
Before pressing ahead, we need to enable Express to receive uploaded PDF files, and for that we will use the well-known module multer.
```javascript
// require the module
const multer = require("multer");
// create a middleware-helper for the upload; files are stored in "pdfs/"
const upload = multer({ dest: "pdfs/" });
```
Once initialized, multer gives us middleware such as upload.single(), which we add to our route to handle the upload of a single file:
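```javascript
router.post("/extract", upload.single("pdfFile"), function (req, res) {
  // multer stores the uploaded file's details in req.file (undefined if no file was sent)
  if (!req.file) {
    return res.status(400).send("No PDF file was uploaded");
  }
  // now we can access the uploaded file like this
  const pdfFilePath = req.file.path;
  console.log(`the file was uploaded to ${pdfFilePath}`);
  // ...then call the Java program and respond once it has finished
});
```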
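As written, the route accepts any kind of file. multer's fileFilter option lets us skip uploads that are not declared as PDFs before they ever reach the Java program. A minimal sketch (the function name pdfOnlyFilter and the 20 MB size limit are illustrative choices, not part of the original code):

```javascript
// Accept only uploads whose declared MIME type is application/pdf.
// The MIME type is reported by the client, so treat this as a convenience
// check rather than a guarantee that the file really is a valid PDF.
function pdfOnlyFilter(req, file, cb) {
  const isPdf = file.mimetype === "application/pdf";
  // cb(null, true) keeps the file; cb(null, false) makes multer skip it,
  // in which case req.file is undefined in the route handler
  cb(null, isPdf);
}

// Hypothetical wiring (requires multer; the 20 MB limit is an arbitrary example):
// const upload = multer({
//   dest: "pdfs/",
//   fileFilter: pdfOnlyFilter,
//   limits: { fileSize: 20 * 1024 * 1024 },
// });
```

The Java program still reports PDFs it cannot read through its exit code, so this filter only catches obviously wrong uploads early.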
Another useful module is fs-extra, which makes managing the file system (creating and deleting files and folders) easier than Node.js's built-in fs module. With it, we can clear entire directories once users are finished with their images.
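With fs-extra, remove() deletes a directory together with its contents, and emptyDir() empties a directory while keeping it. The sketch below does the same kind of cleanup with the built-in fs.promises.rm (Node.js 14.14 or later) so that it runs without extra dependencies. It assumes each upload's images live in a sub-folder of a base directory named after a job id; clearJobDirectory and jobId are hypothetical names. Because the folder name may come from request data, the helper refuses to delete anything outside the base directory:

```javascript
const fs = require("fs");
const path = require("path");

// Delete the output folder of one upload (HYPOTHETICAL layout: baseDir/jobId),
// refusing to touch anything outside baseDir, e.g. if jobId were "../etc".
async function clearJobDirectory(baseDir, jobId) {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, jobId);
  if (target === base || !target.startsWith(base + path.sep)) {
    throw new Error(`Refusing to delete ${target}`);
  }
  // recursive: remove the contents too; force: no error if it is already gone
  await fs.promises.rm(target, { recursive: true, force: true });
}

// Usage once the user has downloaded their images (names are illustrative):
// await clearJobDirectory("images", req.params.jobId);
```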
However, the most important part of the work on the server side is calling the Java extraction program and harvesting its results. For this, we will use the spawn method from the child_process module, which lets us run an external command (here, java) as a child process from Node.js:
```javascript
const { spawn } = require("child_process");

const child = spawn("java", [
  "-jar",
  "./jars/ImageExtractor.jar",
  // put here the rest of the parameters (pdf file path, type of picture to create, output path to use)
]);
```
Once the process has started, we can monitor it by attaching event listeners:
```javascript
child.stdout.on("data", (data) => {
  // the called process wrote some text to standard output (e.g. System.out.println())
  console.log(`ok data => '${data}'`);
});

child.stderr.on("data", (data) => {
  // the process wrote some text to standard error (e.g. System.err.println())
  console.log(`err data => '${data}'`);
});

child.on("close", (code) => {
  // the process finished; code is the value passed to System.exit
  // (or null if the process was killed by a signal)
  if (code === 0) {
    // it worked!
  } else {
    // there was some error
  }
});
```
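Since the exit code only arrives in the close handler, it can be convenient to wrap spawn in a Promise so that a route handler can simply await the result. A minimal sketch (runProcess is a hypothetical helper, not part of the original api.js). It also listens for the error event, which spawn emits when the command cannot be started at all, for example when java is not on the PATH:

```javascript
const { spawn } = require("child_process");

// Run a command and resolve with its exit code and collected output once the
// process has closed. Rejects only if the process could not be started.
function runProcess(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let stdout = "";
    let stderr = "";
    // decode chunks as UTF-8 so multi-byte characters split across chunks survive
    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk) => { stdout += chunk; });
    child.stderr.on("data", (chunk) => { stderr += chunk; });
    // emitted e.g. with ENOENT when the executable does not exist
    child.on("error", reject);
    // `code` is the System.exit value, or null if the process was killed
    child.on("close", (code) => resolve({ code, stdout, stderr }));
  });
}

// Usage inside an async route handler (the remaining extractor parameters
// are left out here, as in the spawn call above):
// const { code, stderr } = await runProcess("java", ["-jar", "./jars/ImageExtractor.jar" /* , ... */]);
// if (code !== 0) { /* report the error to the user */ }
```

Collecting stderr separately is handy here, since it is where error details from the Java side are most likely to end up.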
You can check the full code in api.js to see how the pieces fit together to form the complete service. We have also created a file repository for reference.
Freelance developer and BFO guest blogger
https://twitter.com/leojpod