pdf-extractor
Node.js module for rendering pdf pages to images, svgs and HTML files
...A DOM Canvas is used to render and export the graphical layer of the pdf. Canvas exports *.png as a default but can be extended to export to other file types like .jpg. Pdf objects are converted to svg using the SVGGraphics parser of pdf.js. Pdf text is converted to HTML. This can be used as a (transparent) layer over the image to enable text selection. Pdf text is extracted to a text file for different usages (e.g. indexing the text). This library is in it's most basic form a node.js wrapper for pdf.js. It has default renderers to generate a default output, but is easily extended to incorporate custom logic or to generate different output. ...