Nedim Srndic

libpdfjs

libpdfjs is a library that extracts JavaScript code from PDF files.

The library is written in C++ with a very simple API, and Python bindings are provided. It uses the Poppler library to parse PDF files and searches the locations in a PDF file where JavaScript is expected. The extracted scripts are returned UTF-8-encoded.
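To give a rough idea of what "a spot where JavaScript is expected" means, the following Python sketch scans raw PDF bytes for the /JS key of a JavaScript action dictionary and reads the literal string that follows it. This is not the libpdfjs API or its algorithm: the function name find_literal_js is hypothetical, no Poppler parsing is involved, and scripts stored in streams, hex strings or indirect objects are ignored. A real extractor needs a full PDF parser for those cases.

```python
import re
from typing import List


def find_literal_js(pdf_bytes: bytes) -> List[str]:
    """Return scripts stored as literal strings directly after a /JS key.

    Hypothetical helper for illustration only; it does not parse the
    PDF object structure the way a Poppler-based extractor would.
    """
    scripts = []
    for match in re.finditer(rb"/JS\s*\(", pdf_bytes):
        i = match.end()          # first byte inside the literal string
        depth = 1                # PDF literal strings may nest balanced parentheses
        out = bytearray()
        while i < len(pdf_bytes):
            c = pdf_bytes[i:i + 1]
            if c == b"\\" and i + 1 < len(pdf_bytes):
                # Simplification: keep the escaped byte verbatim
                # (no handling of \n, \r or octal escapes).
                out += pdf_bytes[i + 1:i + 2]
                i += 2
                continue
            if c == b"(":
                depth += 1
            elif c == b")":
                depth -= 1
                if depth == 0:
                    break
            out += c
            i += 1
        if depth == 0:
            # Assumption: treat the bytes as Latin-1 text.
            scripts.append(out.decode("latin-1"))
    return scripts


sample = b"<< /S /JavaScript /JS (app.alert\\('hi'\\);) >>"
print(find_literal_js(sample))
```

Unterminated strings are skipped rather than returned, which matters when the input may be malformed.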

The library has been tested on tens of thousands of PDF files, both valid and malformed. It is licensed under the GNU General Public License, version 3 or later.

The algorithm used in libpdfjs is described in the paper "Static Detection of Malicious JavaScript-Bearing PDF Documents", presented at ACSAC 2011 (website | pdf | bib).

You can find further information about the project in the README file. The changes are summarized in the CHANGELOG. You can view the source code here or check it out from the SVN repository using the following command:

svn checkout svn://svn.code.sf.net/p/libpdfjs/code/trunk libpdfjs-code

Alternatively, you can download it from here. The installation instructions are provided in the INSTALL file.

Enjoy,

Project Admins:
University of Tübingen