PDF Javascript Stripper Code

Brought to you by: lomby

File	Date	Author	Commit
src	2009-10-12	lomby	[r6] Fixes with comments from mailing list
COPYING	2009-10-07	lomby	[r3] Added license
COPYING.LESSER	2009-10-07	lomby	[r3] Added license
README.txt	2009-10-12	lomby	[r7] Add thank you
build.xml	2009-10-12	lomby	[r8] Removes wrong comments.
input.pdf	2009-10-08	lomby	[r5] Version 1.0

Read Me

Name: PdfJavascriptStripper
Version: 1.1
License: LGPL
Author: http://www.oneoverzero.net andrea.lombardoni@oneoverzero.net
Description: This Java utility removes the JavaScript parts from a PDF
             document. It can help protect against injection and phishing
             attacks that rely on scripts embedded in a PDF.
             It is based on the iText library http://www.lowagie.com/iText/
Thanks to: Mark Storer 

How to compile:
---------------

You have to obtain a copy of iText (supported version is 2.0.8).
You can obtain it at SourceForge.net:

https://sourceforge.net/projects/itext/files/iText/iText2.0.8/iText-2.0.8.jar/download

First, edit the build.xml file so that the classpath points to your copy of
the iText jar. Change the location attribute in this line:

                <pathelement location="${basedir}/iText-2.0.8.jar" />

Run: 

ant compile

This will build everything.

How to run:
-----------

Name the PDF file to be processed input.pdf.

Run:

ant pdfstripper

The output is written to a file called output.pdf.

How to use:
-----------

In class:
  
  net.oneoverzero.common.itext.PdfJavascriptStripper

Use the method:
  
  public static Pair<byte[],Boolean> stripJavascript(final byte[] in)

It takes a PDF document as a byte array and returns the same PDF document
with all the JavaScript removed. The Boolean flag tells whether any JavaScript
was removed.
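
The following is a minimal sketch of how a caller might wrap such a
byte-array method around streams. It is not part of the project. The class
StripStreamSketch, its Result holder and the stripper parameter are all
hypothetical names introduced for illustration. Result stands in for the
Pair<byte[],Boolean> return value, because this README does not say how Pair
exposes its two members.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.function.Function;

final class StripStreamSketch {

    // Hypothetical stand-in for the Pair<byte[],Boolean> result.
    static final class Result {
        final byte[] pdf;
        final boolean javascriptRemoved;

        Result(byte[] pdf, boolean javascriptRemoved) {
            this.pdf = pdf;
            this.javascriptRemoved = javascriptRemoved;
        }
    }

    // Reads the whole input, applies the stripper, writes the result.
    // Returns true if the stripper reports that it removed something.
    static boolean strip(InputStream in, OutputStream out,
                         Function<byte[], Result> stripper) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            // Assumption: in real use this would delegate to
            // PdfJavascriptStripper.stripJavascript(byte[]).
            Result result = stripper.apply(buffer.toByteArray());
            out.write(result.pdf);
            out.flush();
            return result.javascriptRemoved;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In real use the stripper argument would be a small adapter that calls
stripJavascript and copies the two members of the returned Pair into a
Result. The caller can then use the returned flag to decide, for example,
whether to log that the document was modified.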

Roadmap:
--------

- check that we remove all JavaScript
- also remove Flash
- also remove HTML/anchors
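
As a rough aid for the first roadmap item, the sketch below scans the raw
bytes of a PDF for the names /JavaScript and /JS. It is not part of the
project, and the class name JavascriptMarkerScan is hypothetical. It is only
a heuristic: a PDF can carry the same names inside compressed streams or
write them with #xx hex escapes, and this scan would miss both, so a negative
result does not prove that a document is free of JavaScript.

```java
import java.nio.charset.StandardCharsets;

final class JavascriptMarkerScan {

    private static final String[] MARKERS = { "/JavaScript", "/JS" };

    // Returns true if any marker appears as a complete PDF name
    // in the uncompressed bytes of the document.
    static boolean containsMarker(byte[] pdf) {
        for (String marker : MARKERS) {
            byte[] needle = marker.getBytes(StandardCharsets.US_ASCII);
            for (int i = 0; i + needle.length <= pdf.length; i++) {
                if (matchesAt(pdf, i, needle)
                        && endsName(pdf, i + needle.length)) {
                    return true;
                }
            }
        }
        return false;
    }

    private static boolean matchesAt(byte[] data, int offset, byte[] needle) {
        for (int j = 0; j < needle.length; j++) {
            if (data[offset + j] != needle[j]) {
                return false;
            }
        }
        return true;
    }

    // A PDF name ends at whitespace, a delimiter, or the end of the data.
    // This avoids matching longer names such as /JSomething.
    private static boolean endsName(byte[] data, int pos) {
        if (pos >= data.length) {
            return true;
        }
        int b = data[pos] & 0xFF;
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32
                || "()<>[]{}/%".indexOf(b) >= 0;
    }
}
```

Running containsMarker on the stripped output gives a quick sanity check:
a true result means that a marker is still present in plain form and the
document deserves a closer look.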
