PDF Javascript Stripper Code
Brought to you by:
lomby
| File | Date | Author | Commit |
|---|---|---|---|
| src | 2009-10-12 | lomby | [r6] Fixes with comments from mailing list |
| COPYING | 2009-10-07 | lomby | [r3] Added license |
| COPYING.LESSER | 2009-10-07 | lomby | [r3] Added license |
| README.txt | 2009-10-12 | lomby | [r7] Add thank you |
| build.xml | 2009-10-12 | lomby | [r8] Removes wrong comments. |
| input.pdf | 2009-10-08 | lomby | [r5] Version 1.0 |
Name: PdfJavascriptStripper
Version: 1.1
License: LGPL
Author: http://www.oneoverzero.net andrea.lombardoni@oneoverzero.net
Description: This Java utility removes embedded JavaScript from a PDF
document, which may help to prevent injection/phishing attacks.
It is based on the iText library: http://www.lowagie.com/iText/
Thanks to: Mark Storer
How to compile:
---------------
You have to obtain a copy of iText (supported version is 2.0.8).
You can obtain it at SourceForge.net:
https://sourceforge.net/projects/itext/files/iText/iText2.0.8/iText-2.0.8.jar/download
First, edit the build.xml file so that the classpath entry points to your
iText jar. Find the line:
<pathelement location="${basedir}/iText-2.0.8.jar" />
and change its location attribute to the path of your iText jar.
Run:
ant compile
This will build everything.
How to run:
-----------
Name the PDF file to be processed input.pdf
Run:
ant pdfstripper
The output will be in a file called output.pdf
How to use:
-----------
In class:
net.oneoverzero.common.itext.PdfJavascriptStripper
Use the method:
public static Pair<byte[],Boolean> stripJavascript(final byte[] in)
It takes a PDF document as a byte array and returns the same PDF document
with all the JavaScript removed. The Boolean flag tells whether any
JavaScript was removed.
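Example (sketch):
The following is a minimal sketch of how a caller might consume the returned
pair. It is not taken from the project sources. So that it compiles without
iText or the project classes, java.util.Map.Entry stands in for the project's
Pair type, and noOpStripper is a hypothetical placeholder for
PdfJavascriptStripper.stripJavascript. The names StripUsageSketch and
stripAndReport are also made up for illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.function.Function;

// Sketch only: Map.Entry<byte[],Boolean> stands in for the project's
// Pair<byte[],Boolean>, and the stripper is passed in as a function so that
// the real PdfJavascriptStripper.stripJavascript can be swapped in later.
class StripUsageSketch {

    // Runs the stripper on the given PDF bytes, reports whether anything
    // was removed, and returns the (possibly cleaned) document bytes.
    static byte[] stripAndReport(byte[] pdf,
            Function<byte[], Map.Entry<byte[], Boolean>> stripper) {
        Map.Entry<byte[], Boolean> result = stripper.apply(pdf);
        if (result.getValue()) {
            System.out.println("JavaScript was found and removed.");
        } else {
            System.out.println("No JavaScript was removed.");
        }
        return result.getKey();
    }

    // Hypothetical placeholder for the real method: returns the input
    // unchanged and reports that nothing was removed.
    static Map.Entry<byte[], Boolean> noOpStripper(byte[] pdf) {
        return new SimpleImmutableEntry<>(pdf, Boolean.FALSE);
    }
}
```

With the project and iText on the classpath, the placeholder would be
replaced by a call to stripJavascript, reading the two values through
whatever accessors the project's Pair class provides (they are not
documented in this README).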
Roadmap:
--------
- check that all JavaScript is removed
- also remove Flash
- also remove HTML/anchors
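
Example (sketch) for the first roadmap item:
One rough way to spot leftover JavaScript is to scan the raw bytes of the
output for the PDF names /JavaScript and /JS. This is only a heuristic and
not part of the project. It assumes the names appear literally in the file,
so it will miss names inside compressed object streams, names written with
#xx escapes, and encrypted documents. The class and method names below are
hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Heuristic sketch: looks for the literal PDF names /JavaScript and /JS
// in the raw bytes. A "false" result does not prove the file is clean.
class JavascriptMarkerCheck {

    // Returns true if the bytes contain /JavaScript or /JS as a whole name.
    static boolean containsJavascriptMarker(byte[] pdf) {
        // ISO-8859-1 maps every byte to exactly one char, so binary
        // content is preserved position by position.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        return hasName(text, "/JavaScript") || hasName(text, "/JS");
    }

    // Matches the name only when it ends at a PDF delimiter or at the end
    // of the data, so that a longer name such as /JSON is not reported.
    private static boolean hasName(String text, String name) {
        int from = 0;
        while (true) {
            int i = text.indexOf(name, from);
            if (i < 0) {
                return false;
            }
            int end = i + name.length();
            if (end == text.length() || isDelimiter(text.charAt(end))) {
                return true;
            }
            from = i + 1;
        }
    }

    // PDF white-space and delimiter characters.
    private static boolean isDelimiter(char c) {
        return " \t\n\f\r\0()<>[]{}/%".indexOf(c) >= 0;
    }
}
```

Such a check could be run on output.pdf after stripping. A hit means the
file deserves a closer look.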