PDF Javascript Stripper Code
Brought to you by:
lomby
File | Date | Author | Commit |
---|---|---|---|
src | 2009-10-12 | lomby | [r6] Fixes with comments from mailing list |
COPYING | 2009-10-07 | lomby | [r3] Added license |
COPYING.LESSER | 2009-10-07 | lomby | [r3] Added license |
README.txt | 2009-10-12 | lomby | [r7] Add thank you |
build.xml | 2009-10-12 | lomby | [r8] Removes wrong comments. |
input.pdf | 2009-10-08 | lomby | [r5] Version 1.0 |
Name: PdfJavascriptStripper
Version: 1.1
License: LGPL
Author: http://www.oneoverzero.net
        andrea.lombardoni@oneoverzero.net

Description:

This Java utility removes the Javascript parts from a PDF document.
It may be useful to avoid injection/phishing attacks.

It is based on the iText library
http://www.lowagie.com/iText/

Thanks to: Mark Storer

How to compile:
---------------

You have to obtain a copy of iText (supported version is 2.0.8).

You can obtain it at SourceForge.net:
https://sourceforge.net/projects/itext/files/iText/iText2.0.8/iText-2.0.8.jar/download

First of all, edit the build.xml file and fix the path containing
the iText jar.

The line:
  <pathelement location="${basedir}/iText-2.0.8.jar" />
must be changed to point to your iText jar.

Run:
  ant compile

This will build everything.

How to run:
-----------

Prepare the PDF file that must be processed and name it
  input.pdf

Run:
  ant pdfstripper

The output will be in a file called
  output.pdf

How to use:
-----------

In class:
  net.oneoverzero.common.itext.PdfJavascriptStripper

Use the method:
  public static Pair<byte[],Boolean> stripJavascript(final byte[] in)

It takes a PDF document as a byte array and returns the same PDF
document without any Javascript. The boolean flag tells whether any
Javascript was removed.

Roadmap:
--------

- check that we remove all Javascript
- remove also Flash
- remove also HTML/Anchors
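
The byte-array workflow described under "How to use:" can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's own code: the class StripExample and the helper stripFile are hypothetical names, and because the accessors of the project's Pair class are not documented here, the sketch uses java.util.Map.Entry as a stand-in for Pair<byte[],Boolean>. The placeholder stripper does nothing. With the compiled classes and the iText jar on the classpath, it would be replaced by a lambda that calls PdfJavascriptStripper.stripJavascript and converts the returned Pair.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper class, not part of PdfJavascriptStripper.
final class StripExample {

    /**
     * Reads the PDF at 'in', hands its bytes to 'stripper', and writes the
     * returned bytes to 'out'. Returns the flag reported by the stripper
     * (true if some Javascript was removed).
     *
     * Map.Entry stands in for the project's Pair<byte[],Boolean>, whose
     * accessor names are an unknown here.
     */
    static boolean stripFile(Path in, Path out,
            Function<byte[], Map.Entry<byte[], Boolean>> stripper)
            throws IOException {
        byte[] original = Files.readAllBytes(in);
        Map.Entry<byte[], Boolean> result = stripper.apply(original);
        Files.write(out, result.getKey());
        return result.getValue();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder stripper (assumption): returns the bytes unchanged and
        // reports that nothing was removed. Replace it with a lambda that
        // calls PdfJavascriptStripper.stripJavascript(bytes) and adapts the
        // returned Pair to a Map.Entry.
        Function<byte[], Map.Entry<byte[], Boolean>> stripper =
                bytes -> new SimpleEntry<>(bytes, Boolean.FALSE);

        // Default file names follow the "How to run:" section.
        Path in = Paths.get(args.length > 0 ? args[0] : "input.pdf");
        Path out = Paths.get(args.length > 1 ? args[1] : "output.pdf");
        if (!Files.exists(in)) {
            System.err.println("No such file: " + in);
            return;
        }

        boolean removed = stripFile(in, out, stripper);
        System.out.println(removed
                ? "Javascript was removed."
                : "No Javascript was removed.");
    }
}
```

Passing the stripper in as a function keeps the file handling separate from the library call, so the same helper can be tried out without iText and then pointed at the real method once the jar is available.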