PDF Clown / Discussion / Open Discussion: Possible problem with PdfStream writing

Greetings to all,

While using the .NET version of PdfClown 0.1.1 in our digital signature application, I found a possible problem with PdfStream encoding when saving a PDF file. I don't know whether the same problem exists in 0.1.2 or in the newest 0.2.0, but this report may help fix the bug if it is still present.

If a PDF contains PdfStream objects that are not encoded (that is, they carry no filter such as FlateDecode) and you save it to disk, those PdfStream objects are saved encoded, but incorrectly. The resulting PDF contains errors that Acrobat Reader ignores in plain PDFs, but in PDFs carrying digital signatures they cause serious problems and invalidate the signatures.

Just doing this will reproduce the problem:

// Open and save, no changes...
byte[] pdfDocument = System.IO.File.ReadAllBytes(@"X:\TEMP\TEST_OLD.pdf");

// Standard rewrites the whole file; switch to SerializationModeEnum.Incremental to test incremental saves.
SerializationModeEnum _serializationMode = SerializationModeEnum.Standard;

File _file = new File(new org.pdfclown.bytes.Stream(new MemoryStream(pdfDocument)));
Document _document = _file.Document;
_document.Configuration.CompatibilityMode = Document.ConfigurationImpl.CompatibilityModeEnum.Strict;

MemoryStream _o = new MemoryStream();
_file.Save(new org.pdfclown.bytes.Stream(_o), _serializationMode);
_o.Flush();
_o.Close();

byte[] _pdfFileB = _o.ToArray();

System.IO.File.WriteAllBytes(@"X:\TEMP\TEST_OLD_JS.pdf", _pdfFileB);

PDFs that reproduce the problem can be obtained simply by saving a Word document as PDF (Word adds an unencoded Metadata PdfStream) or by using PdfCreator. I have also attached some sample files in case they are needed.

TEST_OLD.pdf is the original file (parsing it shows that object 14 0 is an unencoded stream); TEST_OLD_SAVED.pdf is the saved file (you can see that the same object 14 0 is corrupt).

I worked around the problem myself by changing the following code in the file "\PDFClown\dotNET\pdfclown.lib\src\org\pdfclown\objects\PdfStream.cs":
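
For anyone who wants to check the two attachments without a full parser, here is a rough sketch of how one could test whether a given object declares a /Filter entry. It is only a naive text scan, not PDF Clown code: the class name StreamFilterCheck and the method HasFilter are made up for illustration, and it assumes the object is stored directly in the file body (not inside an object stream) and that the word "stream" does not appear inside its dictionary.

```csharp
using System;
using System.Text;

static class StreamFilterCheck
{
    // Hypothetical helper (not part of PDF Clown): returns true when the
    // dictionary of "<objNumber> <generation> obj" contains a /Filter entry.
    public static bool HasFilter(byte[] pdf, int objNumber, int generation)
    {
        // ISO-8859-1 maps every byte to exactly one char, so binary data is harmless here.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdf);
        string marker = objNumber + " " + generation + " obj";

        // Find the marker, skipping hits such as "114 0 obj" when looking for "14 0 obj".
        int start = -1;
        int from = 0;
        while (from < text.Length)
        {
            int hit = text.IndexOf(marker, from, StringComparison.Ordinal);
            if (hit < 0) break;
            if (hit == 0 || !char.IsDigit(text[hit - 1])) { start = hit; break; }
            from = hit + 1;
        }
        if (start < 0) throw new ArgumentException("Object not found in file body.");

        // The dictionary ends where the stream body (or the object itself) ends.
        int end = text.IndexOf("stream", start, StringComparison.Ordinal);
        if (end < 0) end = text.IndexOf("endobj", start, StringComparison.Ordinal);
        if (end < 0) end = text.Length;

        return text.Substring(start, end - start).Contains("/Filter");
    }
}
```

Calling StreamFilterCheck.HasFilter(System.IO.File.ReadAllBytes(@"X:\TEMP\TEST_OLD.pdf"), 14, 0) should, going by the description above, report no filter for the original file; running it on the saved file shows whether a filter was added during saving.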

        // 1. Header.
        // Encoding.
        PdfDirectObject filterObject = header[PdfName.Filter];
        if (filterObject == null ) // Unencoded body.
        {
            /*
              NOTE: Header entries related to stream body encoding are temporary, instrumental to the
              current serialization process only.
            */
            unencodedBody = true;

            // Set the filter to apply!
            filterObject = PdfName.FlateDecode; // zlib/deflate filter.
            // Get encoded body data applying the filter to the stream!
            bodyData = body.Encode(Filter.Get((PdfName)filterObject), null);
            // Set encoded length!
            bodyLength = bodyData.Length;
            // Update 'Filter' entry!
            header[PdfName.Filter] = filterObject;
        }
        else // Encoded body.
        {
            unencodedBody = false;

            // Get encoded body data!
            bodyData = body.ToByteArray();
            // Set encoded length!
            bodyLength = (int)body.Length;
        }
        // Set encoded length!
        header[PdfName.Length] = new PdfInteger(bodyLength);

I changed the condition so that the first if() branch is never executed (something like if (filterObject == null && false)). With this change the PDFs are saved correctly, but I don't know what else I am affecting in PDF saving ;-)
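
To make the effect of that change easier to see outside the library, here is a minimal stand-alone sketch of the same decision. It is not the PdfStream.cs code: BodyEncodingSketch, PrepareBody and the compressUnfiltered switch are hypothetical names, and it assumes .NET 6 or later, where System.IO.Compression.ZLibStream is available (FlateDecode data uses the zlib format). With compressUnfiltered set to false it behaves like the "&& false" workaround and passes the bytes through untouched.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class BodyEncodingSketch
{
    // Hypothetical stand-in for the serialization step quoted above.
    // Returns the bytes to write; their Length is what belongs in the /Length entry.
    // filterAdded tells the caller whether a /Filter /FlateDecode entry is now required.
    public static byte[] PrepareBody(byte[] body, bool hasFilter, bool compressUnfiltered, out bool filterAdded)
    {
        filterAdded = !hasFilter && compressUnfiltered;
        if (!filterAdded)
            return body; // already encoded, or deliberately left unencoded

        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true keeps the buffer usable after the zlib stream is disposed.
            using (var zlib = new ZLibStream(buffer, CompressionMode.Compress, true))
            {
                zlib.Write(body, 0, body.Length);
            }
            return buffer.ToArray();
        }
    }
}
```

The point of returning the flag together with the bytes is that the header and the body have to change together: if the body is compressed, the header must gain the filter entry and the new length, otherwise a reader will misinterpret the stream.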

Last edit: Diego de Felice 2015-02-18

TEST_OLD.pdf

TEST_OLD.png

TEST_OLD_SAVED.pdf

TEST_OLD_SAVED.png

Hi Diego,

the metadata stream resulting from your processing (14 0 obj) is NOT corrupt: according to the PDF 1.7 spec, keeping that kind of stream unfiltered is only a recommendation, meant to let non-PDF-aware tools read its information as plain text. Nonetheless that's a good point, so I have just committed support for unfiltered metadata streams to both the 0.1.2-Fix branch (rev 139) and the 0.2.0 trunk (rev 140). (BTW, the body length calculation in your code chunk is apparently useless; am I overlooking something?)

I suggest you check out either the 0.1.2-Fix branch or the 0.2.0 trunk to be up to date with the latest bug fixes.

Thank you for your report!
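
For readers following along, the idea of leaving metadata streams unfiltered can be sketched roughly as below. This is NOT the code committed in rev 139/140, only an illustration of the rule: DefaultFilterPolicy and ShouldApplyDefaultFilter are hypothetical names, and the stream header is modelled as a plain string dictionary instead of the library's own types. The one fact it relies on is that metadata streams carry /Type /Metadata in their dictionary.

```csharp
using System;
using System.Collections.Generic;

static class DefaultFilterPolicy
{
    // Hypothetical predicate: should the writer add a default filter to this stream?
    // Keys and values are PDF names without the leading slash, e.g. "Type" -> "Metadata".
    public static bool ShouldApplyDefaultFilter(IDictionary<string, string> header)
    {
        // A stream that already declares a filter is written as it is.
        if (header.ContainsKey("Filter"))
            return false;

        // Metadata streams stay unfiltered so plain-text tools can still read them.
        string type;
        bool isMetadata = header.TryGetValue("Type", out type) && type == "Metadata";
        return !isMetadata;
    }
}
```

With a rule like this, ordinary unfiltered streams still get compressed on save, while the metadata stream that Word or PdfCreator writes as plain text keeps its original bytes.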
Stefano
