
Possible problem with PdfStream writing

2015-02-18
2015-02-25
  • Diego de Felice

    Diego de Felice - 2015-02-18

    Greetings to all,

    While using the .NET version of PdfClown 0.1.1 in our digital signature application, I discovered a possible problem with PdfStream encoding when saving PDF files. I don't know whether the same problem also exists in version 0.1.2 or in the newer 0.2.0; if it does, this report may help fix it.

    If you open a PDF containing PdfStream objects that are not encoded (i.e. that have no filter such as FlateDecode) and save it to disk, those PdfStream objects are saved encoded, but in a wrong way. The resulting PDF contains errors that Acrobat Reader ignores in plain PDFs, but in PDFs with digital signatures they cause a lot of problems and invalidate the signatures.
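    A minimal sketch, assuming .NET 6 or later (for System.IO.Compression.ZLibStream), of what re-encoding an unfiltered stream with FlateDecode does to its bytes. FlateRoundTrip, FlateEncode and FlateDecode are illustrative names, not PdfClown APIs, and the sample XML is made up. The stored bytes (and therefore the stream's /Length) change while the decoded content stays identical, which is how a viewer can display the re-saved file normally while a digest computed over the original raw bytes, such as a signature's, can stop matching.

    ```csharp
    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Linq;
    using System.Text;

    public static class FlateRoundTrip
    {
        // Compresses data in zlib format (RFC 1950), the format used by the PDF FlateDecode filter.
        public static byte[] FlateEncode(byte[] data)
        {
            using var output = new MemoryStream();
            using (var zlib = new ZLibStream(output, CompressionLevel.Optimal, leaveOpen: true))
            {
                zlib.Write(data, 0, data.Length);
            } // Disposing the ZLibStream writes the final block and the checksum.
            return output.ToArray();
        }

        // Reverses FlateEncode.
        public static byte[] FlateDecode(byte[] data)
        {
            using var input = new MemoryStream(data);
            using var zlib = new ZLibStream(input, CompressionMode.Decompress);
            using var output = new MemoryStream();
            zlib.CopyTo(output);
            return output.ToArray();
        }

        public static void Main()
        {
            // Stand-in for the body of an unfiltered XMP metadata stream.
            byte[] plain = Encoding.ASCII.GetBytes("<x:xmpmeta xmlns:x=\"adobe:ns:meta/\"></x:xmpmeta>");
            byte[] encoded = FlateEncode(plain);

            // The bytes written to the file differ from the original ones...
            Console.WriteLine(encoded.SequenceEqual(plain));              // False
            // ...but decoding restores exactly the same content.
            Console.WriteLine(FlateDecode(encoded).SequenceEqual(plain)); // True
        }
    }
    ```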

    The following code is enough to reproduce the problem:

    // Open and save, no changes.
    byte[] pdfDocument = System.IO.File.ReadAllBytes(@"X:\TEMP\TEST_OLD.pdf");
    
    // Full (standard) save; SerializationModeEnum.Incremental would append the changes instead.
    SerializationModeEnum _serializationMode = SerializationModeEnum.Standard;
    
    // 'File' here is the PdfClown file class (org.pdfclown.files.File), not System.IO.File.
    File _file = new File(new org.pdfclown.bytes.Stream(new MemoryStream(pdfDocument)));
    Document _document = _file.Document;
    _document.Configuration.CompatibilityMode = Document.ConfigurationImpl.CompatibilityModeEnum.Strict;
    
    // Serialize the unmodified document to memory.
    MemoryStream _o = new MemoryStream();
    _file.Save(new org.pdfclown.bytes.Stream(_o), _serializationMode);
    _o.Flush();
    _o.Close();
    
    // MemoryStream.ToArray() still works after Close().
    byte[] _pdfFileB = _o.ToArray();
    
    System.IO.File.WriteAllBytes(@"X:\TEMP\TEST_OLD_JS.pdf", _pdfFileB);
    

    PDFs that reproduce the problem can be obtained simply by saving a Word document as PDF (Word adds an unencoded Metadata PdfStream) or by using PdfCreator. I also have some sample files if needed (they are attached).

    TEST_OLD.pdf is the original file (parsing it shows that object 14 0 is an unencoded stream); TEST_OLD_SAVED.pdf is the saved file (you can see that the same object 14 0 is corrupt).

    I worked around the problem myself by changing the following lines of code in the "\PDFClown\dotNET\pdfclown.lib\src\org\pdfclown\objects\PdfStream.cs" file:

            // 1. Header.
            // Encoding.
            PdfDirectObject filterObject = header[PdfName.Filter];
            if (filterObject == null ) // Unencoded body.
            {
                /*
                  NOTE: Header entries related to stream body encoding are temporary, instrumental to the
                  current serialization process only.
                */
                unencodedBody = true;
    
                // Set the filter to apply!
                filterObject = PdfName.FlateDecode; // zlib/deflate filter.
                // Get encoded body data applying the filter to the stream!
                bodyData = body.Encode(Filter.Get((PdfName)filterObject), null);
                // Set encoded length!
                bodyLength = bodyData.Length;
                // Update 'Filter' entry!
                header[PdfName.Filter] = filterObject;
            }
            else // Encoded body.
            {
                unencodedBody = false;
    
                // Get encoded body data!
                bodyData = body.ToByteArray();
                // Set encoded length!
                bodyLength = (int)body.Length;
            }
            // Set encoded length!
            header[PdfName.Length] = new PdfInteger(bodyLength);
    

    I added a condition so that the first if() block is never executed (e.g. if (filterObject == null && false)). With this change the PDFs are saved correctly, but I don't know what other side effects I'm introducing in PDF saving ;-)

    Last edit: Diego de Felice 2015-02-18
  • Stefano Chizzolini

    Hi Diego,

    the metadata stream resulting from your processing (14 0 obj) is NOT corrupt: according to the PDF 1.7 spec, keeping that kind of stream unfiltered is only a recommendation, meant to let non-PDF-aware tools read its information as plain text. Nonetheless, that's a good point, so I have just committed support for unfiltered metadata streams to both the 0.1.2-Fix branch (rev 139) and the 0.2.0 trunk (rev 140). (BTW, your code chunk about body length calculation is apparently useless; am I overlooking something?)
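    A minimal sketch of the decision a serializer could make before compressing a stream body, assuming the stream header is modeled as a plain string dictionary rather than PdfClown's PdfDictionary. StreamEncodingPolicy and ShouldFlateEncode are hypothetical names, and this is not the code committed in rev 139/140. The original PdfStream code quoted earlier in the thread compresses every unfiltered stream, and the "&& false" workaround compresses none; the rule below compresses everything except streams that already have a filter and /Type /Metadata streams.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class StreamEncodingPolicy
    {
        // Hypothetical model: a stream header reduced to name -> value strings,
        // e.g. { "Type": "Metadata" } or { "Filter": "FlateDecode" }.
        public static bool ShouldFlateEncode(IReadOnlyDictionary<string, string> header)
        {
            // Already filtered: write the body bytes exactly as they are.
            if (header.ContainsKey("Filter"))
                return false;

            // Metadata stream: leave it unfiltered, as the PDF 1.7 spec recommends,
            // so that non-PDF-aware tools can read the XML as plain text.
            if (header.TryGetValue("Type", out var type) && type == "Metadata")
                return false;

            // Any other unfiltered stream may be compressed on save.
            return true;
        }

        public static void Main()
        {
            var metadata = new Dictionary<string, string> { ["Type"] = "Metadata" };
            var content = new Dictionary<string, string>();
            var filtered = new Dictionary<string, string> { ["Filter"] = "FlateDecode" };

            Console.WriteLine(ShouldFlateEncode(metadata)); // False
            Console.WriteLine(ShouldFlateEncode(content));  // True
            Console.WriteLine(ShouldFlateEncode(filtered)); // False
        }
    }
    ```

    Keying the exception on /Type /Metadata keeps the size benefit of compression for ordinary content streams while leaving XMP metadata readable as plain text.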

    I suggest you check out either the 0.1.2-Fix branch or the 0.2.0 trunk to stay up to date with the latest bug fixes.

    Thank you for your report!
    Stefano
  • Diego de Felice

    Diego de Felice - 2015-02-25

    Hello Stefano, thank you very much for the help. I'll wait for version 0.2.0 (mainly for the page rendering feature :-D).