While using the .NET version of PdfClown 0.1.1 with our digital signature application, I found a possible problem with PdfStream encoding when saving PDF files. I don't know whether the same problem exists in version 0.1.2 or in the newest 0.2.0, but this report may help fix the bug if it is present.
If you have a PDF containing PdfStream objects that are not encoded (that is, they have no filter such as FlateDecode) and you save it to disk, the PdfStream objects are saved encoded, but incorrectly. The resulting PDF contains errors that Acrobat Reader ignores in plain PDFs, but in PDFs with digital signatures they cause many problems and invalidate the signatures.
Just doing this will reproduce the problem:
// Open and save, no changes...
byte[] pdfDocument = System.IO.File.ReadAllBytes(@"X:\TEMP\TEST_OLD.pdf");
SerializationModeEnum _serializationMode = true ? SerializationModeEnum.Standard : SerializationModeEnum.Incremental;
File _file = new File(new org.pdfclown.bytes.Stream(new MemoryStream(pdfDocument)));
Document _document = _file.Document;
_document.Configuration.CompatibilityMode = Document.ConfigurationImpl.CompatibilityModeEnum.Strict;
MemoryStream _o = new MemoryStream();
_file.Save(new org.pdfclown.bytes.Stream(_o), _serializationMode);
_o.Flush();
_o.Close();
byte[] _pdfFileB = _o.ToArray();
System.IO.File.WriteAllBytes(@"X:\TEMP\TEST_OLD_JS.pdf", _pdfFileB);
PDFs that reproduce the problem can be obtained simply by saving a Word document as PDF (Word adds a clear, unencoded Metadata PdfStream) or by using PdfCreator. I also have some sample files if needed (they are attached).
TEST_OLD.pdf is the original file (parsing it shows that object 14 0 is a clear stream); TEST_OLD_SAVED.pdf is the saved file (you can see that the same object 14 0 is corrupt).
I solved the problem myself by changing the following code in the file "\PDFClown\dotNET\pdfclown.lib\src\org\pdfclown\objects\PdfStream.cs":
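One rough way to check whether an object such as 14 0 is a clear stream is to look for a /Filter entry in its dictionary. The sketch below is only an illustration: StreamInspector and HasFilter are hypothetical names, not part of PDF Clown, and the scan is deliberately naive (it takes the first matching object header and ignores object streams and incremental updates).

```csharp
using System;
using System.Text;

static class StreamInspector
{
    // Hypothetical helper: returns true if the dictionary of the indirect object
    // "<number> <generation> obj" contains a /Filter entry.
    public static bool HasFilter(byte[] pdfBytes, int number, int generation)
    {
        // ISO-8859-1 maps every byte to exactly one char, so offsets match the file.
        string text = Encoding.GetEncoding("iso-8859-1").GetString(pdfBytes);
        string marker = number + " " + generation + " obj";

        // Find the object header, skipping matches such as "114 0 obj" for "14 0 obj".
        int start = -1;
        int from = 0;
        while (from < text.Length)
        {
            int i = text.IndexOf(marker, from, StringComparison.Ordinal);
            if (i < 0) break;
            if (i == 0 || !char.IsDigit(text[i - 1])) { start = i; break; }
            from = i + 1;
        }
        if (start < 0) throw new ArgumentException("Object not found in the file.");

        // The dictionary ends where the stream body (or the object itself) begins/ends.
        int bodyStart = start + marker.Length;
        int streamPos = text.IndexOf("stream", bodyStart, StringComparison.Ordinal);
        int endObjPos = text.IndexOf("endobj", bodyStart, StringComparison.Ordinal);
        int dictEnd = text.Length;
        if (streamPos >= 0 && (endObjPos < 0 || streamPos < endObjPos)) dictEnd = streamPos;
        else if (endObjPos >= 0) dictEnd = endObjPos;

        return text.IndexOf("/Filter", bodyStart, dictEnd - bodyStart, StringComparison.Ordinal) >= 0;
    }
}
```

Calling StreamInspector.HasFilter(System.IO.File.ReadAllBytes(path), 14, 0) on the original and on the saved file should show whether the save step added a filter to that object.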
// 1. Header.
// Encoding.
PdfDirectObject filterObject = header[PdfName.Filter];
if (filterObject == null) // Unencoded body.
{
  /*
    NOTE: Header entries related to stream body encoding are temporary, instrumental to the
    current serialization process only.
  */
  unencodedBody = true;
  // Set the filter to apply!
  filterObject = PdfName.FlateDecode; // zlib/deflate filter.
  // Get encoded body data applying the filter to the stream!
  bodyData = body.Encode(Filter.Get((PdfName)filterObject), null);
  // Set encoded length!
  bodyLength = bodyData.Length;
  // Update 'Filter' entry!
  header[PdfName.Filter] = filterObject;
}
else // Encoded body.
{
  unencodedBody = false;
  // Get encoded body data!
  bodyData = body.ToByteArray();
  // Set encoded length!
  bodyLength = (int)body.Length;
}
// Set encoded length!
header[PdfName.Length] = new PdfInteger(bodyLength);
I added a condition so that the first if() branch is never executed (for example, if (filterObject == null && false)). With this change the PDFs are saved correctly, but I don't know what side effects I'm introducing in PDF saving ;-)
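A narrower alternative to disabling the branch for every stream might be to skip the default filter only for metadata streams. The sketch below is not PDF Clown code: FilterPolicy, ShouldApplyDefaultFilter and the dictionary-based header model are hypothetical, chosen only to show the decision in isolation.

```csharp
using System.Collections.Generic;

static class FilterPolicy
{
    // Hypothetical header model: PDF name keys mapped to PDF name values,
    // e.g. "Type" -> "Metadata" or "Filter" -> "FlateDecode".
    public static bool ShouldApplyDefaultFilter(IReadOnlyDictionary<string, string> header)
    {
        // Already encoded: the existing filter must be kept as-is.
        if (header.ContainsKey("Filter")) return false;

        // Metadata streams are left as plain text (assumed policy for this sketch).
        if (header.TryGetValue("Type", out var type) && type == "Metadata") return false;

        // Any other unencoded stream gets the default filter (e.g. FlateDecode).
        return true;
    }
}
```

With this policy, other unencoded streams would still be compressed on save, while a clear Metadata stream would be written back unfiltered.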
Hi Diego,
the metadata stream resulting from your processing (14 0 obj) is NOT corrupt: according to the PDF 1.7 spec, keeping that kind of stream unfiltered is only a recommendation, meant to let non-PDF-aware tools read its information as plain text. Nonetheless, that's a good point, so I have just committed support for unfiltered metadata streams to both the 0.1.2-Fix branch (rev 139) and the 0.2.0 trunk (rev 140). (BTW, your code chunk about body length calculation is apparently useless; am I overlooking something?)
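As a rough illustration of why a filtered stream is not in itself corrupt, the sketch below round-trips some sample metadata through zlib/deflate, the format behind FlateDecode. It assumes .NET 6 or later for System.IO.Compression.ZLibStream; FlateRoundTrip, Encode and Decode are hypothetical names and do not reflect how PDF Clown implements its filter.

```csharp
using System.IO;
using System.IO.Compression;

static class FlateRoundTrip
{
    // Compress raw bytes into zlib format, as a FlateDecode stream body would hold them.
    public static byte[] Encode(byte[] raw)
    {
        using var output = new MemoryStream();
        using (var zlib = new ZLibStream(output, CompressionMode.Compress, leaveOpen: true))
        {
            zlib.Write(raw, 0, raw.Length);
        } // Disposing the ZLibStream flushes the remaining compressed data.
        return output.ToArray();
    }

    // Decompress a zlib-format body back to the original bytes.
    public static byte[] Decode(byte[] encoded)
    {
        using var input = new MemoryStream(encoded);
        using var zlib = new ZLibStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        zlib.CopyTo(output);
        return output.ToArray();
    }
}
```

Decode(Encode(data)) returns the same bytes as data, so a PDF-aware reader sees identical metadata. The bytes written to the file do differ, though, and a digital signature computed over byte ranges of the original file would no longer match a file that has been rewritten this way.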
I suggest you check out either the 0.1.2-Fix branch or the 0.2.0 trunk to be up to date with the latest bug fixes.
Thank you for your report!
Stefano
Last edit: Diego de Felice 2015-02-18
Hello Stefano, thank you very much for the help. I'll wait for the 0.2.0 version (I'm waiting mainly for the page rendering function :-D ).