From: Michal H. <ms...@gm...> - 2009-10-09 14:21:15
Attachments:
hybrid_file-multiversion-fix.patch
|
Hi, attached you can find a fix for bt#338. I wouldn't post it normally, because it is changes code solely in xrefwriter.cc, but I am not sure about its correctness wrt. PDF specification. I would be grateful to have another pair of eyes on that. The affected part of the specification is: Chapter 3, "Compatibility with PDF 1.4" and hybrid document files and our multiversion documents handling. Let me know if you have any questions. Thanks -- Michal Hocko |
From: Michal H. <ms...@gm...> - 2009-10-22 13:51:20
|
On Fri, Oct 09, 2009 at 04:20:41PM +0200, Michal Hocko wrote: > Hi, > attached you can find a fix for bt#338. I wouldn't post it normally, > because it is changes code solely in xrefwriter.cc, but I am not sure > about its correctness wrt. PDF specification. I would be grateful to have > another pair of eyes on that. > > The affected part of the specification is: Chapter 3, "Compatibility with PDF 1.4" > and hybrid document files and our multiversion documents handling. Anybody willing to go through the specification and check whether this patch conforms? If there is nobody to do so I think it doesn't make any sence to wait longer and I will commit the patch. We can revert it later if it proves being wrong. > > Let me know if you have any questions. > > Thanks > -- > Michal Hocko > kernel: handle hybrid-xref files correctly in XRefWriter > > Fixes bt#338 > > PDF 1.5 and above versions of specification support so called hybrid files > which contain both old-style xref table and xref stream. Such a document > then looks as follows: > > [@OFFSET2] > xref > [Standard xref entries. Hidden objects are marked as free with max gen > number] > trailer > << > /Root [...] > >> > > [@OFFEST3] > 1 0 obj > << > /Length [...] > >> > stream > [object stream data] > endstream > > [@OFFSET1] > xref > 0 0 > trailer > << > /Prev OFFSET2 > /XRefStm OFFSET3 > >> > > startxref > OFFSET1 > %EOF > > This means that this kind of file looks like a 2 revisions document. This > wouldn't be a big problem but we have a different problem if we consider > this to be 2 revisions document. XRefWriter::getRevision size calculates > revision size which is done as revStart-endPrev (where endPrev is the end > of the previous revision), but we are not able to calculate end of the > previous revision because xref (at OFFSET2) is not followed by startxref > keyword as expected. Application then ends with assertion failure > revStar>endPrev. > > Of course we could fix the XRefWriter::getRevisionEnd but the point is that > we shouldn't consider this layout multi-version in the first place. The > "proper" fix is not considering additional xref for hybrid files at all. > This is done in XRefWriter::collectRevisions by checking XRefStm entry in > the trailer. > > If a hybrid document is multiversion then the referenced trailer (OFFSET2) > should contain /Prev entry IMO. The PDF specification is not clear in this > regard. > > Index: pdfedit-patches/src/kernel/xrefwriter.cc > =================================================================== > --- pdfedit-patches.orig/src/kernel/xrefwriter.cc 2009-10-09 15:12:37.000000000 +0200 > +++ pdfedit-patches/src/kernel/xrefwriter.cc 2009-10-09 15:37:53.000000000 +0200 > @@ -728,23 +728,50 @@ void XRefWriter::collectRevisions() > return; > } > > + bool hybrid_xref = false; > do > { > - // process current trailer (store offset and get previous > - // if present) - we are starting with the newest one cloned > - // above > - kernelPrintDbg(DBG_DBG, "XRef offset for "<< > - revisions.size()<<" revision is "<<off); > - revisions.insert(revisions.begin(), off); > + // Take care about hybrid files which makes situation little bit more > + // complex, because these documents usually contain 2 sections to > + // describe one revision. The last (most recent one) has xref with no > + // entries and trailer has /Prev pointing to the xref table and XRefStm > + // pointing to the xref stream (PDFReference16.pdf: Chapter 3, > + // Compatibility with PDF 1.4). The pervious contains referred xrefs. > + // This, however, doesn't mean that the document has 2 revisions so > + // don't add them in such a case. > + if (!hybrid_xref) > + { > + // process current trailer (store offset and get previous > + // if present) - we are starting with the newest one cloned > + // above > + kernelPrintDbg(DBG_DBG, "XRef offset for "<< > + revisions.size()<<" revision is "<<off); > + > + revisions.insert(revisions.begin(), off); > + } > off = getPrevFromTrailer(trailer); > if(isERR_OFFSET(off)) > // no more previous trailers > break; > + > + hybrid_xref = false; > + Object stm; > + trailer->dictLookupNF("XRefStm", &stm); > + if (stm.isInt()) > + { > + kernelPrintDbg(DBG_INFO, "Document contains hybrid-file XREF."); > + hybrid_xref = true; > + }else if (!stm.isNull()) > + { > + kernelPrintDbg(DBG_WARN, "Unexpected value for XRefStm trailer entry with type " > + << stm.getType()); > + } > + stm.free(); > > // checks whether given offset is already in revisions, which > // would mean corrupted referencies (because of cycle in > // trailers) > - if(std::find(revisions.begin(), revisions.end(), > + if(!hybrid_xref && std::find(revisions.begin(), revisions.end(), > (size_t)off) != revisions.end()) > { > kernelPrintDbg(DBG_ERR, "Trailer Prev points to already" > @@ -755,7 +782,9 @@ void XRefWriter::collectRevisions() > > // off is the first byte of a cross reference section. > // It can be either xref key word or beginning of the > - // xref stream object > + // xref stream object - we have to process it even for > + // hybrid_xref case becase this can contain reference to > + // the previous revision > str->setPos(off); > Object parseObj; > boost::shared_ptr< ::Object> obj(XPdfObjectFactory::getInstance(), xpdf::object_deleter()); > ------------------------------------------------------------------------------ > Come build with us! The BlackBerry(R) Developer Conference in SF, CA > is the only developer event you need to attend this year. Jumpstart your > developing skills, take BlackBerry mobile applications to market and stay > ahead of the curve. Join us from November 9 - 12, 2009. Register now! > http://p.sf.net/sfu/devconference > _______________________________________________ > Pdfedit-devel mailing list > Pdf...@li... > https://lists.sourceforge.net/lists/listinfo/pdfedit-devel -- Michal Hocko |
From: Jozef M. <mis...@ho...> - 2009-10-22 14:36:49
|
please, commit > Date: Thu, 22 Oct 2009 15:43:22 +0200 > From: ms...@gm... > To: pdf...@li... > Subject: Re: [PATCH] kernel: handle hybrid-xref files correctly in XRefWriter > > On Fri, Oct 09, 2009 at 04:20:41PM +0200, Michal Hocko wrote: > > Hi, > > attached you can find a fix for bt#338. I wouldn't post it normally, > > because it is changes code solely in xrefwriter.cc, but I am not sure > > about its correctness wrt. PDF specification. I would be grateful to have > > another pair of eyes on that. > > > > The affected part of the specification is: Chapter 3, "Compatibility with PDF 1.4" > > and hybrid document files and our multiversion documents handling. > > Anybody willing to go through the specification and check whether this > patch conforms? > If there is nobody to do so I think it doesn't make any sence to wait > longer and I will commit the patch. We can revert it later if it proves > being wrong. > > > > > Let me know if you have any questions. > > > > Thanks > > -- > > Michal Hocko > > > kernel: handle hybrid-xref files correctly in XRefWriter > > > > Fixes bt#338 > > > > PDF 1.5 and above versions of specification support so called hybrid files > > which contain both old-style xref table and xref stream. Such a document > > then looks as follows: > > > > [@OFFSET2] > > xref > > [Standard xref entries. Hidden objects are marked as free with max gen > > number] > > trailer > > << > > /Root [...] > > >> > > > > [@OFFEST3] > > 1 0 obj > > << > > /Length [...] > > >> > > stream > > [object stream data] > > endstream > > > > [@OFFSET1] > > xref > > 0 0 > > trailer > > << > > /Prev OFFSET2 > > /XRefStm OFFSET3 > > >> > > > > startxref > > OFFSET1 > > %EOF > > > > This means that this kind of file looks like a 2 revisions document. This > > wouldn't be a big problem but we have a different problem if we consider > > this to be 2 revisions document. XRefWriter::getRevision size calculates > > revision size which is done as revStart-endPrev (where endPrev is the end > > of the previous revision), but we are not able to calculate end of the > > previous revision because xref (at OFFSET2) is not followed by startxref > > keyword as expected. Application then ends with assertion failure > > revStar>endPrev. > > > > Of course we could fix the XRefWriter::getRevisionEnd but the point is that > > we shouldn't consider this layout multi-version in the first place. The > > "proper" fix is not considering additional xref for hybrid files at all. > > This is done in XRefWriter::collectRevisions by checking XRefStm entry in > > the trailer. > > > > If a hybrid document is multiversion then the referenced trailer (OFFSET2) > > should contain /Prev entry IMO. The PDF specification is not clear in this > > regard. > > > > Index: pdfedit-patches/src/kernel/xrefwriter.cc > > =================================================================== > > --- pdfedit-patches.orig/src/kernel/xrefwriter.cc 2009-10-09 15:12:37.000000000 +0200 > > +++ pdfedit-patches/src/kernel/xrefwriter.cc 2009-10-09 15:37:53.000000000 +0200 > > @@ -728,23 +728,50 @@ void XRefWriter::collectRevisions() > > return; > > } > > > > + bool hybrid_xref = false; > > do > > { > > - // process current trailer (store offset and get previous > > - // if present) - we are starting with the newest one cloned > > - // above > > - kernelPrintDbg(DBG_DBG, "XRef offset for "<< > > - revisions.size()<<" revision is "<<off); > > - revisions.insert(revisions.begin(), off); > > + // Take care about hybrid files which makes situation little bit more > > + // complex, because these documents usually contain 2 sections to > > + // describe one revision. The last (most recent one) has xref with no > > + // entries and trailer has /Prev pointing to the xref table and XRefStm > > + // pointing to the xref stream (PDFReference16.pdf: Chapter 3, > > + // Compatibility with PDF 1.4). The pervious contains referred xrefs. > > + // This, however, doesn't mean that the document has 2 revisions so > > + // don't add them in such a case. > > + if (!hybrid_xref) > > + { > > + // process current trailer (store offset and get previous > > + // if present) - we are starting with the newest one cloned > > + // above > > + kernelPrintDbg(DBG_DBG, "XRef offset for "<< > > + revisions.size()<<" revision is "<<off); > > + > > + revisions.insert(revisions.begin(), off); > > + } > > off = getPrevFromTrailer(trailer); > > if(isERR_OFFSET(off)) > > // no more previous trailers > > break; > > + > > + hybrid_xref = false; > > + Object stm; > > + trailer->dictLookupNF("XRefStm", &stm); > > + if (stm.isInt()) > > + { > > + kernelPrintDbg(DBG_INFO, "Document contains hybrid-file XREF."); > > + hybrid_xref = true; > > + }else if (!stm.isNull()) > > + { > > + kernelPrintDbg(DBG_WARN, "Unexpected value for XRefStm trailer entry with type " > > + << stm.getType()); > > + } > > + stm.free(); > > > > // checks whether given offset is already in revisions, which > > // would mean corrupted referencies (because of cycle in > > // trailers) > > - if(std::find(revisions.begin(), revisions.end(), > > + if(!hybrid_xref && std::find(revisions.begin(), revisions.end(), > > (size_t)off) != revisions.end()) > > { > > kernelPrintDbg(DBG_ERR, "Trailer Prev points to already" > > @@ -755,7 +782,9 @@ void XRefWriter::collectRevisions() > > > > // off is the first byte of a cross reference section. > > // It can be either xref key word or beginning of the > > - // xref stream object > > + // xref stream object - we have to process it even for > > + // hybrid_xref case becase this can contain reference to > > + // the previous revision > > str->setPos(off); > > Object parseObj; > > boost::shared_ptr< ::Object> obj(XPdfObjectFactory::getInstance(), xpdf::object_deleter()); > > > ------------------------------------------------------------------------------ > > Come build with us! The BlackBerry(R) Developer Conference in SF, CA > > is the only developer event you need to attend this year. Jumpstart your > > developing skills, take BlackBerry mobile applications to market and stay > > ahead of the curve. Join us from November 9 - 12, 2009. Register now! > > http://p.sf.net/sfu/devconference > > > _______________________________________________ > > Pdfedit-devel mailing list > > Pdf...@li... > > https://lists.sourceforge.net/lists/listinfo/pdfedit-devel > > > -- > Michal Hocko > > ------------------------------------------------------------------------------ > Come build with us! The BlackBerry(R) Developer Conference in SF, CA > is the only developer event you need to attend this year. Jumpstart your > developing skills, take BlackBerry mobile applications to market and stay > ahead of the curve. Join us from November 9 - 12, 2009. Register now! > http://p.sf.net/sfu/devconference > _______________________________________________ > Pdfedit-devel mailing list > Pdf...@li... > https://lists.sourceforge.net/lists/listinfo/pdfedit-devel _________________________________________________________________ Windows 7: It helps you do more. Explore Windows 7. http://www.microsoft.com/Windows/windows-7/default.aspx?ocid=PID24727::T:WLMTAGL:ON:WL:en-US:WWL_WIN_evergreen3:102009 |
From: Michal H. <ms...@gm...> - 2009-10-22 14:52:20
|
On Thu, Oct 22, 2009 at 02:36:38PM +0000, Jozef Misutka wrote: > > please, commit Ok, then. I would be much calmer if somebody did a serious review, though. > > On Fri, Oct 09, 2009 at 04:20:41PM +0200, Michal Hocko wrote: > > > Hi, > > > attached you can find a fix for bt#338. I wouldn't post it normally, > > > because it is changes code solely in xrefwriter.cc, but I am not sure > > > about its correctness wrt. PDF specification. I would be grateful to have > > > another pair of eyes on that. > > > > > > The affected part of the specification is: Chapter 3, "Compatibility with PDF 1.4" > > > and hybrid document files and our multiversion documents handling. > > > > Anybody willing to go through the specification and check whether this > > patch conforms? > > If there is nobody to do so I think it doesn't make any sence to wait > > longer and I will commit the patch. We can revert it later if it proves > > being wrong. > > > > > > > > Let me know if you have any questions. > > > > > > Thanks > > > -- > > > Michal Hocko > > > > > kernel: handle hybrid-xref files correctly in XRefWriter > > > > > > Fixes bt#338 > > > > > > PDF 1.5 and above versions of specification support so called hybrid files > > > which contain both old-style xref table and xref stream. Such a document > > > then looks as follows: > > > > > > [@OFFSET2] > > > xref > > > [Standard xref entries. Hidden objects are marked as free with max gen > > > number] > > > trailer > > > << > > > /Root [...] > > > >> > > > > > > [@OFFEST3] > > > 1 0 obj > > > << > > > /Length [...] > > > >> > > > stream > > > [object stream data] > > > endstream > > > > > > [@OFFSET1] > > > xref > > > 0 0 > > > trailer > > > << > > > /Prev OFFSET2 > > > /XRefStm OFFSET3 > > > >> > > > > > > startxref > > > OFFSET1 > > > %EOF > > > > > > This means that this kind of file looks like a 2 revisions document. This > > > wouldn't be a big problem but we have a different problem if we consider > > > this to be 2 revisions document. XRefWriter::getRevision size calculates > > > revision size which is done as revStart-endPrev (where endPrev is the end > > > of the previous revision), but we are not able to calculate end of the > > > previous revision because xref (at OFFSET2) is not followed by startxref > > > keyword as expected. Application then ends with assertion failure > > > revStar>endPrev. > > > > > > Of course we could fix the XRefWriter::getRevisionEnd but the point is that > > > we shouldn't consider this layout multi-version in the first place. The > > > "proper" fix is not considering additional xref for hybrid files at all. > > > This is done in XRefWriter::collectRevisions by checking XRefStm entry in > > > the trailer. > > > > > > If a hybrid document is multiversion then the referenced trailer (OFFSET2) > > > should contain /Prev entry IMO. The PDF specification is not clear in this > > > regard. > > > > > > Index: pdfedit-patches/src/kernel/xrefwriter.cc > > > =================================================================== > > > --- pdfedit-patches.orig/src/kernel/xrefwriter.cc 2009-10-09 15:12:37.000000000 +0200 > > > +++ pdfedit-patches/src/kernel/xrefwriter.cc 2009-10-09 15:37:53.000000000 +0200 > > > @@ -728,23 +728,50 @@ void XRefWriter::collectRevisions() > > > return; > > > } > > > > > > + bool hybrid_xref = false; > > > do > > > { > > > - // process current trailer (store offset and get previous > > > - // if present) - we are starting with the newest one cloned > > > - // above > > > - kernelPrintDbg(DBG_DBG, "XRef offset for "<< > > > - revisions.size()<<" revision is "<<off); > > > - revisions.insert(revisions.begin(), off); > > > + // Take care about hybrid files which makes situation little bit more > > > + // complex, because these documents usually contain 2 sections to > > > + // describe one revision. The last (most recent one) has xref with no > > > + // entries and trailer has /Prev pointing to the xref table and XRefStm > > > + // pointing to the xref stream (PDFReference16.pdf: Chapter 3, > > > + // Compatibility with PDF 1.4). The pervious contains referred xrefs. > > > + // This, however, doesn't mean that the document has 2 revisions so > > > + // don't add them in such a case. > > > + if (!hybrid_xref) > > > + { > > > + // process current trailer (store offset and get previous > > > + // if present) - we are starting with the newest one cloned > > > + // above > > > + kernelPrintDbg(DBG_DBG, "XRef offset for "<< > > > + revisions.size()<<" revision is "<<off); > > > + > > > + revisions.insert(revisions.begin(), off); > > > + } > > > off = getPrevFromTrailer(trailer); > > > if(isERR_OFFSET(off)) > > > // no more previous trailers > > > break; > > > + > > > + hybrid_xref = false; > > > + Object stm; > > > + trailer->dictLookupNF("XRefStm", &stm); > > > + if (stm.isInt()) > > > + { > > > + kernelPrintDbg(DBG_INFO, "Document contains hybrid-file XREF."); > > > + hybrid_xref = true; > > > + }else if (!stm.isNull()) > > > + { > > > + kernelPrintDbg(DBG_WARN, "Unexpected value for XRefStm trailer entry with type " > > > + << stm.getType()); > > > + } > > > + stm.free(); > > > > > > // checks whether given offset is already in revisions, which > > > // would mean corrupted referencies (because of cycle in > > > // trailers) > > > - if(std::find(revisions.begin(), revisions.end(), > > > + if(!hybrid_xref && std::find(revisions.begin(), revisions.end(), > > > (size_t)off) != revisions.end()) > > > { > > > kernelPrintDbg(DBG_ERR, "Trailer Prev points to already" > > > @@ -755,7 +782,9 @@ void XRefWriter::collectRevisions() > > > > > > // off is the first byte of a cross reference section. > > > // It can be either xref key word or beginning of the > > > - // xref stream object > > > + // xref stream object - we have to process it even for > > > + // hybrid_xref case becase this can contain reference to > > > + // the previous revision > > > str->setPos(off); > > > Object parseObj; > > > boost::shared_ptr< ::Object> obj(XPdfObjectFactory::getInstance(), xpdf::object_deleter()); > > > > > ------------------------------------------------------------------------------ > > > Come build with us! The BlackBerry(R) Developer Conference in SF, CA > > > is the only developer event you need to attend this year. Jumpstart your > > > developing skills, take BlackBerry mobile applications to market and stay > > > ahead of the curve. Join us from November 9 - 12, 2009. Register now! > > > http://p.sf.net/sfu/devconference > > > > > _______________________________________________ > > > Pdfedit-devel mailing list > > > Pdf...@li... > > > https://lists.sourceforge.net/lists/listinfo/pdfedit-devel > > > > > > -- > > Michal Hocko > > > > ------------------------------------------------------------------------------ > > Come build with us! The BlackBerry(R) Developer Conference in SF, CA > > is the only developer event you need to attend this year. Jumpstart your > > developing skills, take BlackBerry mobile applications to market and stay > > ahead of the curve. Join us from November 9 - 12, 2009. Register now! > > http://p.sf.net/sfu/devconference > > _______________________________________________ > > Pdfedit-devel mailing list > > Pdf...@li... > > https://lists.sourceforge.net/lists/listinfo/pdfedit-devel > > _________________________________________________________________ > Windows 7: It helps you do more. Explore Windows 7. > http://www.microsoft.com/Windows/windows-7/default.aspx?ocid=PID24727::T:WLMTAGL:ON:WL:en-US:WWL_WIN_evergreen3:102009 -- Michal Hocko |