On Tue, Jul 26, 2011 at 12:15:13PM +0200, Michal Hocko wrote:
> I have looked at the document and at first glance the update hasn't
> screwed up anything obvious.
> At first I thought that we haven't updated the number of objects (stored
> in the Xref stream in the original revision and Trailer in the new
> revision) because those numbers are same for both while we have
> obviously added new objects. This turned out to be OK because object
> numbers are sparse and we are reusing those numbers which are not
> Then I have looked at the Root object which is reported to be missing
> and this started to look interesting.
> Original revision reports:
> 825 0 obj <<
> /Type /XRef
> /Index [0 826]
> /Size 826
> /W [1 3 1]
> /Root 823 0 R
> /Info 824 0 R
> /ID [<9B0D6E3CC66605F7CE12FB9EAAB1356F>
> /Length 2230
> /Filter /FlateDecode
> and the new one:
> /Size 826
> /Root 823 0 R
> /Info 824 0 R
> /ID [ <9b0d6e3cc66605f7ce12fb9eaab1356f>
> <9b0d6e3cc66605f7ce12fb9eaab1356f> ]
> /Prev 773827
> It is an object with reference number [823 0]. The problem is that I
> cannot see that object in the file:
> $ grep --binary-files=text "823 0 obj" eflow2.pdf
> I guess that it is just embedded somewhere because I can see it with our
> ./tools/pdf_object_printer --ref "823 0" --file ~/tmp/eflow2.pdf
> Document: "/home/miso/tmp/eflow2.pdf"
> [823 0]:
> /Type /Catalog
> /Pages 800 0 R
> /Outlines 801 0 R
> /Names 822 0 R
> /PageMode /UseOutlines
> /PageLabels <<
> /Nums [ 0 <<
> /S /D
> >> 1 <<
> /S /D
> >> ]
> /OpenAction 30 0 R
So the Catalog object [823 0] is actually compressed in the ObjStm (object
stream) [815 0], which looks as follows (I have skipped objects that are
of no interest at the moment):
$ ./tools/pdf_object_printer --ref "815 0" --decode 1 --file ~/tmp/eflow2.pdf
814 0 816 153 817 314 818 398 819 492 820 584 821 651 822 721 823 742
/Pages 800 0 R
/Outlines 801 0 R
/Names 822 0 R
/OpenAction 30 0 R
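To make the first line of that decoded stream easier to read: it is a list of (object number, offset) pairs, with the offsets counted from the start of the object data inside the stream. A minimal Python sketch of how it could be split up (the function name parse_objstm_header is just an illustration, not something from PDFedit):

```python
from __future__ import annotations


def parse_objstm_header(header: str) -> dict[int, int]:
    """Map object number -> offset of that object inside the decoded stream."""
    fields = [int(token) for token in header.split()]
    if len(fields) % 2 != 0:
        raise ValueError("header must consist of (number, offset) pairs")
    # Even positions are object numbers, odd positions are their offsets.
    return dict(zip(fields[0::2], fields[1::2]))


# Header line copied from the pdf_object_printer output above.
header = "814 0 816 153 817 314 818 398 819 492 820 584 821 651 822 721 823 742"
offsets = parse_objstm_header(header)

# Position of the Catalog among the compressed objects (0-based).
catalog_index = list(offsets).index(823)
print(offsets[823], catalog_index)
```

So [823 0] is the last of the nine objects in that stream, and a reader has to decode [815 0] before it can get at the Catalog.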
The xref table which defines your change looks like:
0000776307 00000 n
0000776400 00000 n
0000779952 00000 n
0000782990 00000 n
0000786170 00000 n
0000786290 00000 n
No section refers to the object 823.
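For reference, each of those lines is a fixed-width xref entry: a 10-digit byte offset, a 5-digit generation number and either n (in use) or f (free). A rough Python sketch for reading them, assuming the subsection headers (which I have not pasted above) are handled elsewhere; XrefEntry and parse_xref_entry are made-up names:

```python
import re
from typing import NamedTuple


class XrefEntry(NamedTuple):
    offset: int       # byte offset of the object in the file
    generation: int   # generation number
    in_use: bool      # True for "n", False for "f"


# 10-digit offset, space, 5-digit generation, space, keyword n or f.
_ENTRY = re.compile(r"^(\d{10}) (\d{5}) ([nf])\s*$")


def parse_xref_entry(line: str) -> XrefEntry:
    match = _ENTRY.match(line)
    if match is None:
        raise ValueError(f"not an xref entry: {line!r}")
    return XrefEntry(int(match.group(1)), int(match.group(2)), match.group(3) == "n")


# Entries copied from the xref table of the incremental update.
lines = [
    "0000776307 00000 n",
    "0000776400 00000 n",
    "0000779952 00000 n",
    "0000782990 00000 n",
    "0000786170 00000 n",
    "0000786290 00000 n",
]
entries = [parse_xref_entry(line) for line in lines]

# /Prev from the new trailer: offset of the previous cross-reference section.
PREV = 773827
print(all(entry.in_use and entry.offset > PREV for entry in entries))
```

All six entries are in use and point past the /Prev offset, which is what you would expect for objects appended by an incremental update.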
So what could be wrong? My gut feeling tells me that Acrobat is "buggy"
here. Everything above says that all new objects have been added
correctly and the document structure is accessible.
The problem seems to be that the original revision uses a cross reference
stream while the incremental update uses an xref table. This is perfectly
legal according to the PDF specification AFAIU.
PDFedit as well as other code based on the original xpdf code (same with
poppler) parses all cross reference tables/streams first so we know
where all objects are stored. We do not care much about xref tables vs.
streams because that is handled when an indirect object is referenced.
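Roughly, the lookup described above could be sketched like this in Python. This is only an illustration of the idea, not the actual xpdf/PDFedit code; the type names, the object numbers 5 and 7 in the update table and the offset 770000 for [815 0] are made up, only the fact that [823 0] sits in ObjStm [815 0] comes from the file:

```python
from __future__ import annotations

from typing import NamedTuple, Optional, Union


class InFile(NamedTuple):
    offset: int          # object stored directly at this byte offset


class InObjStm(NamedTuple):
    stream_number: int   # object number of the containing object stream
    index: int           # position of the object inside that stream


Entry = Union[InFile, InObjStm]


def lookup(sections: list[dict[int, Entry]], number: int) -> Optional[Entry]:
    """Return the entry from the newest section that mentions the object."""
    for section in sections:  # ordered newest first, following /Prev
        if number in section:
            return section[number]
    return None


# Newest section: the xref table of the update (object numbers hypothetical).
update_table = {5: InFile(776307), 7: InFile(776400)}
# Older section: the original xref stream (offset of 815 hypothetical).
original_stream = {815: InFile(770000), 823: InObjStm(815, 8)}

sections = [update_table, original_stream]
root = lookup(sections, 823)
print(root)
```

The update table does not mention 823, so the lookup falls through to the original cross reference stream, which says the object lives inside [815 0]. Whether the newer section is a table or a stream makes no difference to this kind of lookup.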
I guess that Acrobat is complaining because the Root [823 0] object is
part of an object stream that is not directly visible from the xref
table. Whether this complies with the specification is not
100% clear to me.
Specification says (3.4.6 Object Streams):
Indirect references to objects inside object streams use the normal
syntax: for example, 14 0 R. Access to these objects requires a
different way of storing cross-reference information; see Section 3.4.7,
“Cross-Reference Streams.” Although an application must support PDF
1.5 to use compressed objects, the objects can be stored in a manner
that is compatible with PDF 1.4. Applications that do not support PDF
1.5 can ignore the objects; see “Compatibility with PDF 1.4” on page
As you can see there _is_ a cross reference stream for this object.
The section on incremental updates says (3.4.5 Incremental Updates):
In an incremental update, any new or changed objects are appended to
the file, a cross-reference section is added, and a new trailer is
inserted. The resulting file has the structure shown in Figure 3.3. A
complete example of an updated file is shown in Section G.6, “Updating
The cross-reference section added when a file is updated contains
entries only for objects that have been changed, replaced, or
deleted. Deleted objects are left unchanged in the file, but are marked
as deleted by means of their cross-reference entries. The added trailer
contains all the entries (perhaps modified) from the previous trailer,
as well as a Prev entry giving the location of the previous cross-
reference section (see Table 3.13 on page 73). As shown in Figure 3.3, a
file that has been updated several times contains several trailers; each
trailer is terminated by its own end-of-file (%%EOF) marker.
There are no restrictions on combining xref streams and tables.
OK, enough lawyering here. I would try a newer Acroread (mine is
9.2 and it is affected as well), report it to Adobe, or use PDFedit
to flatten the file (this creates a new document with all reachable
objects and an xref table, which you can then update without issues).
Hope this helps.