From: Michal H. <ms...@gm...> - 2011-08-23 06:24:48
On Thu, Aug 11, 2011 at 09:36:53AM +0200, Michal Hocko wrote:
> On Fri, Aug 05, 2011 at 01:05:06PM +0200, Michal Hocko wrote:
> > On Fri, Aug 05, 2011 at 12:19:52PM +0200, Jozef Misutka wrote:
> > > On 4.8.2011 22:54, Michal Hocko wrote:
> > > >On Thu, Aug 04, 2011 at 10:28:50PM +0200, Michal Hocko wrote:
> > > >>On Tue, Aug 02, 2011 at 11:03:46PM +0200, Jozef Misutka wrote:
> > > >[...]
> > > >>>The original pdf is also corrupted. If you print the object 7665 0:
> > > >>>
> > > >>>...
> > > >>>/IDTree 7666 0 R
> > > >>>/K 7667 0 R
> > > >>>/ParentTree 7668 0 R
> > > >>>/ParentTreeNextKey 4718
> > > >>>/RoleMap<<
> > > >>>/1Heading /P
> > > >>>/1Heading (appx) /P<-----------------
> > > >>>/2Heading /P
> > > >>>/2Heading (appx) /P<-----------------
> > > >>>/3Heading /P
> > > >>>/4Heading /P
> > > >>>...
> > > >>>
> > > >>>Rolemap is a dictionary object so it expects key/value pairs, it gets
> > > >>>1Heading: P (ok)
> > > >>>1Heading: appx (ok)
> > > >>>but it seems strange at this point because the next key is
> > > >>>P: 2Heading (still ok, however strange)
> > > >>>P: 2Heading (still ok, however strange)
> > > >>>and then it gets a string instead of name object (appx)
> > > >>Not exactly. This is how we print that object. In fact the entry is
> > > >>"/1Heading (appx)" as a key and "/P" as a value. A name object cannot
> > > >>contain any spaces but it might contain a numeric representation of that
> > > >>character. Xpdf code turns those numeric rep. into characters.
> > > >>
> > > >>I am wondering how we ended up with a plain space there as
> > > >>makeNamePdfValid substitutes all dangerous characters by #CODE so we
> > > >>should see /1Heading#20(appx) in the file. This sounds pretty much like
> > > >>a bug in our code.
> > > >OK, I've got it. The problem is in complexValueToString<CDict> which
> > > >writes the key blindly without any validation. The patch to fix this is
> > > >attached.
> > >
> > > Did you really try the patch with the specific file from my email?
> >
> > As I wrote, I had issues building the gui so I haven't tried it. But it
> > should fix the issue with the spaces in key values.
>
> Sorry it took so long but I've been busy recently.
> I have looked at Lexer and how it handles name objects and found out
> that the patch I sent last time is not sufficient (albeit we need it).
> The other problem is in makeNamePdfValid itself.
> We follow a PDF specification recommendation to represent all characters
> out of range [!, ~] as #hex codes. This is not sufficient, though
> (just consider `(', `)' and there are others). The attached patch fixes
> that.
>
> I've tested both patches with the delinearizator and pdf_object_printer
> tools as I am still not able to build the gui now.
>
> Both patches are attached for reference.

Did you have time to look at those patches? Should I push them to the
repository?
-- 
Michal Hocko
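
For reference, the escaping rule discussed above can be sketched roughly as follows. This is only an illustration and not the code from the attached patches: the helper name escapePdfName is hypothetical, and the delimiter set is taken from the PDF specification's list of delimiter characters rather than from makeNamePdfValid itself.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (NOT pdfedit's makeNamePdfValid): returns the body of
// a PDF name object, without the leading '/', with unsafe bytes as #XX.
std::string escapePdfName(const std::string& raw)
{
    // Delimiters would terminate the name token in a lexer, so being
    // inside the printable range [!, ~] is not enough on its own.
    static const std::string delimiters = "()<>[]{}/%";

    std::string out;
    for (unsigned char c : raw) {
        const bool printable = c >= 0x21 && c <= 0x7E;
        const bool special =
            c == '#' ||  // '#' introduces an escape, so it must be escaped too
            delimiters.find(static_cast<char>(c)) != std::string::npos;

        if (printable && !special) {
            out += static_cast<char>(c);
        } else {
            char buf[4];  // '#', two hex digits, terminating NUL
            std::snprintf(buf, sizeof buf, "#%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}
```

With this sketch the key from the RoleMap above would be written as /1Heading#20#28appx#29, so a lexer reads it back as a single name object instead of a name followed by a string.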