Addition of Named Destinations

Help
2012-03-07
2013-01-26
  • Andreas Pinter
    2012-03-07

    Hi there,

    I think there is a mismatch between PDF Clown and Adobe Acrobat concerning named destinations.
    When I add them as shown in 'NamedDestinationSample', I can easily find them in the resulting PDF using a plain text editor.
    But Adobe Acrobat 9 does not show any named destinations.

    Is PDF Clown missing something, or is Adobe using a different way to determine named destinations?

    Steps to reproduce:
    1.) Start CLI
    2.) Choose 'NamedDestinationSample'
    3.) Choose 'whyOpenSourceMakeSense' (this document does not have any named destinations in advance)
    4.) Choose 'Standard'
    5.) Open the resulting PDF with a text editor. Find /Names which should look similar to this:
    <</Names  /Limits  >>
    6.) Open the resulting PDF with Adobe Acrobat. Open View > Navigation Panels > Destinations
    -> it is empty.

    Greetings,
    - Andreas

     
  • I don't know: that (leaf) node looks correct. Could you please compare its syntax with that of another file whose destinations Acrobat displays properly?

    thank you
    Stefano

     
  • Andreas Pinter
    2012-03-14

    Hi again,

    The following is the Names/Dests dictionary before changing my testfile:

    /Names<</Dests 685 0 R
    >>
    ...
    685 0 obj
    <</Kids[ 680 0 R 681 0 R 682 0 R 683 0 R 684 0 R]
    >>
    ...
    684 0 obj
    <</Limits[(d31e1331)(lastTocPage)]
    /Names[(d31e1331) 539 0 R
    (d31e1342) 579 0 R
    (d31e968) 180 0 R
    (d31e971) 186 0 R
    (d31e977) 195 0 R
    (hico_last_page) 627 0 R
    (lastTocPage) 112 0 R
    ]
    >>
    

    And now after using the NamedDestinationSample:

    /Names <</Dests 685 0 R >>
    ...
    685 0 obj
    <</Kids [680 0 R 681 0 R 695 0 R 682 0 R 683 0 R 684 0 R ] /Limits [(1845494305) (Third page) ] >>
    endobj
    ...
    684 0 obj
    <</Limits [(d31e1331) (Third page) ] /Names [(d31e1331) 539 0 R (d31e1342) 579 0 R (d31e968) 180 0 R (d31e971) 186 0 R (d31e977) 195 0 R (hico_last_page) 627 0 R (lastTocPage) 112 0 R (Second page) 696 0 R (Third page) 697 0 R ] >>
    endobj
    

    It seems to me that the only difference is the use of /Limits in the root node (object 685).
    When I remove that entry with a text editor, Adobe Acrobat happily shows all destinations (even the newly added ones).
    Digging through the PDF Reference (1.6), I found the following explanation for /Limits:

    (Intermediate and leaf nodes only; required) An array of two strings, specifying the (lexically)
    least and greatest keys included in the Names array of a leaf node or in the Names
    arrays of any leaf nodes that are descendants of an intermediate node.

     
  • Andreas Pinter
    2012-03-14

    The problem is fixable by adding a boolean parameter to NameTree.Add()

    private void Add(
          PdfString key,
          TValue value,
          bool overwrite,
          PdfDictionary node,
          bool isRoot
          )
    

    and, depending on this parameter, the function "UpdateNodeLimits" is either called or skipped.
    Since this method is private and only called from within PDF Clown, the caller can always determine whether it is at the root or not.
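
    A minimal sketch of that idea in Java, assuming a simplified node that only stores its keys. The class `NameTreeNode` and its members are hypothetical and are not PDF Clown's actual types; keys are compared with `String.compareTo`, which matches byte order only for ASCII names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified name-tree node (not PDF Clown's real class).
final class NameTreeNode {
    final boolean isRoot;
    final List<String> keys = new ArrayList<>(); // kept sorted in ascending order
    String lowerLimit; // stays null on the root: /Limits is for intermediate and leaf nodes only
    String upperLimit;

    NameTreeNode(boolean isRoot) {
        this.isRoot = isRoot;
    }

    void add(String key) {
        // Find the insertion point that keeps the keys sorted.
        int i = 0;
        while (i < keys.size() && keys.get(i).compareTo(key) < 0) {
            i++;
        }
        if (i < keys.size() && keys.get(i).equals(key)) {
            return; // key already present, nothing to do
        }
        keys.add(i, key);
        // Only non-root nodes carry /Limits.
        if (!isRoot) {
            updateLimits();
        }
    }

    private void updateLimits() {
        lowerLimit = keys.get(0);
        upperLimit = keys.get(keys.size() - 1);
    }
}
```

    With this shape, adding names to a node created with `new NameTreeNode(true)` never sets its limits, while a node created with `new NameTreeNode(false)` always reports its least and greatest key.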

    Greetings,
    - Andreas

     
  • Hi Andreas,

    well done! I'll include your fix both in the 0.1.1 branch and in the trunk (by the way, I'm currently refactoring the NameTree class moving its implementation into a generalized Tree superclass in order to support NumberTree objects too).

    thank you
    Stefano

     
  • Andreas Pinter
    2012-03-21

    I'm still struggling to get the links working.

    I'm currently in the strange situation that importing /Annots and /Dests seems to work pretty well IF the name of the destination is "short enough".

    My original document uses destination names of the form 'N49feace3_fig-0001-gra-0001-hot-0003'. In the original document the links are just fine. As soon as I merge them into the other document, those links are broken.
    If I use Adobe Acrobat to change the destination name in the original document to '3_fig-0001-gra-0001-hot-0003' and change the GoTo action as well, the link works in both documents.
    As soon as I switch it back, the resulting document is broken again.

    Unfortunately it can't really be about the length of the destination name, because another example is 'N84afaba6_fig-0001', which is clearly shorter than '3_fig-0001-gra-0001-hot-0003' but does not work unless changed to something like '6_fig-0001'.

    I'm rather confused here and hope you can give me a hint as to why the name of the destination could matter at all, other than having to match between the GoTo action and the named destination.

    Greetings,
    - Andreas

     
  • Andreas Pinter
    2012-03-21

    I'm sorry to 'spam' here, but usually when I ask for help I think the whole problem through again in order to formulate it as clearly as possible. While doing so I came up with a new idea to try out.

    Long story short: I think I found the problem, or at least its origin.
    The above problem seems to be located in the NameTree class and its methods for splitting a node that is full. I just changed 'NodeMinSize' from 5 to 200 and all my links work perfectly well. So I think the reorganization of /Dests leads to Adobe Acrobat not finding the destination referenced by the GoTo action.

    So you might look into that when refactoring the NameTree class.
    If necessary I can provide the files I am using.

    Greetings,
    - Andreas

     
  • In order to focus on the actual problem without noise or redundancy, it is essential that you reduce your sample file to the minimal specimen that still behaves that way. When ready, please send it to me or attach it to a bug tracker report along with its source file so that I can compare their structures.

    thank you
    Stefano

     
  • Andreas Pinter
    2012-03-27

    I tried to open a bug tracker report, but the file size is limited.
    And my mail bounced with the error that your mailbox is full.

    So here is a link: http://www.2shared.com/file/4w3wY7B8/nameTreeBugExamples.html
    If you aren't comfortable with this way of sharing the files, tell me as soon as your mailbox has some space again.

    And here is the accompanying text:
    As requested through the help forum here are my testfiles.

    The idea is to insert 'xxx-inserted.pdf' into 'original.pdf' at page 3.
    Ideally all the links from the original and the inserted PDF work.
    On page 11 of 'xxx-result.pdf' you'll find two links to "Fig. 2", which should jump to the second figure (page 12).
    The only difference between 'good-result.pdf' and 'wrong-result.pdf' is an additional destination in the document, which is not involved in this linking process at all.

    My original 'inserted.pdf' has way more destinations, but I removed them one by one through Adobe Acrobat until any additional deletion would make the result good again.

    I hope this is sufficient to find the bug. Otherwise let me know.

    Greetings,
    - Andreas

     
  • I examined the resulting files (good-result.pdf and wrong-result.pdf), but comparing them did not reveal any apparent violation of the PDF spec:

    * good-result.pdf:

    1650 0 obj
    <</Names [(1845505298) [1309 0 R /XYZ null 34.5952 null ] (d31e1036) [1405 0 R /XYZ null 372.319 null ] (d31e1060) [1405 0 R /XYZ null 302.58 null ] (d31e1117) [1531 0 R /XYZ null 767.796 null ] (d31e1142) [1531 0 R /XYZ null 674.944 null ] (d31e1296) [1531 0 R /XYZ null 546.326 null ] (d31e968) [1405 0 R /XYZ null 699.87 null ] (d31e971) [1405 0 R /XYZ null 680.941 null ] (d31e977) [1405 0 R /XYZ null 659.024 null ] (N84afaba6_fig-0001) [1563 0 R /XYZ null 222.341 null ] ] >>
    

    * wrong-result.pdf:

    1650 0 obj
    <</Kids [1652 0 R 1651 0 R ] >>
    endobj
    1651 0 obj
    <</Names [(d31e1142) [1531 0 R /XYZ null 674.944 null ] (d31e1296) [1531 0 R /XYZ null 546.326 null ] (d31e968) [1405 0 R /XYZ null 699.87 null ] (d31e971) [1405 0 R /XYZ null 680.941 null ] (d31e977) [1405 0 R /XYZ null 659.024 null ] (N84afaba6_fig-0001) [1563 0 R /XYZ null 222.341 null ] ] /Limits [(d31e1142) (N84afaba6_fig-0001) ] >>
    endobj
    1652 0 obj
    <</Names [(1845505298) [1309 0 R /XYZ null 34.5952 null ] (d31e1036) [1405 0 R /XYZ null 372.319 null ] (d31e1051) [1405 0 R /XYZ null 308.458 null ] (d31e1060) [1405 0 R /XYZ null 302.58 null ] (d31e1117) [1531 0 R /XYZ null 767.796 null ] ] /Limits [(1845505298) (d31e1117) ] >>
    endobj
    

    I have no clue.

     
  • Andreas Pinter
    2012-04-03

    I was just writing a response to tell you that I hadn't found anything when it hit me.

    According to the PDF 1.6 spec, page 135, chapter 3.8.5 Name Trees:
    "The Names entries in the leaf (or root) nodes contain the tree’s keys and their associated values, arranged in
    key-value pairs and sorted lexically in ascending order by key."

    As far as I know, "lexically" means that the ASCII codes of the individual characters are used for sorting. In the ASCII table the upper-case letters have lower codes than the lower-case letters, so 'N84afaba6_fig-0001' should come before all the 'd31e***' destinations.
    This is wrong in both PDFs. But since there are no /Limits in 'good-result.pdf', the viewer just searches the whole Names array and finds 'N84afaba6_fig-0001'. In 'wrong-result.pdf' the destinations are split across two leaf nodes, so the viewer uses /Limits to decide where to look. In the above example it looks into object 1652 and cannot find the destination there.

    Below you find the name tree which works perfectly fine with Adobe Acrobat:

    1650 0 obj
    <</Kids [1652 0 R 1651 0 R ] >>
    endobj
    1651 0 obj
    <</Names [(d31e1142) [1531 0 R /XYZ null 674.944 null ] (d31e1296) [1531 0 R /XYZ null 546.326 null ] (d31e968) [1405 0 R /XYZ null 699.87 null ] (d31e971) [1405 0 R /XYZ null 680.941 null ] (d31e977) [1405 0 R /XYZ null 659.024 null ] ] /Limits [(d31e1142) (d31e977) ] >>
    endobj
    1652 0 obj
    <</Names [(1845505298) [1309 0 R /XYZ null 34.5952 null ] (N84afaba6_fig-0001) [1563 0 R /XYZ null 222.341 null ] (d31e1036) [1405 0 R /XYZ null 372.319 null ] (d31e1051) [1405 0 R /XYZ null 308.458 null ] (d31e1060) [1405 0 R /XYZ null 302.58 null ] (d31e1117) [1531 0 R /XYZ null 767.796 null ] ] /Limits [(1845505298) (d31e1117) ] >>
    endobj
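
    The effect described above can be reproduced with a small Java sketch. It assumes a viewer that picks the first kid whose /Limits range covers the key and then searches only that leaf; how Acrobat really searches is not known here, and `LimitsLookupDemo`, `Leaf`, `find`, `wrongTree` and `fixedTree` are hypothetical names. The keys are copied from objects 1651 and 1652 above, and `String.compareTo` stands in for byte-wise comparison, which holds for these ASCII names.

```java
import java.util.Arrays;
import java.util.List;

final class LimitsLookupDemo {
    // Hypothetical leaf node: the /Limits pair plus the keys of its /Names array.
    static final class Leaf {
        final String least;
        final String greatest;
        final List<String> keys;

        Leaf(String least, String greatest, String... keys) {
            this.least = least;
            this.greatest = greatest;
            this.keys = Arrays.asList(keys);
        }
    }

    // Choose the first kid whose limits cover the key, then look only inside that leaf.
    static boolean find(List<Leaf> kids, String key) {
        for (Leaf kid : kids) {
            if (key.compareTo(kid.least) >= 0 && key.compareTo(kid.greatest) <= 0) {
                return kid.keys.contains(key);
            }
        }
        return false;
    }

    // Kids of object 1650 in wrong-result.pdf (1652 first, then 1651).
    static List<Leaf> wrongTree() {
        return Arrays.asList(
            new Leaf("1845505298", "d31e1117",
                "1845505298", "d31e1036", "d31e1051", "d31e1060", "d31e1117"),
            new Leaf("d31e1142", "N84afaba6_fig-0001",
                "d31e1142", "d31e1296", "d31e968", "d31e971", "d31e977", "N84afaba6_fig-0001"));
    }

    // Kids of object 1650 in the hand-corrected file.
    static List<Leaf> fixedTree() {
        return Arrays.asList(
            new Leaf("1845505298", "d31e1117",
                "1845505298", "N84afaba6_fig-0001", "d31e1036", "d31e1051", "d31e1060", "d31e1117"),
            new Leaf("d31e1142", "d31e977",
                "d31e1142", "d31e1296", "d31e968", "d31e971", "d31e977"));
    }
}
```

    Under these assumptions, `find(wrongTree(), "N84afaba6_fig-0001")` is routed to the first leaf, because 'N' sorts between '1' and 'd', and fails there, while the same call on `fixedTree()` succeeds.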
    
     
  • You are right: while the Java implementation was correctly applying lexical sorting, the .NET one was wrongly applying alphabetical sorting. I'm going to include this fix in the next release.

    thank you
    Stefano

     
  • Andreas Pinter
    2012-04-11

    You may also look into the following issue:
    When extracting specific names (destinations in my case) with the NameTree index operator, I ran into the problem that my name wasn't returned although it is present.
    The origin of this issue is the

    if( low >= high)
    

    check for the leaf nodes.

    Take the following 'names' array and search for 'N72722300'

    "[ (N64a8fab2) [ 2281 0 R XYZ null 443,313 null ] (N64a8fab2_fig-0001) [ 2429 0 R XYZ null 227,323 null ] (N700b08e3) [ 1487 0 R XYZ null 752,971 null ] (N72722300) [ 1487 0 R XYZ null 781,166 null ] (N8095d719) [ 1487 0 R XYZ null 781,166 null ] ]"
    

    The variables 'low' and 'high' will both be 6 (which is the correct location of the searched name), but the search is aborted before checking whether the searched name is at index 6.
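
    For comparison, here is a sketch of a binary search over such a flattened key/value array that only stops once 'low' has moved past 'high', so the last remaining candidate is still checked. It is written in Java with plain strings standing in for the PDF objects; `NamesArraySearch` and `indexOfKey` are hypothetical names, and the sketch counts in key/value pairs internally rather than in array elements.

```java
import java.util.List;

final class NamesArraySearch {
    // names is flattened as [key0, value0, key1, value1, ...] with keys in ascending order.
    // Returns the array index of the key, or -1 if the key is absent.
    static int indexOfKey(List<String> names, String key) {
        int low = 0;
        int high = names.size() / 2 - 1; // indices of key/value pairs
        while (low <= high) { // low == high still leaves one candidate to check
            int mid = (low + high) >>> 1;
            int cmp = key.compareTo(names.get(mid * 2));
            if (cmp == 0) {
                return mid * 2;
            }
            if (cmp < 0) {
                high = mid - 1;
            } else {
                low = mid + 1;
            }
        }
        return -1;
    }
}
```

    Applied to the five keys above, a search for 'N72722300' returns array index 6.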

    Greetings
    - Andreas

     
  • Andreas Pinter
    2012-08-14

    Hi there once again,

    The above example is now working perfectly fine with the new Tree class. But I have one more thing to mention about NamedDestinations.
    According to PDF 1.6 Spec, page 135, chapter 3.8.5 Name Trees:

    The tree always has exactly one root node, which contains a single entry: either Kids or Names but not both.

    It does not say anything about a root node containing no entries at all. Unfortunately it is possible to create such files (e.g. through Adobe Acrobat 9.5.1 Pro):

    1296 0 obj
    <</Dests 1297 0 R>>
    endobj
    1297 0 obj
    <<>>
    endobj
    

    This leads to a "Malformed tree node." exception in PDF Clown ( Tree.Get(PdfDict, PdfName) ) when trying to add new destinations programmatically.
    Since it is not explicitly forbidden in the spec, I would suggest handling such a case somehow. Might it be possible to add an empty /Names entry instead of throwing an exception?
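
    One possible shape for such handling, sketched in Java with a plain map standing in for the PDF dictionary. `EmptyRootHandling` and `normalizeRoot` are hypothetical names; this is not how PDF Clown's Tree class is implemented.

```java
import java.util.ArrayList;
import java.util.Map;

final class EmptyRootHandling {
    // Hypothetical stand-in: the root dictionary is modelled as entry name -> value.
    // A root with neither Names nor Kids is treated as an empty leaf instead of an error.
    static Map<String, Object> normalizeRoot(Map<String, Object> root) {
        boolean hasNames = root.containsKey("Names");
        boolean hasKids = root.containsKey("Kids");
        if (!hasNames && !hasKids) {
            root.put("Names", new ArrayList<Object>()); // empty /Names array
        }
        return root;
    }
}
```

    A root that already has Kids or Names is returned unchanged, so well-formed trees are not affected by the extra check.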

    Greetings
    - Andreas