Trouble reading pagelabel dictionary from existing PDF

Help
Jamie
2014-10-12
2015-04-17
  • Jamie

    Jamie - 2014-10-12

    I’m new to PDFClown and I’m having trouble reading the page label dictionary from an existing file. I started with some of the code in the page label example, converted it to VB.NET, and I get a null reference error on the line that iterates through the page label key/value pairs. Very similar code that I copied from the parse example to read the metadata works fine, and getting the number of pages also works. Am I misunderstanding something about how this needs to be done?

    Here is what I have. I’m working from memory here; I don’t have my code in front of me.

    :::Visual Basic.NET
    Imports org.pdfclown.documents
    Imports org.pdfclown.documents.contents
    Imports org.pdfclown.documents.contents.composition
    Imports org.pdfclown.documents.contents.objects
    Imports org.pdfclown.documents.interaction.navigation.page
    Imports org.pdfclown.files
    Imports org.pdfclown.objects

    Imports System
    Imports System.Collections.Generic
    Imports System.Drawing

    Public Class PageLabelSample
        Inherits Sample

        Public Overrides Sub Run()
            ' 1. Opening the PDF file...
            Dim filePath As String = PromptFileChoice("Please select a PDF file")
            Using myfile As New File(filePath)
                Dim mydocument As Document = myfile.Document

                ' 2. Reading the page labels...
                Dim mypageLabels As PageLabels = mydocument.PageLabels

                ' The next line generates the error
                For Each entry As KeyValuePair(Of PdfInteger, PageLabel) In mypageLabels
                    ' Use & (not +) so non-string values are converted for display
                    Console.WriteLine("Page label " & entry.Value.BaseObject.ToString())
                    Console.WriteLine("    Initial page: " & (entry.Key.IntValue + 1))
                    Console.WriteLine("    Prefix: " & entry.Value.Prefix)
                    Console.WriteLine("    Number style: " & entry.Value.NumberStyle.ToString())
                    Console.WriteLine("    Number base: " & entry.Value.NumberBase)
                Next
            End Using
        End Sub
    End Class
    :::
    When the error happens and I look at the value of mypageLabels in the debugger, many of its members appear to be unassigned or empty. I know that the PDF I’m using has a page label dictionary.
    I saw the Clone and Wrap methods in the API help, but I don’t know whether they are applicable.

    Thanks for your help.
    Jamie

     
  • Jamie

    Jamie - 2014-10-26

    Well, I downloaded a fresh copy of the files (version 0.1.2-Beta) and tried the sample files without any modification in both Java and .NET. The Java version of the PageLabelSample works correctly. The .NET version does not.

    It produced the following error:

    ...
    [0] Standard serialization
    [1] Incremental update
    Please select a serialization mode: 0
    
    Output: ../../output/PageLabelSample.Standard.pdf
    An exception happened while running the sample:
    System.NullReferenceException: Object reference not set to an instance of an object
      at org.pdfclown.objects.Tree`2+Enumerator[org.pdfclown.objects.PdfInteger,org.pdfclown.documents.interaction.navigation.page.PageLabel]..ctor (org.pdfclown.objects.Tree`2 tree) [0x00072] in /Users/macuser/PDFClown/dotNET/pdfclown.lib/src/org/pdfclown/objects/Tree.cs:220 
      at org.pdfclown.objects.Tree`2[org.pdfclown.objects.PdfInteger,org.pdfclown.documents.interaction.navigation.page.PageLabel].GetEnumerator () [0x00002] in /Users/macuser/PDFClown/dotNET/pdfclown.lib/src/org/pdfclown/objects/Tree.cs:770 
      at org.pdfclown.samples.cli.PageLabelSample.Run () [0x000d7] in /Users/macuser/PDFClown/dotNET/pdfclown.samples.cli/src/org/pdfclown/samples/cli/PageLabelSample.cs:54 
      at org.pdfclown.samples.cli.SampleLoader.Run (System.String inputPath, System.String outputPath) [0x00155] in /Users/macuser/PDFClown/dotNET/pdfclown.samples.cli/src/org/pdfclown/samples/cli/SampleLoader.cs:123 
    

    The program does produce the correct output pdf with the page labels assigned. It then fails when trying to read the page labels out of the just written file.
    It fails on line 54:
    foreach(KeyValuePair<PdfInteger,PageLabel> entry in file.Document.PageLabels)

    If I insert a break point before it crashes, and look at the values of the page label object I get this:

    pageLabels          {org.pdfclown.documents.PageLabels}                                                                                                                                                         org.pdfclown.documents.PageLabels
    base                {org.pdfclown.objects.NumberTree<org.pdfclown.documents.interaction.navigation.page.PageLabel>}                                                                                             org.pdfclown.objects.NumberTree<org.pdfclown.documents.interaction.navigation.page.PageLabel>
    base                {org.pdfclown.objects.Tree<org.pdfclown.objects.PdfInteger,org.pdfclown.documents.interaction.navigation.page.PageLabel>}                                                                   org.pdfclown.objects.Tree<org.pdfclown.objects.PdfInteger,org.pdfclown.documents.interaction.navigation.page.PageLabel>
    base                {org.pdfclown.objects.PdfObjectWrapper<org.pdfclown.objects.PdfDictionary>}                                                                                                                 org.pdfclown.objects.PdfObjectWrapper<org.pdfclown.objects.PdfDictionary>
    base                {org.pdfclown.objects.PdfObjectWrapper}                                                                                                                                                     org.pdfclown.objects.PdfObjectWrapper
    BaseObject          {<< Nums [ 0 324 0 R 3 325 0 R 6 326 0 R ] Limits [ 0 6 ] Metadata 0 0 R >>}                                                                                                                org.pdfclown.objects.PdfDictionary
    Container           {322 0 obj << Type Catalog Pages 321 0 R PTEX.Fullbanner (This is pdfTeX, Version 3.141592-1.10b) PageLabels << Nums [ 0 324 0 R 3 325 0 R 6 326 0 R ] Limits [ 0 6 ] Metadata 0 0 R >> >>} org.pdfclown.objects.PdfIndirectObject
    DataContainer       {322 0 obj << Type Catalog Pages 321 0 R PTEX.Fullbanner (This is pdfTeX, Version 3.141592-1.10b) PageLabels << Nums [ 0 324 0 R 3 325 0 R 6 326 0 R ] Limits [ 0 6 ] Metadata 0 0 R >> >>} org.pdfclown.objects.PdfIndirectObject
    Document            {org.pdfclown.documents.Document}                                                                                                                                                           org.pdfclown.documents.Document
    File                {org.pdfclown.files.File}                                                                                                                                                                   org.pdfclown.files.File
    Non-public members                                                                                                                                                                                              
    BaseDataObject      {<< Nums [ 0 324 0 R 3 325 0 R 6 326 0 R ] Limits [ 0 6 ] >>}                                                                                                                               org.pdfclown.objects.PdfDictionary
    Metadata            {org.pdfclown.documents.interchange.metadata.Metadata}                                                                                                                                      org.pdfclown.documents.interchange.metadata.Metadata
    Non-public members                                                                                                                                                                                              
    Count               System.NullReferenceException: Object reference not set to an instance of an object                                                                                                         
    IsReadOnly          false                                                                                                                                                                                       bool
    Keys                System.NullReferenceException: Object reference not set to an instance of an object                                                                                                         
    Values              System.NullReferenceException: Object reference not set to an instance of an object                                                                                                         
    

    You can see that the container is the document catalog, which contains the page label dictionary. That dictionary refers to 3 other objects (324, 325, and 326), which appear to be valid page label entries.

    However, you can also see that Count, Keys, and Values all throw a NullReferenceException instead of returning data. Does this indicate that the code that parses the page label dictionary is not working correctly?
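
To make the point about the dump concrete: as far as the PDF number-tree layout goes, a leaf node stores its entries as a flat array of alternating integer keys and values, so `[ 0 324 0 R 3 325 0 R 6 326 0 R ]` should decode to three entries. The following is only a sketch in plain Java, not PDF Clown code; `NumsDecoder` and `decode` are made-up names, and indirect references are modelled as plain strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NumsDecoder {
    // Reads a flat [key1 value1 key2 value2 ...] array into an ordered map.
    // Keys are page indices; values stand in for references like "324 0 R".
    static Map<Integer, String> decode(List<Object> nums) {
        if (nums.size() % 2 != 0) {
            throw new IllegalArgumentException("Nums array must hold key/value pairs");
        }
        Map<Integer, String> entries = new LinkedHashMap<>();
        for (int i = 0; i < nums.size(); i += 2) {
            entries.put((Integer) nums.get(i), (String) nums.get(i + 1));
        }
        return entries;
    }

    public static void main(String[] args) {
        // The Nums array as it appears in the debugger dump.
        List<Object> nums = List.of(0, "324 0 R", 3, "325 0 R", 6, "326 0 R");
        decode(nums).forEach((pageIndex, ref) ->
            System.out.println("Range starting at page index " + pageIndex + " -> " + ref));
    }
}
```

If the array decodes cleanly like this, the data in the file is probably fine and the failure is more likely in how the tree is enumerated.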

    I hope I have provided enough information to help track this down.

    Thanks for your help.
    Jamie

     

    Last edit: Jamie 2014-10-26
  • Jamie

    Jamie - 2015-03-05

    Should I file this as a bug report?

     
  • Stefano Chizzolini

    Hi Jamie, could you please send me a sample PDF of yours so I can test your issue?

    thank you

     
  • Jamie

    Jamie - 2015-03-15

    Here are two files.
    "pagelabel test doc.pdf" The first two pages are 1 and 2 with the page label prefix of "cvr". The next 4 pages are i-iv. The next 4 are 1-4. The last 2 are cvr3 - cvr4.

    "page 3 of 4 page file.pdf" is a 1 page file with a page label start of page 3.

    Thanks for your help.
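
As a cross-check on the labels expected from the first test file described above, here is a minimal sketch of how label ranges turn into per-page labels. It assumes the usual PDF page-label model (each range starts at a page index and carries a numbering style, an optional prefix, and a start number) and assumes the "cvr" ranges use decimal numbering. It is plain Java, not PDF Clown code; `PageLabelSketch`, `Range`, `describedRanges`, and `labelFor` are illustrative names, and only decimal and lowercase roman styles are handled.

```java
import java.util.Map;
import java.util.TreeMap;

class PageLabelSketch {
    // Hypothetical model of one label range.
    static final class Range {
        final char style;    // 'D' = decimal, 'r' = lowercase roman
        final String prefix; // text placed before the number
        final int start;     // number given to the first page of the range
        Range(char style, String prefix, int start) {
            this.style = style;
            this.prefix = prefix;
            this.start = start;
        }
    }

    // Ranges matching the description of "pagelabel test doc.pdf" (12 pages).
    static TreeMap<Integer, Range> describedRanges() {
        TreeMap<Integer, Range> ranges = new TreeMap<>();
        ranges.put(0, new Range('D', "cvr", 1));  // cvr1, cvr2
        ranges.put(2, new Range('r', "", 1));     // i to iv
        ranges.put(6, new Range('D', "", 1));     // 1 to 4
        ranges.put(10, new Range('D', "cvr", 3)); // cvr3, cvr4
        return ranges;
    }

    static String toLowerRoman(int n) {
        int[] values = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
        String[] symbols = {"m", "cm", "d", "cd", "c", "xc", "l", "xl", "x", "ix", "v", "iv", "i"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            while (n >= values[i]) {
                sb.append(symbols[i]);
                n -= values[i];
            }
        }
        return sb.toString();
    }

    // Finds the range covering pageIndex (0-based) and formats its label.
    static String labelFor(TreeMap<Integer, Range> ranges, int pageIndex) {
        Map.Entry<Integer, Range> entry = ranges.floorEntry(pageIndex);
        if (entry == null) {
            // Assumption: fall back to the plain 1-based page number.
            return String.valueOf(pageIndex + 1);
        }
        Range range = entry.getValue();
        int number = range.start + (pageIndex - entry.getKey());
        String numeral = range.style == 'r' ? toLowerRoman(number) : String.valueOf(number);
        return range.prefix + numeral;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Range> ranges = describedRanges();
        for (int page = 0; page < 12; page++) {
            System.out.println("Page index " + page + ": " + labelFor(ranges, page));
        }
    }
}
```

Under those assumptions the twelve pages come out as cvr1, cvr2, i, ii, iii, iv, 1, 2, 3, 4, cvr3, cvr4. The one-page file would correspond to a single decimal range at index 0 with a start number of 3.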

     
  • Jamie

    Jamie - 2015-04-15

    Stefano, have you had a chance to look at this issue with the test files I supplied? I can supply more files if you need them.

    Thanks.

     
  • Stefano Chizzolini

    Hi Jamie,
    I'm really sorry but, as you can see from the commit log, several long-due issues have been solved in the meantime -- yours is among the next in line, I promise to give you an answer within a few days.

     
  • Jimmy

    Jimmy - 2015-04-16

    Yes, this is also an issue for me. It looks like the Tree enumerator constructor tries to read the PdfName.Names entry of the node even when only PdfName.Nums exists, as is the case for page labels.

    I managed to get around this by changing the constructors region of the Enumerator in Tree.cs to:

    ~~~~~
    #region constructors
    internal Enumerator(
      Tree<TKey,TValue> tree
      )
    {
      this.tree = tree;

      container = tree.Container;
      PdfDictionary rootNode = tree.BaseDataObject;
      PdfDirectObject kidsObject = rootNode[PdfName.Kids];

      if(kidsObject == null) // Leaf node.
      {
        PdfDirectObject namesObject = rootNode[PdfName.Names];
        //added this block
        if(namesObject == null)
        {
          namesObject = rootNode[PdfName.Nums];
        }
        //end addition

        if(namesObject is PdfReference)
        {
          container = ((PdfReference)namesObject).IndirectObject;
        }
        names = (PdfArray)namesObject.Resolve();
      }
      else // Intermediate node.
      {
        if(kidsObject is PdfReference)
        {
          container = ((PdfReference)kidsObject).IndirectObject;
        }
        kids = (PdfArray)kidsObject.Resolve();
      }
    }
    #endregion
    ~~~~~
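
The lookup order in that patch can also be shown outside the library. The sketch below is plain Java with a `Map` standing in for the leaf node dictionary; `LeafPairs` and `pairsArray` are illustrative names, not PDF Clown API. The first method mirrors the workaround (try `Names`, then `Nums`). The second is a possible variant, assuming each tree type knows which key it uses, where the caller passes the key in rather than probing both.

```java
import java.util.List;
import java.util.Map;

class LeafPairs {
    // Mirrors the workaround: try Names (name tree) first, then Nums (number tree).
    static List<Object> pairsArray(Map<String, List<Object>> leafNode) {
        List<Object> pairs = leafNode.get("Names");
        if (pairs == null) {
            pairs = leafNode.get("Nums");
        }
        if (pairs == null) {
            throw new IllegalStateException("Leaf node has neither Names nor Nums");
        }
        return pairs;
    }

    // Variant: the caller states which key its tree type stores pairs under.
    static List<Object> pairsArray(Map<String, List<Object>> leafNode, String pairsKey) {
        List<Object> pairs = leafNode.get(pairsKey);
        if (pairs == null) {
            throw new IllegalStateException("Leaf node has no " + pairsKey + " entry");
        }
        return pairs;
    }

    public static void main(String[] args) {
        // A page label leaf node only has a Nums entry.
        Map<String, List<Object>> pageLabelsNode =
            Map.of("Nums", List.<Object>of(0, "324 0 R", 3, "325 0 R", 6, "326 0 R"));
        System.out.println(pairsArray(pageLabelsNode));
        System.out.println(pairsArray(pageLabelsNode, "Nums"));
    }
}
```

Unlike the patch, both methods fail with an explicit message when the expected entry is missing, rather than with a null reference.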

     

    Last edit: Jimmy 2015-04-16
  • Jamie

    Jamie - 2015-04-17

    Thanks so much. I'll try the fix.

     
