PDF Clown / Discussion / Help: Trouble reading pagelabel dictionary from existing PDF

Jamie - 2014-10-12

I’m new to PDF Clown and I’m having trouble getting the page label dictionary out of an existing file. I started with some of the code in the page label example, converted it to VB.NET, and I get a null reference error on the line that iterates through the page label key/value pairs. I copied the very similar code from the parse example that reads the metadata, and it works fine. Getting the number of pages also works. Am I misunderstanding something about how this needs to be done?

Here is what I have. I’m working from memory here; I don’t have my code in front of me.

~~~~
:::vbnet
Imports org.pdfclown.documents
Imports org.pdfclown.documents.interaction.navigation.page
Imports org.pdfclown.files
Imports org.pdfclown.objects

Imports System.Collections.Generic

Public Class PageLabelSample
    Inherits Sample

    Public Sub Run()
        ' 1. Opening the PDF file...
        Dim filePath As String = PromptFileChoice("Please select a PDF file")
        Using myfile As New File(filePath)
            Dim mydocument As Document = myfile.Document
            ' 2. Reading the page labels...
            Dim mypageLabels As PageLabels = mydocument.PageLabels
            ' The next line generates the error
            For Each entry As KeyValuePair(Of PdfInteger, PageLabel) In mypageLabels
                ' Use & (not +) so VB concatenates strings instead of attempting numeric addition.
                Console.WriteLine("Page label " & entry.Value.BaseObject.ToString())
                Console.WriteLine("  Initial page: " & (entry.Key.IntValue + 1))
                Console.WriteLine("  Prefix: " & entry.Value.Prefix)
                Console.WriteLine("  Number style: " & entry.Value.NumberStyle.ToString())
                Console.WriteLine("  Number base: " & entry.Value.NumberBase)
            Next
        End Using
    End Sub
End Class
~~~~
When the error happens and I look at the value of mypageLabels in the debugger, it seems to have a lot of unassigned or empty members. I know that the PDF I’m using has a page label dictionary.
I saw the Clone and Wrap methods in the API help, but I don’t know whether they are applicable.

Thanks for your help.
Jamie

Well, I downloaded a fresh copy of the files (version 0.1.2-Beta) and ran the samples unmodified in both Java and .NET. The Java version of PageLabelSample works correctly; the .NET version does not.

It produced the following error:

~~~~
...
[0] Standard serialization
[1] Incremental update
Please select a serialization mode: 0

Output: ../../output/PageLabelSample.Standard.pdf
An exception happened while running the sample:
System.NullReferenceException: Object reference not set to an instance of an object
  at org.pdfclown.objects.Tree`2+Enumerator[org.pdfclown.objects.PdfInteger,org.pdfclown.documents.interaction.navigation.page.PageLabel]..ctor (org.pdfclown.objects.Tree`2 tree) [0x00072] in /Users/macuser/PDFClown/dotNET/pdfclown.lib/src/org/pdfclown/objects/Tree.cs:220
  at org.pdfclown.objects.Tree`2[org.pdfclown.objects.PdfInteger,org.pdfclown.documents.interaction.navigation.page.PageLabel].GetEnumerator () [0x00002] in /Users/macuser/PDFClown/dotNET/pdfclown.lib/src/org/pdfclown/objects/Tree.cs:770
  at org.pdfclown.samples.cli.PageLabelSample.Run () [0x000d7] in /Users/macuser/PDFClown/dotNET/pdfclown.samples.cli/src/org/pdfclown/samples/cli/PageLabelSample.cs:54
  at org.pdfclown.samples.cli.SampleLoader.Run (System.String inputPath, System.String outputPath) [0x00155] in /Users/macuser/PDFClown/dotNET/pdfclown.samples.cli/src/org/pdfclown/samples/cli/SampleLoader.cs:123
~~~~

The program does produce the correct output PDF with the page labels assigned. It then fails when trying to read the page labels back out of the file it has just written, on line 54 of PageLabelSample.cs:

~~~~
foreach(KeyValuePair<PdfInteger,PageLabel> entry in file.Document.PageLabels)
~~~~

If I set a breakpoint before it crashes and inspect the page label object, I get this:

~~~~
pageLabels      {org.pdfclown.documents.PageLabels}  org.pdfclown.documents.PageLabels
base            {org.pdfclown.objects.NumberTree<org.pdfclown.documents.interaction.navigation.page.PageLabel>}  org.pdfclown.objects.NumberTree<org.pdfclown.documents.interaction.navigation.page.PageLabel>
base            {org.pdfclown.objects.Tree<org.pdfclown.objects.PdfInteger,org.pdfclown.documents.interaction.navigation.page.PageLabel>}  org.pdfclown.objects.Tree<org.pdfclown.objects.PdfInteger,org.pdfclown.documents.interaction.navigation.page.PageLabel>
base            {org.pdfclown.objects.PdfObjectWrapper<org.pdfclown.objects.PdfDictionary>}  org.pdfclown.objects.PdfObjectWrapper<org.pdfclown.objects.PdfDictionary>
base            {org.pdfclown.objects.PdfObjectWrapper}  org.pdfclown.objects.PdfObjectWrapper
BaseObject      {<< Nums [ 0 324 0 R 3 325 0 R 6 326 0 R ] Limits [ 0 6 ] Metadata 0 0 R >>}  org.pdfclown.objects.PdfDictionary
Container       {322 0 obj << Type Catalog Pages 321 0 R PTEX.Fullbanner (This is pdfTeX, Version 3.141592-1.10b) PageLabels << Nums [ 0 324 0 R 3 325 0 R 6 326 0 R ] Limits [ 0 6 ] Metadata 0 0 R >> >>}  org.pdfclown.objects.PdfIndirectObject
DataContainer   {322 0 obj << Type Catalog Pages 321 0 R PTEX.Fullbanner (This is pdfTeX, Version 3.141592-1.10b) PageLabels << Nums [ 0 324 0 R 3 325 0 R 6 326 0 R ] Limits [ 0 6 ] Metadata 0 0 R >> >>}  org.pdfclown.objects.PdfIndirectObject
Document        {org.pdfclown.documents.Document}  org.pdfclown.documents.Document
File            {org.pdfclown.files.File}  org.pdfclown.files.File
Non-public members
BaseDataObject  {<< Nums [ 0 324 0 R 3 325 0 R 6 326 0 R ] Limits [ 0 6 ] >>}  org.pdfclown.objects.PdfDictionary
Metadata        {org.pdfclown.documents.interchange.metadata.Metadata}  org.pdfclown.documents.interchange.metadata.Metadata
Non-public members
Count           System.NullReferenceException: Object reference not set to an instance of an object
IsReadOnly      false  bool
Keys            System.NullReferenceException: Object reference not set to an instance of an object
Values          System.NullReferenceException: Object reference not set to an instance of an object
~~~~

You can see that the container is the document catalog, which contains the page label dictionary. The page label dictionary refers to three other objects (324, 325, and 326), which appear to be valid page label entries.

However, you can also see that Count, Keys, and Values all throw a NullReferenceException. Does this indicate that the code that parses the page label dictionary is not working correctly?
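For readers comparing the two halves of the dump: in a number tree leaf, the Nums array is flat and alternates keys and values, so `[ 0 324 0 R 3 325 0 R 6 326 0 R ]` holds three pairs whose keys are 0-based page indices. The sketch below is a minimal Java illustration (Java being PDF Clown's other port) that pairs up those tokens. It does not use the PDF Clown API; `NumsPairs`, `Entry`, and `pairUp` are hypothetical names, and it assumes every value is an indirect reference, as in this particular file.

```java
import java.util.ArrayList;
import java.util.List;

public class NumsPairs {
    // One leaf entry: a 0-based page index (the number-tree key) and the
    // indirect reference "obj gen R" that points at its page label dictionary.
    record Entry(int pageIndex, int objectNumber, int generation) {}

    // Splits the contents of a flattened Nums array into (key, value) pairs.
    // Assumption: every value is an indirect reference, as in the dump above.
    // A real parser would also have to accept inline (direct) dictionaries.
    static List<Entry> pairUp(String nums) {
        String[] t = nums.trim().split("\\s+");
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < t.length; i += 4) {
            if (i + 3 >= t.length || !t[i + 3].equals("R")) {
                throw new IllegalArgumentException("expected 'key obj gen R' at token " + i);
            }
            entries.add(new Entry(Integer.parseInt(t[i]),
                    Integer.parseInt(t[i + 1]), Integer.parseInt(t[i + 2])));
        }
        return entries;
    }

    public static void main(String[] args) {
        for (Entry e : pairUp("0 324 0 R 3 325 0 R 6 326 0 R")) {
            // The PageLabelSample prints key + 1 as the "Initial page".
            System.out.println("Initial page " + (e.pageIndex() + 1) + " -> "
                    + e.objectNumber() + " " + e.generation() + " R");
        }
        // Initial page 1 -> 324 0 R
        // Initial page 4 -> 325 0 R
        // Initial page 7 -> 326 0 R
    }
}
```

Under that reading, the data a working enumerator needs is clearly present in BaseDataObject; the three label ranges start on pages 1, 4, and 7.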

I hope I have provided enough information to help track this down.

Thanks for your help.
Jamie

Last edit: Jamie 2014-10-26

Jamie - 2015-03-05

Should I file this as a bug report?


Stefano Chizzolini - 2015-03-08

Hi Jamie, could you please send me a sample PDF of yours so I can test your issue?

Thank you.


Jamie - 2015-03-15

Here are two files.
"pagelabel test doc.pdf": the first two pages are numbered 1 and 2 with the page label prefix "cvr" (cvr1, cvr2). The next four pages are i-iv, the next four are 1-4, and the last two are cvr3-cvr4.

"page 3 of 4 page file.pdf" is a one-page file whose page label numbering starts at 3.
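To make the expected result concrete, here is a hypothetical Java sketch of how a reader turns number-tree ranges into page labels, following the page label dictionary entries of the PDF specification (/S style, /P prefix, /St start value, with /St defaulting to 1 and a missing /S meaning the label is the prefix alone). The four ranges in `main` are an assumed encoding of the labels described for "pagelabel test doc.pdf", not bytes read from the attached file, and none of the names (`PageLabelRanges`, `Range`, `labelFor`) are PDF Clown API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class PageLabelRanges {
    // Hypothetical stand-in for a page label dictionary:
    // style = /S ('D', 'R', 'r', 'A', 'a', or 0 for "no number"), prefix = /P, start = /St.
    record Range(char style, String prefix, int start) {}

    // Uppercase roman numerals for n >= 1.
    static String roman(int n) {
        int[] values = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
        String[] digits = {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            while (n >= values[i]) {
                sb.append(digits[i]);
                n -= values[i];
            }
        }
        return sb.toString();
    }

    // A..Z for 1..26, AA..ZZ for 27..52, and so on, as the PDF specification describes.
    static String letters(int n) {
        char c = (char) ('A' + (n - 1) % 26);
        return String.valueOf(c).repeat((n - 1) / 26 + 1);
    }

    // Keys are 0-based page indices of each range's first page, as in a Nums array.
    static String labelFor(int pageIndex, TreeMap<Integer, Range> ranges) {
        Map.Entry<Integer, Range> e = ranges.floorEntry(pageIndex);
        if (e == null) {
            // The specification requires an entry for page index 0, so this means bad input.
            throw new IllegalArgumentException("no range covers page index " + pageIndex);
        }
        Range r = e.getValue();
        int n = r.start() + (pageIndex - e.getKey());
        String number = switch (r.style()) {
            case 'D' -> Integer.toString(n);
            case 'R' -> roman(n);
            case 'r' -> roman(n).toLowerCase(Locale.ROOT);
            case 'A' -> letters(n);
            case 'a' -> letters(n).toLowerCase(Locale.ROOT);
            default -> ""; // no /S: the label is the prefix alone
        };
        return r.prefix() + number;
    }

    public static void main(String[] args) {
        // Assumed encoding of "pagelabel test doc.pdf" (12 pages), not read from the file.
        TreeMap<Integer, Range> ranges = new TreeMap<>();
        ranges.put(0, new Range('D', "cvr", 1));  // cvr1, cvr2
        ranges.put(2, new Range('r', "", 1));     // i..iv
        ranges.put(6, new Range('D', "", 1));     // 1..4
        ranges.put(10, new Range('D', "cvr", 3)); // cvr3, cvr4
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            labels.add(labelFor(i, ranges));
        }
        System.out.println(labels);
        // [cvr1, cvr2, i, ii, iii, iv, 1, 2, 3, 4, cvr3, cvr4]
    }
}
```

Under the same assumptions, "page 3 of 4 page file.pdf" would need only a single range at key 0 with style D and start 3, giving its only page the label "3".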

Thanks for your help.

page 3 of 4 page file.pdf

pagelabel test doc.pdf


Jamie - 2015-04-15

Stefano, have you had a chance to look at this issue with the test files I supplied? I can supply more files if you need them.

Thanks.


Stefano Chizzolini - 2015-04-15

Hi Jamie,
I'm really sorry. As you can see from the commit log, several long-overdue issues have been solved in the meantime; yours is among the next in line, and I promise to give you an answer within a few days.


Yes, this is also an issue for me. It looks like the tree enumerator's constructor looks up the PdfName.Names entry, but a number tree such as PageLabels only has PdfName.Nums, so the lookup returns null.

I worked around this by changing the Enumerator constructor in Tree.cs (the `#region constructors` block) to:

~~~~
#region constructors
internal Enumerator(
  Tree<TKey,TValue> tree
  )
{
  this.tree = tree;

  container = tree.Container;
  PdfDictionary rootNode = tree.BaseDataObject;
  PdfDirectObject kidsObject = rootNode[PdfName.Kids];

  if(kidsObject == null) // Leaf node.
  {
    PdfDirectObject namesObject = rootNode[PdfName.Names];
    // Added: number trees (e.g. PageLabels) keep their key/value pairs
    // under /Nums rather than /Names, so fall back to /Nums.
    if(namesObject == null)
    {namesObject = rootNode[PdfName.Nums];}

    if(namesObject is PdfReference)
    {container = ((PdfReference)namesObject).IndirectObject;}
    names = (PdfArray)namesObject.Resolve();
  }
  else // Intermediate node.
  {
    if(kidsObject is PdfReference)
    {container = ((PdfReference)kidsObject).IndirectObject;}
    kids = (PdfArray)kidsObject.Resolve();
  }
}
#endregion
~~~~
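The difference between the two lookups can be modeled in a few lines. This is a hypothetical Java sketch, not necessarily how PDF Clown itself fixed the problem: it picks the leaf's pairs key from the kind of tree (/Names for name trees, /Nums for number trees) instead of falling back from one to the other, and it uses plain `Map`s in place of PDF Clown's PdfDictionary. `TreeLeafKey`, `TreeKind`, and `leafPairs` are illustrative names only.

```java
import java.util.List;
import java.util.Map;

public class TreeLeafKey {
    // Which dictionary entry holds a leaf's flattened [key value key value ...] array.
    enum TreeKind {
        NAME("Names"), NUMBER("Nums");

        final String pairsKey;

        TreeKind(String pairsKey) {
            this.pairsKey = pairsKey;
        }
    }

    // Returns the leaf's pairs array, failing loudly if it is missing
    // instead of letting a null reach the enumerator.
    static List<?> leafPairs(Map<String, ?> node, TreeKind kind) {
        Object pairs = node.get(kind.pairsKey);
        if (!(pairs instanceof List<?> list)) {
            throw new IllegalStateException("leaf node has no /" + kind.pairsKey + " array");
        }
        return list;
    }

    public static void main(String[] args) {
        // Simplified PageLabels root from the debugger dump (references shown as strings).
        Map<String, Object> root = Map.of(
                "Nums", List.of(0, "324 0 R", 3, "325 0 R", 6, "326 0 R"),
                "Limits", List.of(0, 6));
        System.out.println(root.get("Names"));                          // null: the hard-coded lookup
        System.out.println(leafPairs(root, TreeKind.NUMBER).size() / 2); // 3 pairs
    }
}
```

Choosing the key from the tree kind is stricter than the Names-then-Nums fallback above: a malformed name tree leaf fails with a clear message instead of being read as a number tree. Either way, the PageLabels root from the dump yields its three pairs. The sketch needs Java 16 or later for the `instanceof` pattern.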

Last edit: Jimmy 2015-04-16

Stefano Chizzolini - 2015-04-17

Hi Jamie and Jimmy,
I verified that your issue was fixed on 2014-08-14 in the 0.1.2-Fix branch of the SVN repository (rev 122, "[FIX:56] Tree pairs-key correction"). Jamie, I tested your files against the 0.1.2-Fix branch and they worked as expected.

I recommend switching to the 0.1.2-Fix branch until PDF Clown 0.1.2.1 is released. Thank you! Here is its snapshot: https://sourceforge.net/p/clown/code/HEAD/tarball?path=/branches/0.1.2-Fix


Jamie - 2015-04-17

Thanks so much. I'll try the fix.

General-Purpose PDF Library for Java and .NET

Forums

Help

Trouble reading pagelabel dictionary from existing PDF

Trouble reading pagelabel dictionary from existing PDF

General-Purpose PDF Library for Java and .NET

Forums

Help

Trouble reading pagelabel dictionary from existing PDF document.SUBSCRIPTION_OPTIONS = { "thing": "topic", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

Trouble reading pagelabel dictionary from existing PDF