I’m new to PDF Clown and I’m having trouble reading the page label dictionary from an existing file. I started with some of the code in the page label sample, converted it to VB.NET, and I get a null reference error on the line that iterates through the page label key/value pairs. Very similar code copied from the parse sample, which reads the metadata, works fine. Getting the number of pages also works. Am I misunderstanding something about how this needs to be done?
Here is what I have. I’m working from memory, as I don’t have my code in front of me.
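:::Visual Basic.NET
Imports org.pdfclown.documents
Imports org.pdfclown.documents.contents
Imports org.pdfclown.documents.contents.composition
Imports org.pdfclown.documents.contents.objects
Imports org.pdfclown.documents.interaction.navigation.page
Imports org.pdfclown.files
Imports org.pdfclown.objects
Imports System
Imports System.Collections.Generic
Imports System.Drawing

Public Class PageLabelSample
    Inherits Sample

    Public Overrides Sub Run()
        ' 1. Opening the PDF file...
        Dim filePath As String = PromptFileChoice("Please select a PDF file")
        Using myfile As New File(filePath)
            Dim mydocument As Document = myfile.Document
            ' 2. Reading the page labels...
            Dim mypageLabels As PageLabels = mydocument.PageLabels
            ' The next line generates the error (NullReferenceException)
            For Each entry As KeyValuePair(Of PdfInteger, PageLabel) In mypageLabels
                ' & concatenates as text; + would try to add the operands numerically
                Console.WriteLine("Page label " & entry.Value.BaseObject.ToString())
                Console.WriteLine(" Initial page: " & (entry.Key.IntValue + 1))
                Console.WriteLine(" Prefix: " & entry.Value.Prefix)
                Console.WriteLine(" Number style: " & entry.Value.NumberStyle.ToString())
                Console.WriteLine(" Number base: " & entry.Value.NumberBase)
            Next
        End Using
    End Sub
End Class
:::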
When the error happens, and I look at the value of mypageLabels in the debugger, it seems like it has a lot of unassigned or empty parts. I know that the PDF I’m using has a page label dictionary.
I saw the clone and wrap methods in the API help, but I don’t know whether they are applicable.
Thanks for your help.
Jamie
Well, I downloaded a fresh copy of the files (version 0.1.2-Beta) and tried the sample files without any modification in both Java and .NET. The Java version of the PageLabelSample works correctly. The .NET version does not.
It produced the following error:
...
[0] Standard serialization
[1] Incremental update
Please select a serialization mode: 0
Output: ../../output/PageLabelSample.Standard.pdf
An exception happened while running the sample:
System.NullReferenceException: Object reference not set to an instance of an object
at org.pdfclown.objects.Tree`2+Enumerator[org.pdfclown.objects.PdfInteger,org.pdfclown.documents.interaction.navigation.page.PageLabel]..ctor (org.pdfclown.objects.Tree`2 tree) [0x00072] in /Users/macuser/PDFClown/dotNET/pdfclown.lib/src/org/pdfclown/objects/Tree.cs:220
at org.pdfclown.objects.Tree`2[org.pdfclown.objects.PdfInteger,org.pdfclown.documents.interaction.navigation.page.PageLabel].GetEnumerator () [0x00002] in /Users/macuser/PDFClown/dotNET/pdfclown.lib/src/org/pdfclown/objects/Tree.cs:770
at org.pdfclown.samples.cli.PageLabelSample.Run () [0x000d7] in /Users/macuser/PDFClown/dotNET/pdfclown.samples.cli/src/org/pdfclown/samples/cli/PageLabelSample.cs:54
at org.pdfclown.samples.cli.SampleLoader.Run (System.String inputPath, System.String outputPath) [0x00155] in /Users/macuser/PDFClown/dotNET/pdfclown.samples.cli/src/org/pdfclown/samples/cli/SampleLoader.cs:123
The program does produce the correct output PDF with the page labels assigned. It then fails when trying to read the page labels back out of the file it has just written.
It fails on line 54:
foreach(KeyValuePair<PdfInteger,PageLabel> entry in file.Document.PageLabels)
If I insert a breakpoint before it crashes and look at the values of the page label object, I get this:
pageLabels {org.pdfclown.documents.PageLabels} org.pdfclown.documents.PageLabels
base {org.pdfclown.objects.NumberTree<org.pdfclown.documents.interaction.navigation.page.PageLabel>} org.pdfclown.objects.NumberTree<org.pdfclown.documents.interaction.navigation.page.PageLabel>
base {org.pdfclown.objects.Tree<org.pdfclown.objects.PdfInteger,org.pdfclown.documents.interaction.navigation.page.PageLabel>} org.pdfclown.objects.Tree<org.pdfclown.objects.PdfInteger,org.pdfclown.documents.interaction.navigation.page.PageLabel>
base {org.pdfclown.objects.PdfObjectWrapper<org.pdfclown.objects.PdfDictionary>} org.pdfclown.objects.PdfObjectWrapper<org.pdfclown.objects.PdfDictionary>
base {org.pdfclown.objects.PdfObjectWrapper} org.pdfclown.objects.PdfObjectWrapper
BaseObject {<< Nums [ 0 324 0 R 3 325 0 R 6 326 0 R ] Limits [ 0 6 ] Metadata 0 0 R >>} org.pdfclown.objects.PdfDictionary
Container {322 0 obj << Type Catalog Pages 321 0 R PTEX.Fullbanner (This is pdfTeX, Version 3.141592-1.10b) PageLabels << Nums [ 0 324 0 R 3 325 0 R 6 326 0 R ] Limits [ 0 6 ] Metadata 0 0 R >> >>} org.pdfclown.objects.PdfIndirectObject
DataContainer {322 0 obj << Type Catalog Pages 321 0 R PTEX.Fullbanner (This is pdfTeX, Version 3.141592-1.10b) PageLabels << Nums [ 0 324 0 R 3 325 0 R 6 326 0 R ] Limits [ 0 6 ] Metadata 0 0 R >> >>} org.pdfclown.objects.PdfIndirectObject
Document {org.pdfclown.documents.Document} org.pdfclown.documents.Document
File {org.pdfclown.files.File} org.pdfclown.files.File
Non-public members
BaseDataObject {<< Nums [ 0 324 0 R 3 325 0 R 6 326 0 R ] Limits [ 0 6 ] >>} org.pdfclown.objects.PdfDictionary
Metadata {org.pdfclown.documents.interchange.metadata.Metadata} org.pdfclown.documents.interchange.metadata.Metadata
Non-public members
Count System.NullReferenceException: Object reference not set to an instance of an object
IsReadOnly false bool
Keys System.NullReferenceException: Object reference not set to an instance of an object
Values System.NullReferenceException: Object reference not set to an instance of an object
You can see that the container is the document catalog, which contains the page label dictionary. The page label dictionary refers to 3 other objects (324, 325, and 326), which appear to be valid page label entries.
However, you can also see that Count, Keys, and Values all throw a NullReferenceException. Does this indicate that the code that parses the page label dictionary is not working correctly?
I hope I have provided enough information to help track this down.
Thanks for your help.
Jamie
Last edit: Jamie 2014-10-26
Should I file this as a bug report?
Hi Jamie, could you please send me a sample PDF of yours so I can test your issue?
thank you
Here are two files.
"pagelabel test doc.pdf" The first two pages are 1 and 2 with the page label prefix of "cvr". The next 4 pages are i-iv. The next 4 are 1-4. The last 2 are cvr3 - cvr4.
"page 3 of 4 page file.pdf" is a 1 page file with a page label start of page 3.
Thanks for your help.
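For readers following along, the label layout described for the first test file can be reproduced with a small stand-alone sketch. It is written in Java, does not use PDF Clown, and only models the page label rules from the PDF specification as I understand them: each range starts at a zero-based page index and carries a numbering style, an optional prefix, and a starting number. The names `PageLabelSketch`, `Range`, and `labels` are made up for illustration, and the style letters `D` (decimal) and `r` (lower-case roman) mirror the specification's `/S` values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative model of PDF page label ranges; not part of PDF Clown. */
final class PageLabelSketch {
    /** One labelling range: style ('D' decimal, 'r' lower roman), prefix, first number. */
    static final class Range {
        final char style;
        final String prefix;
        final int start;

        Range(char style, String prefix, int start) {
            this.style = style;
            this.prefix = prefix;
            this.start = start;
        }
    }

    /** Converts a positive number to lower-case roman numerals. */
    static String toLowerRoman(int n) {
        int[] values = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
        String[] symbols = {"m", "cm", "d", "cd", "c", "xc", "l", "xl", "x", "ix", "v", "iv", "i"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            while (n >= values[i]) {
                sb.append(symbols[i]);
                n -= values[i];
            }
        }
        return sb.toString();
    }

    /** Computes the label of every page from ranges keyed by zero-based start index. */
    static List<String> labels(TreeMap<Integer, Range> ranges, int pageCount) {
        List<String> result = new ArrayList<>();
        for (int page = 0; page < pageCount; page++) {
            // The range in force is the one with the greatest start index <= page.
            Map.Entry<Integer, Range> entry = ranges.floorEntry(page);
            if (entry == null) {
                // Assumption: fall back to plain 1-based numbering if no range applies.
                result.add(String.valueOf(page + 1));
                continue;
            }
            Range range = entry.getValue();
            int number = range.start + (page - entry.getKey());
            String numeral = range.style == 'r' ? toLowerRoman(number) : String.valueOf(number);
            result.add(range.prefix + numeral);
        }
        return result;
    }

    public static void main(String[] args) {
        // Ranges matching the description of "pagelabel test doc.pdf" (12 pages).
        TreeMap<Integer, Range> ranges = new TreeMap<>();
        ranges.put(0, new Range('D', "cvr", 1));
        ranges.put(2, new Range('r', "", 1));
        ranges.put(6, new Range('D', "", 1));
        ranges.put(10, new Range('D', "cvr", 3));
        System.out.println(labels(ranges, 12));
    }
}
```

With these four ranges and 12 pages the sketch yields cvr1, cvr2, i to iv, 1 to 4, cvr3, and cvr4. The second test file would correspond to a single range at index 0 with a start value of 3.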
Stefano, have you had a chance to look at this issue with the test files I supplied? I can supply more files if you need them.
Thanks.
Hi Jamie,
I'm really sorry, but as you can see from the commit log, several long-overdue issues have been solved in the meantime. Yours is among the next in line; I promise to give you an answer within a few days.
Yes, this is also an issue for me. It looks like the Enumerator constructor in Tree.cs tries to get a reference to the PdfName.Names node when only PdfName.Nums exists.
I managed to get around this by changing the constructors region of the Enumerator in Tree.cs to:
~~~~~
#region constructors
internal Enumerator(
  Tree<TKey,TValue> tree
  )
{
  this.tree = tree;
~~~~~
Last edit: Jimmy 2015-04-16
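To make the Names/Nums distinction concrete: in the PDF specification, name trees keep their entries under the `Names` key and number trees (such as page labels) under `Nums`, in both cases as a flat array of alternating keys and values. The following is a minimal, stand-alone Java sketch of an enumeration that uses whichever key is present. It does not use PDF Clown's classes, the map-based node and the names `TreePairsKeySketch`, `pairsOf`, and `entries` are hypothetical stand-ins, and it should not be read as the library's actual code or fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a PDF tree node: a dictionary with "Names", "Nums" or "Kids". */
final class TreePairsKeySketch {
    /** Returns the flat key/value array of a node, whichever pairs key it uses. */
    static List<Object> pairsOf(Map<String, Object> node) {
        // Name trees keep their pairs under "Names", number trees under "Nums".
        Object pairs = node.containsKey("Names") ? node.get("Names") : node.get("Nums");
        if (pairs == null) {
            // Intermediate node (only "Kids") or empty tree: nothing to read here.
            return List.of();
        }
        @SuppressWarnings("unchecked")
        List<Object> flat = (List<Object>) pairs;
        return flat;
    }

    /** Walks the node and its kids, collecting "key -> value" strings in order. */
    static List<String> entries(Map<String, Object> node) {
        List<String> result = new ArrayList<>();
        List<Object> flat = pairsOf(node);
        // The array alternates key, value, key, value, ...
        for (int i = 0; i + 1 < flat.size(); i += 2) {
            result.add(flat.get(i) + " -> " + flat.get(i + 1));
        }
        Object kids = node.get("Kids");
        if (kids instanceof List<?>) {
            for (Object kid : (List<?>) kids) {
                @SuppressWarnings("unchecked")
                Map<String, Object> kidNode = (Map<String, Object>) kid;
                result.addAll(entries(kidNode));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same shape as the page label dictionary in the debugger output earlier in the thread.
        Map<String, Object> pageLabels = Map.of(
            "Nums", List.of(0, "324 0 R", 3, "325 0 R", 6, "326 0 R"));
        entries(pageLabels).forEach(System.out::println);
    }
}
```

An enumerator that only ever looks up `Names` would get nothing back for a dictionary like this one, which is consistent with the NullReferenceException reported in the thread.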
Hi Jamie and Jimmy,
I verified that your issue was fixed on 2014-08-14 on the 0.1.2-Fix branch of the SVN repo (rev 122 "[FIX:56] Tree pairs-key correction"). Jamie, I tested your files against 0.1.2-Fix branch and they worked correctly as expected.
I recommend switching to the 0.1.2-Fix branch until PDF Clown 0.1.2.1 is released. Thank you. Here is its snapshot: https://sourceforge.net/p/clown/code/HEAD/tarball?path=/branches/0.1.2-Fix
Thanks so much. I'll try the fix.