NAPS2 - Not Another PDF Scanner / Tickets / #381 NAPS2 fails to import large external PDF

Tony Jones - 2017-12-06

At a minimum it needs an improved diagnostic. It's occurring with two large documents, but I have no idea whether it's the same underlying issue. I'm not sure about the distribution rights for one document, so I'm attaching just the other. It's too large to attach directly, so here is a Google Drive link. Thanks for all your work on this great product.

https://drive.google.com/file/d/1a_AFFuEJAFjGs4iaKMknfB7biDAytJQJ/view?usp=sharing


Ben Olden-Cooligan - 2017-12-06

A more detailed error should be visible in your errorlog.txt file. You can find it in the "%APPDATA%\NAPS2" folder. For example, if your user name is Tony, it would be in "C:\Users\Tony\AppData\Roaming\NAPS2".
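As a rough illustration of the path above, here is a minimal Python sketch that builds the expected log location from the `APPDATA` environment variable and returns the last few lines of the file. The helper names `naps2_errorlog_path` and `tail` are made up for this example and are not part of NAPS2.

```python
import os
from pathlib import PureWindowsPath


def naps2_errorlog_path(appdata=None):
    """Return the expected errorlog.txt location under %APPDATA%\\NAPS2.

    `appdata` can be passed explicitly; otherwise the APPDATA environment
    variable is used (it is normally only set on Windows).
    """
    base = appdata if appdata is not None else os.environ.get("APPDATA")
    if not base:
        raise RuntimeError("APPDATA is not set; this location applies to Windows only")
    return PureWindowsPath(base) / "NAPS2" / "errorlog.txt"


def tail(path, count=40):
    """Return the last `count` lines of a text file, tolerating odd bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.readlines()[-count:]
```

On a Windows machine, `print("".join(tail(str(naps2_errorlog_path()))))` should show the most recent entries, which are usually the ones worth pasting into a ticket.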

I'll look at the file later.


Tony Jones - 2017-12-06

File1: Attachment in Google Drive. Opens fine in Evince and Adobe Acrobat.

2017-12-06 12:29:44.1561 Error importing PDF file. PdfSharp.Pdf.IO.PdfReaderException: Unexpected character '0x0023' in PDF stream. The file may be corrupted. If you think this is a bug in PDFsharp, please send us your PDF file.
at PdfSharp.Internal.ParserDiagnostics.HandleUnexpectedCharacter(Char ch)
at PdfSharp.Pdf.IO.Lexer.ScanNextToken()
at PdfSharp.Pdf.IO.Parser.ParseObject(Symbol stop)
at PdfSharp.Pdf.IO.Parser.ReadArray(PdfArray array, Boolean includeReferences)
at PdfSharp.Pdf.IO.Parser.ParseObject(Symbol stop)
at PdfSharp.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences)
at PdfSharp.Pdf.IO.Parser.ReadObject(PdfObject pdfObject, PdfObjectID objectID, Boolean includeReferences, Boolean fromObjecStream)
at PdfSharp.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider passwordProvider)
at PdfSharp.Pdf.IO.PdfReader.Open(String path, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider provider)
at NAPS2.ImportExport.Pdf.PdfSharpImporter.Import(String filePath, Func`3 progressCallback)

File2: Not sure of distribution rights on this file. I'd prefer to email it, not attach publicly. LMK if possible.

2017-12-06 12:33:05.4046 Error importing PDF file. System.NullReferenceException: Object reference not set to an instance of an object.
at PdfSharp.Pdf.PdfPages.GetKids(PdfReference iref, InheritedValues values, PdfDictionary parent)
at PdfSharp.Pdf.PdfPages.FlattenPageTree()
at PdfSharp.Pdf.Advanced.PdfCatalog.get_Pages()
at PdfSharp.Pdf.PdfDocument.get_Pages()
at NAPS2.ImportExport.Pdf.PdfSharpImporter.Import(String filePath, Func`3 progressCallback)


Tony Jones - 2017-12-06

As an aside, is there any possibility that NAPS2 could directly import PPM files?

pdfimages is able to extract the files (all PPM) from the above two pdfs, so if NAPS2 could directly import these files it would provide an alternative when the direct PDF import goes wrong (obviously conversion to JPG is problematic as it's lossy).
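For a sense of what direct PPM import would involve, here is a minimal Python sketch of a reader for the binary PPM variant (`P6`). It assumes the extracted files are plain `P6` images; `read_ppm` is a hypothetical helper for illustration and says nothing about how NAPS2 would implement this.

```python
def read_ppm(data: bytes):
    """Parse a binary PPM (P6) image held in memory.

    Returns (width, height, maxval, pixels), where pixels is the raw
    RGB raster exactly as stored in the file.
    """
    pos = 0
    fields = []
    # The header is four whitespace-separated tokens: magic, width, height, maxval.
    while len(fields) < 4:
        while pos < len(data) and data[pos:pos + 1].isspace():
            pos += 1
        if pos >= len(data):
            raise ValueError("truncated PPM header")
        if data[pos:pos + 1] == b"#":
            # A comment runs to the end of the line.
            while pos < len(data) and data[pos:pos + 1] not in (b"\n", b"\r"):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        fields.append(data[start:pos])
    if fields[0] != b"P6":
        raise ValueError("not a binary PPM (P6) file")
    width, height, maxval = (int(field) for field in fields[1:])
    pos += 1  # a single whitespace byte separates the header from the raster
    bytes_per_sample = 1 if maxval < 256 else 2
    expected = width * height * 3 * bytes_per_sample
    pixels = data[pos:pos + expected]
    if len(pixels) != expected:
        raise ValueError("truncated PPM raster")
    return width, height, maxval, pixels
```

The format is simple enough that the raster can be handed to an image library unchanged, which is why no lossy conversion step is needed.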

- Tony Jones - 2017-12-06
  
  I opened a new ticket for the above.
  

Rodrigo Pozzebon - 2017-12-19

I have the same problem here.

2017-12-19 15:33:52.9163 Error importing PDF file. System.NullReferenceException: Object reference not set to an instance of an object.
at PdfSharp.Pdf.PdfPages.GetKids(PdfReference iref, InheritedValues values, PdfDictionary parent)
at PdfSharp.Pdf.PdfPages.FlattenPageTree()
at PdfSharp.Pdf.Advanced.PdfCatalog.get_Pages()
at PdfSharp.Pdf.PdfDocument.get_Pages()
at NAPS2.ImportExport.Pdf.PdfSharpImporter.Import(String filePath, Func`3 progressCallback)

2017-12-19 15:35:24.2426 Error importing PDF file. System.NullReferenceException: Object reference not set to an instance of an object.
at PdfSharp.Pdf.PdfPages.GetKids(PdfReference iref, InheritedValues values, PdfDictionary parent)
at PdfSharp.Pdf.PdfPages.FlattenPageTree()
at PdfSharp.Pdf.Advanced.PdfCatalog.get_Pages()
at PdfSharp.Pdf.PdfDocument.get_Pages()
at NAPS2.ImportExport.Pdf.PdfSharpImporter.Import(String filePath, Func`3 progressCallback)


Ben Olden-Cooligan - 2017-12-22

For the first issue ("unexpected character"), it looks like the PDF file is technically invalid. The program that created it didn't check for division by zero and put in some bad data. I'll see if I can make NAPS2 ignore the bad data (like I assume Adobe Reader does).
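To make the idea of ignoring the bad data concrete, here is a minimal Python sketch of a tolerant number parser. The thread does not show the offending bytes, so this rests on an assumption: character 0x0023 is `#`, and older Microsoft C runtimes print infinity and NaN as text such as `1.#INF` or `-1.#IND`, which is what an unchecked division by zero would produce. The names `parse_pdf_number` and `parse_number_array` are hypothetical, and this is not the PDFsharp or NAPS2 code.

```python
import re

# Assumed shape of the bad tokens: "1.#INF", "-1.#IND", "1.#QNAN",
# optionally zero-padded to the printf precision (for example "1.#INF00").
_BAD_NUMBER = re.compile(rb"[+-]?\d+\.#[A-Z]+\d*")


def parse_pdf_number(token: bytes) -> float:
    """Parse one numeric token, mapping the assumed bad forms to 0.0."""
    if _BAD_NUMBER.fullmatch(token):
        return 0.0
    return float(token.decode("ascii"))


def parse_number_array(body: bytes):
    """Parse a flat PDF array of numbers such as b"[0 0 612 792]"."""
    tokens = body.strip(b"[] \r\n\t").split()
    return [parse_pdf_number(token) for token in tokens]
```

Substituting zero is one possible policy; a real reader might instead drop the value or fall back to a default for the affected entry. Parsing leniently, rather than rewriting the file, also avoids shifting the byte offsets that the cross-reference table depends on.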

For the second issue ("null reference"), I'll need a sample PDF file. If you click my name on this site, you should see a "Send Message" button you can use to send me a private message with a link to the file.

- Tony Jones - 2017-12-27
  
  I sent you a message with a link. LMK when you've downloaded it so I can delete it. Thanks for all the help!
  
