
#535 Problem with text extraction in some PDFs

closed-out-of-date
5
2010-04-07
2009-07-28
JM2006
No

Hi,

I am trying to extract text from a PDF (attached) using the extracttext command-line utility (and also the PDFBox library), and I get the following error:

C:\PDFBox-0.7.3\bin>extracttext pdfnotopen.pdf
Exception in thread "main" java.io.FileNotFoundException: Could not find the file 'C:\ProgramasComprimidos\PdfBox0.7.3\PDFBox-0.7.3\PDFBox-0.7.3\bin\pdfnotopen.pdf'.
at gnu.java.nio.channels.FileChannelImpl.open(FileChannelImpl.java:327)
at gnu.java.nio.channels.FileChannelImpl.<init>(FileChannelImpl.java:225)
at gnu.java.nio.channels.FileChannelImpl$Win32.<init>(FileChannelImpl.java:105)
at gnu.java.nio.channels.FileChannelImpl.create(FileChannelImpl.java:219)
at java.io.FileInputStream.<init>(FileInputStream.java:110)
at java.io.FileInputStream.<init>(FileInputStream.java:84)
at org.pdfbox.pdmodel.PDDocument.load(PDDocument.java:633)
at org.pdfbox.ExtractText.main(ExtractText.java:194)

Thanks for any help.
Best,
James
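In the trace above, the exception is thrown by java.io.FileInputStream (called from PDDocument.load) before PDFBox parses any PDF content, and the message shows the name being looked up under the bin directory. That is consistent with a relative file name being resolved against the current working directory, although the thread never confirmed the cause. The following is a minimal, stdlib-only sketch for checking this before calling the loader; the class InputFileCheck and its methods are hypothetical helpers, not part of PDFBox.

```java
import java.io.File;

// Hypothetical helper (not part of PDFBox): reports where a file name will
// actually be looked up before it is handed to a loader such as PDDocument.load.
class InputFileCheck {

    // java.io.File resolves a relative name against the current working
    // directory (the "user.dir" system property); this returns the absolute result.
    static File resolve(String name) {
        return new File(name).getAbsoluteFile();
    }

    // Returns null if the name points at an existing, readable regular file;
    // otherwise returns a message naming the absolute path that was checked.
    static String problem(String name) {
        File f = resolve(name);
        if (!f.exists()) {
            return "No such file: " + f.getPath()
                + " (relative names are resolved against "
                + System.getProperty("user.dir") + ")";
        }
        if (!f.isFile() || !f.canRead()) {
            return "Not a readable file: " + f.getPath();
        }
        return null;
    }

    // Usage: java InputFileCheck pdfnotopen.pdf
    public static void main(String[] args) {
        for (String name : args) {
            String msg = problem(name);
            System.out.println(msg == null ? name + ": OK" : msg);
        }
    }
}
```

If the check reports a missing file, passing an absolute path (or running the command from the directory that contains the PDF) is the usual remedy.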

Discussion

  • JM2006

    JM2006 - 2009-07-28
     
  • Ben Litchfield

    Ben Litchfield - 2010-04-07
    • status: open --> closed-out-of-date
     
  • Ben Litchfield

    Ben Litchfield - 2010-04-07

    PDFBox has moved to Apache. Bugs have been moved over to the Apache bug tracking system. If you don't see the bug there and it is still not fixed in the current release, please create a new bug on the Apache site.

    http://pdfbox.apache.org
