fi letter issues

Help
Steven M
2009-07-27
2013-05-28
  • Steven M
    2009-07-27

    Hi, I've encountered an issue. It seems to be common in PDF documents for the letters "fi" to be stored as a single block (a ligature glyph). This causes problems for the text extractor, which does not know how to handle it correctly and outputs incorrect text. Is this a known issue? Is it a simple bug, or something much more complex?
    Thanks,
    Steven
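
For readers who land here with the same "fi" problem: if the extracted text still contains the ligature as a single Unicode character (U+FB01), one possible workaround is to normalize the string after extraction. The sketch below is only an illustration using the standard `java.text.Normalizer`; it is not part of jPod's API, the class name `LigatureDemo` and the sample string are made up, and it assumes the extractor emits U+FB01 rather than a wrong or missing character.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of jPod: post-processes already-extracted text.
class LigatureDemo {

    // NFKC applies compatibility decomposition, which splits ligature
    // characters such as U+FB01 into their component letters ("f" + "i").
    // Note: NFKC also rewrites other compatibility characters
    // (for example full-width letters), which may or may not be wanted.
    static String expandLigatures(String extracted) {
        return Normalizer.normalize(extracted, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Assumed sample of extractor output containing the "fi" ligature.
        String extracted = "\uFB01le of\uFB01ce";
        System.out.println(expandLigatures(extracted)); // prints "file office"
    }
}
```

If the extractor instead drops the character or maps it to the wrong code point, normalization will not help and the font's encoding or ToUnicode mapping would need to be examined.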

     
    • Steven M
      2009-07-27

      Please disregard or delete this post; it has already been answered. Sorry.

       
    • Steven M
      2009-07-27

      I have encountered another issue that may be of interest: a PDF document that will not load because of some kind of font error. If this looks interesting, I'll upload the PDF I used. A snippet of the exception follows.

      27/07/2009 3:04:22 PM de.intarsys.pdf.font.PDFontType1 lookupBuiltinAFM
      WARNING: builtin font metrics 'Times-Roman' load error
      java.io.IOException: copying failed (null)
          at de.intarsys.tools.exception.ExceptionTools.createIOException(Unknown Source)
          at de.intarsys.tools.stream.StreamTools.copyStream(Unknown Source)
          at de.intarsys.tools.locator.CommonLocator.createTempFileLocator(Unknown Source)
          at de.intarsys.tools.locator.ClassResourceLocator.getRandomAccess(Unknown Source)
          at de.intarsys.cwt.font.afm.AFM.initializeFromLocator(Unknown Source)
          at de.intarsys.cwt.font.afm.AFM.createFromLocator(Unknown Source)
          at de.intarsys.pdf.font.PDFontType1.lookupBuiltinAFM(PDFontType1.java:261)
          at de.intarsys.pdf.font.PDFontType1.createBuiltinFontDescriptor(PDFontType1.java:299)
          at de.intarsys.pdf.font.PDFont.createFontDescriptor(PDFont.java:356)
          at de.intarsys.pdf.font.PDFont.getFontDescriptor(PDFont.java:452)
          at de.intarsys.pdf.font.PDFont.getMissingWidth(PDFont.java:554)
      ...

       
      • mtraut
        2009-07-31

        Sorry for the delay, been on the road.

        This does not seem to be document-related, but please upload the document anyway for testing.

        It seems to me that jPod is not able to load the resource "de.intarsys.pdf.font.Times-Roman.afm". Are you sure everything is in place?
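
A quick way to check whether "everything is in place" is to ask the class loader for the resource directly. The following is a minimal sketch using only the standard `Class.getResource` call. The class name `ResourceCheck` is hypothetical, and the slash-separated path is an assumption derived from the resource name quoted above; it has not been confirmed against the jPod distribution.

```java
import java.net.URL;

// Hypothetical diagnostic, not part of jPod.
class ResourceCheck {

    // Returns true if the resource at the given absolute classpath path
    // (leading '/') is visible to the class loader of the anchor class.
    static boolean exists(Class<?> anchor, String absolutePath) {
        URL url = anchor.getResource(absolutePath);
        return url != null;
    }

    public static void main(String[] args) {
        // Assumed path; pass a different one as the first argument if needed.
        String path = args.length > 0
                ? args[0]
                : "/de/intarsys/pdf/font/Times-Roman.afm";
        boolean found = exists(ResourceCheck.class, path);
        System.out.println(path + (found ? " found" : " NOT found on classpath"));
    }
}
```

Run it with the same classpath as the failing application. If the file is reported as not found, the font metrics are probably missing from the classpath, which would be consistent with the "copying failed (null)" error in the stack trace.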

         
        • Steven M
          2009-08-05

          Should there be a file called Times-Roman.afm in the actual package? I am using the extracted source package, and it contains nothing other than .java source files. I get the same error with this file, but with "Symbol" instead of "Times-Roman". Here is the link: https://www.your-fundaccount.com/rotary/DownloadCenter/Sample.pdf

           
          • Steven M
            2009-08-05

            I have fixed the issue; it was caused by the absence of those files. I don't think they are included in the source package of jPod, hence the problem, but I will check now. If they are not included, would you happen to have a source for them?
            Thanks,
            Steven

             
            • mtraut
              2009-08-05

              The files you mention are packaged in the ready-to-run "jPod.jar" AND are contained in the source release, in the "resource" branch of the zip file.