#544 PDColorSpaceFactory does not recognize colorspace DeviceGray

open
5
2011-03-15
2011-03-15
Matt England
No

I was trying to use PDFTextStripper to extract text from a large corpus of PDF files. In some of them, the method:

org.apache.pdfbox.pdmodel.graphics.color.PDColorSpaceFactory.createColorSpace( COSBase colorSpace, Map colorSpaces )

fails to recognize the case where the colorSpace argument is a COSArray whose first element is COSName.DEVICEGRAY. With that case added, the files that failed under the stock pdfbox-1.5.0 parse successfully. Attached is a diff of my patched PDColorSpaceFactory that handles the DeviceGray colorspace name. Incidentally, another (possibly better) approach would be to call through to createColorSpace(String) when no other case matches.

% diff PDColorSpaceFactory.java.orig PDColorSpaceFactory.java
94a95,97
> else if ( type.getName().equals( PDDeviceGray.NAME) ) {
>     retval = new PDDeviceGray();
> }
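
The fallback idea mentioned above (calling through to the name-based lookup when no array case matches) could look roughly like the sketch below. It is self-contained and does not use PDFBox: plain strings stand in for COSName values and for the color space objects that would be created, and the class and method names (ColorSpaceFallbackSketch, createByName, createFromArray) are hypothetical, chosen only for illustration. The real createColorSpace methods may be structured differently.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-in, NOT the PDFBox implementation.
// Strings replace COSName values and the resulting color space objects.
final class ColorSpaceFallbackSketch {

    // Stands in for the name-based lookup, createColorSpace(String).
    static String createByName(String name) {
        switch (name) {
            case "DeviceGray": return "PDDeviceGray";
            case "DeviceRGB":  return "PDDeviceRGB";
            case "DeviceCMYK": return "PDDeviceCMYK";
            default:
                throw new IllegalArgumentException("Unknown color space: " + name);
        }
    }

    // Stands in for the COSArray branch: dispatch on the first element.
    static String createFromArray(List<String> array) {
        if (array.isEmpty()) {
            throw new IllegalArgumentException("Empty color space array");
        }
        String type = array.get(0);
        if (type.equals("ICCBased")) {
            return "PDICCBased";   // array-only case (illustrative)
        } else if (type.equals("Indexed")) {
            return "PDIndexed";    // array-only case (illustrative)
        }
        // No array-specific case matched: fall back to the name-based lookup,
        // so simple names such as DeviceGray are handled in one place.
        return createByName(type);
    }

    public static void main(String[] args) {
        // prints PDDeviceGray
        System.out.println(createFromArray(Arrays.asList("DeviceGray")));
    }
}
```

Compared with adding one else-if per device color space, the fallback keeps the list of simple names in a single method, at the cost of a less specific error when the array holds an unsupported type.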

Discussion

  • Matt England
    2011-03-15

    Diff of PDColorSpaceFactory.java

     
  • Matt England
    2011-03-15

    Just realized you're using Apache to track bugs now. Will cross-post this there.