Does anybody know how to determine whether a PDF file contains color or is black and white?
I would be grateful.
To detect whether a PDF is monochromatic, you have to analyze the instances of the color spaces declared within your file and associated with either vector graphics (see content streams) or raster graphics (see image bit streams).
If all the color spaces in use have a single color component (gray level), you're fine. Otherwise, for color spaces with multiple components, you have to verify that every single reference to that color space maps to monochrome. For vector graphics, this means scanning every content stream for color-setting operators and evaluating their operands (which are the components of the color in use). For raster graphics, it means scanning every image bit stream and evaluating the component values of its sample data.
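The per-color test described above ("does this color map to a gray?") can be sketched in Java as follows. This is a library-independent sketch, not PDF Clown code: the class and method names (`DeviceColorCheck`, `isGray`) are hypothetical, component values are assumed to be in the 0..1 range, and for DeviceCMYK it assumes the naive DeviceCMYK-to-DeviceRGB conversion from the PDF specification (red = 1 - min(1, cyan + black), and so on) rather than real color management.

```java
/** Hypothetical helper: does a color in a device color space represent a neutral gray? */
final class DeviceColorCheck {
    private static final double EPS = 1e-6; // tolerance for floating-point operands

    static boolean isGray(String colorSpace, double... c) {
        switch (colorSpace) {
            case "DeviceGray":
                return true; // single component: always a gray level
            case "DeviceRGB":
                return near(c[0], c[1]) && near(c[1], c[2]); // gray only if r == g == b
            case "DeviceCMYK": {
                // Assumption: naive CMYK -> RGB conversion, then the same r == g == b test
                double r = 1 - Math.min(1, c[0] + c[3]);
                double g = 1 - Math.min(1, c[1] + c[3]);
                double b = 1 - Math.min(1, c[2] + c[3]);
                return near(r, g) && near(g, b);
            }
            default:
                // CIE-based and special color spaces are out of scope for this sketch
                throw new IllegalArgumentException("unsupported color space: " + colorSpace);
        }
    }

    private static boolean near(double a, double b) {
        return Math.abs(a - b) < EPS;
    }
}
```

For example, `isGray("DeviceRGB", 0.5, 0.5, 0.5)` is true, while `isGray("DeviceCMYK", 1, 0, 0, 0)` (pure cyan) is false.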
PDF Clown currently fully supports only device-based color spaces: CIE-based and special ones are just passed through.
Color space types are gathered in the it.stefanochizzolini.clown.documents.contents.colorSpaces package.
Content streams can be scanned through the ContentScanner class, available from the it.stefanochizzolini.clown.documents.contents package.
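To illustrate what "scanning a content stream for color-setting operators and evaluating their operands" involves, here is a deliberately naive, library-independent Java sketch. It is not how ContentScanner works internally. It assumes the stream has already been decoded to text, splits tokens on whitespace, and only checks the DeviceRGB (`rg`/`RG`) and DeviceCMYK (`k`/`K`) operators. It ignores string literals, arrays, inline image data and the generic `cs`/`CS` + `sc`/`scn` operators. The class name `ContentStreamColorScan` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical, naive scan of a decoded content stream for non-gray device colors. */
final class ContentStreamColorScan {
    private static final Pattern NUMBER = Pattern.compile("[+-]?(\\d+\\.?\\d*|\\.\\d+)");
    private static final double EPS = 1e-6;

    static boolean usesNonGrayDeviceColor(String content) {
        List<Double> operands = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (NUMBER.matcher(token).matches()) {
                operands.add(Double.parseDouble(token)); // collect numeric operands
                continue;
            }
            int n = operands.size();
            if ((token.equals("rg") || token.equals("RG")) && n >= 3) {
                // DeviceRGB: gray only if r == g == b
                if (!isNeutral(operands.get(n - 3), operands.get(n - 2), operands.get(n - 1))) return true;
            } else if ((token.equals("k") || token.equals("K")) && n >= 4) {
                // DeviceCMYK: naive conversion to RGB, then the same test
                double k = operands.get(n - 1);
                double r = 1 - Math.min(1, operands.get(n - 4) + k);
                double g = 1 - Math.min(1, operands.get(n - 3) + k);
                double b = 1 - Math.min(1, operands.get(n - 2) + k);
                if (!isNeutral(r, g, b)) return true;
            }
            // g/G (DeviceGray) can never introduce color; every operator consumes its operands
            operands.clear();
        }
        return false;
    }

    private static boolean isNeutral(double r, double g, double b) {
        return Math.abs(r - g) < EPS && Math.abs(g - b) < EPS;
    }
}
```

For example, `"0.5 0.5 0.5 rg 10 10 100 100 re f"` is reported as gray, while `"1 0 0 RG 0 0 m 10 10 l S"` (red stroke) is reported as color. A real implementation should rely on a proper content-stream parser such as ContentScanner rather than whitespace splitting.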
As images can be declared both inside content streams (so-called "inline images") and outside them (so-called "external images"), their bit streams can accordingly be scanned through two buffer sources: for inline images, the getBody().getValue() method of the InlineImage class, available from the it.stefanochizzolini.clown.documents.contents.objects package; for external images, the getBaseDataObject().getBody() method of the ImageXObject class, available from the it.stefanochizzolini.clown.documents.contents.xObjects package.
The downloadable distribution of PDF Clown includes several working code samples that demonstrate the use of such objects.
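Once an image's bit stream has been obtained, evaluating its sample data can be as simple as the following Java sketch. It assumes the samples are already decoded (all filters applied), use DeviceRGB with 8 bits per component and the default Decode array. With 8 bits per component each sample is one byte, so rows carry no padding. The class and method names are hypothetical, and other bit depths or color spaces would need their own unpacking.

```java
/** Hypothetical check of decoded 8-bit-per-component DeviceRGB image samples. */
final class RasterGrayCheck {
    /** Returns true if every pixel has R == G == B, i.e. the image is visually gray. */
    static boolean isGrayRgb8(byte[] samples) {
        if (samples.length % 3 != 0) {
            throw new IllegalArgumentException("RGB sample data length must be a multiple of 3");
        }
        for (int i = 0; i < samples.length; i += 3) {
            // Compare the red, green and blue samples of one pixel
            if (samples[i] != samples[i + 1] || samples[i + 1] != samples[i + 2]) {
                return false; // one colored pixel is enough
            }
        }
        return true;
    }
}
```

Note that an RGB image can still be entirely gray, which is why checking the declared color space alone is not sufficient.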
Thanks again, Stefano, for your answer. I spent the whole day trying to find a pattern that lets me say reliably that a document contains color. Scanning the content stream for color-setting operators is easy (G: DeviceGray, RG: DeviceRGB, K: DeviceCMYK). The issue is that I don't deal only with images, but with all kinds of XObjects.
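One way to think about "all kinds of XObjects": a form XObject has its own content stream, painted via the Do operator, and that stream can in turn paint images or further forms. So the per-stream and per-image checks have to be applied recursively. The Java sketch below shows only that traversal over a hypothetical, library-independent model (`ContentNode`, `XObjectWalk` are made-up names). In practice `ownColor` would come from scanning the node's operators or image samples, and `children` from the XObjects it invokes.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical model of a page, form XObject, or image and the XObjects it paints. */
final class ContentNode {
    final boolean ownColor;           // result of checking this node's own operators or samples
    final List<ContentNode> children; // XObjects invoked from this node via Do

    ContentNode(boolean ownColor, List<ContentNode> children) {
        this.ownColor = ownColor;
        this.children = children;
    }
}

final class XObjectWalk {
    /** Depth-first search: true if the node or any XObject reachable from it uses color. */
    static boolean containsColor(ContentNode root) {
        Set<ContentNode> seen = Collections.newSetFromMap(new IdentityHashMap<ContentNode, Boolean>());
        return visit(root, seen);
    }

    private static boolean visit(ContentNode node, Set<ContentNode> seen) {
        if (!seen.add(node)) {
            return false; // shared XObjects are checked only once
        }
        if (node.ownColor) {
            return true;
        }
        for (ContentNode child : node.children) {
            if (visit(child, seen)) {
                return true;
            }
        }
        return false;
    }
}
```

The visited set matters because the same form or image XObject is often reused across pages.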
For other people reading this topic:
Raster image: http://www.wisegeek.com/what-is-a-raster-image.htm
Vector image: http://www.wisegeek.com/what-is-the-difference-between-vector-and-bitmap-images.htm
Can you please tell me which approach I should use, and write down sample code for each:
- Iterate through pages and work with content streams / raster graphics (not all objects define a color space, and not all color spaces, even among those derived from CIE, have fully implemented methods that I can use; see the return null statements).
- Iterate through PDF objects (hard to get an accurate answer).
For example, if I iterate through the PDF pages:
foreach(Page page in document.Pages)
{
  Contents objContents = page.Contents;
  // Scanner over this page's content stream
  ContentScanner objScanner = new ContentScanner(objContents);
  // Graphics state at the scanner's current position
  ContentScanner.GraphicsState objGraphicsState = objScanner.State;
}
How come objGraphicsState reports a different color space than the one I get for an image (obtained by iterating through the page objects) contained in the same page?
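A possible factor, stated as an assumption rather than a verified answer about PDF Clown: in the PDF graphics state model, the current nonstroking color space starts as DeviceGray. It changes only as operators such as `g`, `rg`, `k` and `cs` are processed (and is restored by `Q`). An image XObject, by contrast, declares its own color space in its image dictionary, and the current graphics-state color is only used for image masks. If ContentScanner.State follows that model, reading it before moving through the stream would simply show the initial state. The naive Java sketch below (hypothetical class `FillColorSpaceTrace`, whitespace tokenizing only) reports which graphics-state color space is current each time `Do` paints an XObject.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical tracker of the nonstroking color space in a decoded content stream. */
final class FillColorSpaceTrace {
    static List<String> colorSpaceAtEachDo(String content) {
        String current = "DeviceGray";             // initial graphics-state value
        Deque<String> saved = new ArrayDeque<>();  // stack for q (save) / Q (restore)
        String previous = null;
        List<String> result = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            switch (token) {
                case "q": saved.push(current); break;
                case "Q": if (!saved.isEmpty()) current = saved.pop(); break;
                case "g": current = "DeviceGray"; break;
                case "rg": current = "DeviceRGB"; break;
                case "k": current = "DeviceCMYK"; break;
                case "cs":
                    // Operand is the preceding name, e.g. "/DeviceRGB cs" or a resource name like "/CS0 cs"
                    if (previous != null && previous.startsWith("/")) current = previous.substring(1);
                    break;
                case "Do":
                    // Graphics-state color space at this point; the XObject's own /ColorSpace may differ
                    result.add(current);
                    break;
                default:
                    break;
            }
            previous = token;
        }
        return result;
    }
}
```

For example, `"/Im1 Do 1 0 0 rg /Im2 Do"` yields `[DeviceGray, DeviceRGB]`, even though neither value says anything about the color spaces declared by `/Im1` or `/Im2` themselves.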