Some brief notes about the current state of the PDF Clown project:
The next update (0.0.2), which is about to be released, will implement
some crucial scheduled features (content stream filtering, parsing and
building), along with the usual, merciless, iterative improvement of
If you have any specific TODO requests, please let me know!
Does the project currently include identifying the vector elements used within a PDF file?
The PDF Clown development roadmap is designed to implement the PDF 1.6 spec progressively, from the ground up: the current release (0.0.2) supports content stream parsing at *operator level*.
Later releases (after the upcoming 0.0.3, which focuses primarily on OpenType/TrueType/Standard Type 1 font support and typographic alignment) will push the implementation to *graphics object level*.
To put it plainly: PDF's graphics model is roughly layered (from lower to higher level) like this:
- graphics operators (may encompass multiple operands)
- graphics objects (may encompass multiple operators)
So, you ask: does the PDF Clown project currently include identifying the [graphic] vector elements used within a PDF file?
Yes and no: 'yes', because you can easily retrieve the operators that describe path objects and the like; 'no', because picking out the relevant operators for each path object is currently up to you (PDF Clown 0.0.2 just parses the operators, without further semantic aggregation).
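To illustrate what "semantic aggregation" could mean here, this is a minimal sketch that groups a flat sequence of operators into path objects. It does not use the PDF Clown API: the class and method names are hypothetical, and the input is assumed to be a plain list of operator keywords that has already been parsed. The operator names themselves (m, l, c, v, y, h, re for path construction; S, s, f, F, f*, B, B*, b, b*, n for path painting) come from the PDF spec.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, NOT part of PDF Clown: groups operators into paths.
final class PathAggregationSketch {
    // Path construction operators defined by the PDF spec.
    static final Set<String> CONSTRUCTION =
        new HashSet<>(Arrays.asList("m", "l", "c", "v", "y", "h", "re"));
    // Path painting operators; each one ends the current path.
    static final Set<String> PAINTING =
        new HashSet<>(Arrays.asList("S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n"));

    /** Returns one operator list per path object found in the sequence. */
    static List<List<String>> aggregatePaths(List<String> operators) {
        List<List<String>> paths = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String op : operators) {
            if (CONSTRUCTION.contains(op)) {
                current.add(op);
            } else if (PAINTING.contains(op) && !current.isEmpty()) {
                // A painting operator closes the path built so far.
                current.add(op);
                paths.add(current);
                current = new ArrayList<>();
            }
            // Any other operator (q, Q, W, Tf, ...) is skipped in this sketch.
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> ops = Arrays.asList("q", "m", "l", "l", "h", "f", "re", "S", "Q");
        for (List<String> path : aggregatePaths(ops)) {
            System.out.println("path: " + path);
        }
    }
}
```

A real implementation would also keep the operands of each operator and track the graphics state; this sketch only shows the grouping step that is currently left to the caller.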
I hope my answer may be useful.
Thank you Daniel for your question.
I would like to know how PDF Clown parses the content stream.
I want to print the tokens and their values on the console. Can you send me some test code?
Content stream parsing is handled by it.stefanochizzolini.clown.documents.contents.tokens.Parser.
Inside the 0.0.3 distribution you can find the ParsingSample class, which does just what you want; see the included documentation for further details.
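For readers without the distribution at hand, here is a rough, self-contained sketch of what printing tokens and their values can look like. It is not the ParsingSample class and does not call the PDF Clown Parser: every name in it is hypothetical, and it assumes a heavily simplified content stream made only of whitespace-separated numbers, names and operator keywords (no strings, arrays, dictionaries, comments or inline images).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone tokenizer, NOT the PDF Clown Parser.
final class TokenDumpSketch {
    enum Kind { NUMBER, NAME, OPERATOR }

    static final class Token {
        final Kind kind;
        final String value;

        Token(Kind kind, String value) {
            this.kind = kind;
            this.value = value;
        }

        @Override
        public String toString() {
            return kind + " " + value;
        }
    }

    /** Classifies one whitespace-delimited chunk of the simplified stream. */
    static Kind classify(String text) {
        if (text.startsWith("/")) {
            return Kind.NAME;      // e.g. /F1
        }
        if (text.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) {
            return Kind.NUMBER;    // e.g. 12, -1.5, .5
        }
        return Kind.OPERATOR;      // e.g. Tf, m, l, S
    }

    /** Splits the content on whitespace and tags each piece with its kind. */
    static List<Token> tokenize(String content) {
        List<Token> tokens = new ArrayList<>();
        for (String text : content.trim().split("\\s+")) {
            if (!text.isEmpty()) {
                tokens.add(new Token(classify(text), text));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Operands come first, then the operator that consumes them.
        for (Token token : tokenize("/F1 12 Tf 10 20 m 110 20 l S")) {
            System.out.println(token);
        }
    }
}
```

Since operands precede their operator in a content stream, grouping the tokens up to each OPERATOR token yields the operator-level view described earlier in this thread.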