[ANN] Project status on mid-December 2006

  • Some brief notes about the current state of the PDF Clown project:

    The next update (0.0.2), about to be released, will implement
    some crucial scheduled features (content stream filtering, parsing and
    building), along with the usual, merciless, iterative improvement of
    existing functionality.

    If you have any specific TODO requests, please let me know!

    • Daniel Wilson

      Does the project currently include identifying the vector elements used within a PDF file?


    • The PDF Clown development roadmap is designed to progressively implement the PDF 1.6 spec from the ground up: the current release (0.0.2) supports content stream parsing at *operator level*.
      Later releases (after the upcoming 0.0.3, which is primarily focused on OpenType/TrueType/Standard Type 1 font support and typographic alignment) will push the implementation to *graphics object level*.

      To be plain and simple: PDF's graphics model is roughly layered (from lower to higher level) this way:
      - graphics operators (may encompass multiple operands)
      - graphics objects (may encompass multiple operators)

      So, to your question: does the PDF Clown project currently identify the [graphic] vector elements used within a PDF file?
      Yes and no: 'yes', because you can easily retrieve the operators that make up path objects etc.; 'no', because identifying which operators belong to path objects etc. is currently up to you (PDF Clown 0.0.2 just parses the operators, without further semantic aggregation).
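
      To make the two levels concrete, here is a rough, self-contained Java sketch. It is NOT PDF Clown's API: the class and method names (ContentStreamSketch, parseOperations, groupPaths) are hypothetical, and the tokenizer is deliberately simplified to whitespace-separated numbers, names and operator keywords. It first collects operands into operators, then aggregates path-construction operators up to the painting operator that closes each path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names belong to PDF Clown.
class ContentStreamSketch {
    /** Operator level: one operator keyword plus the operands that preceded it. */
    static final class Operation {
        final String operator;
        final List<String> operands;

        Operation(String operator, List<String> operands) {
            this.operator = operator;
            this.operands = operands;
        }

        @Override
        public String toString() {
            return operator + " " + operands;
        }
    }

    // Path construction and clipping operators (m, l, c, v, y, h, re, W, W*).
    static final Set<String> PATH_CONSTRUCTION =
        Set.of("m", "l", "c", "v", "y", "h", "re", "W", "W*");
    // Path painting operators, which terminate a path object.
    static final Set<String> PATH_PAINTING =
        Set.of("S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n");

    // Simplification: only numbers and names (/Name) count as operands.
    static boolean isOperand(String token) {
        return token.startsWith("/") || token.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)");
    }

    /** Operands come first in a content stream; the operator closes the group. */
    static List<Operation> parseOperations(String content) {
        List<Operation> operations = new ArrayList<>();
        List<String> operands = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (isOperand(token)) {
                operands.add(token);
            } else {
                operations.add(new Operation(token, new ArrayList<>(operands)));
                operands.clear();
            }
        }
        return operations;
    }

    /** Graphics object level: group construction operators up to the painting operator. */
    static List<List<Operation>> groupPaths(List<Operation> operations) {
        List<List<Operation>> paths = new ArrayList<>();
        List<Operation> current = new ArrayList<>();
        for (Operation op : operations) {
            if (PATH_CONSTRUCTION.contains(op.operator)) {
                current.add(op);
            } else if (PATH_PAINTING.contains(op.operator) && !current.isEmpty()) {
                current.add(op);
                paths.add(current);
                current = new ArrayList<>();
            }
            // Anything else (q, cm, Q, ...) is not part of a path and is skipped here.
        }
        return paths;
    }

    public static void main(String[] args) {
        String content = "q 1 0 0 1 50 50 cm 0 0 m 100 0 l 100 100 l h S 10 10 30 30 re f Q";
        List<Operation> operations = parseOperations(content);
        operations.forEach(System.out::println);
        System.out.println("Path objects found: " + groupPaths(operations).size());
    }
}
```

      A real content stream also contains strings, arrays, dictionaries and inline images, so a proper lexer is needed in practice; the sketch only shows where the semantic aggregation step would sit.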

      I hope this answer is useful.

      Thank you, Daniel, for your question.

    • Hi Stefano,

      I would like to know how PDF Clown parses the content stream.
      I want to print the tokens and their values on the console. Can you send me some test code?


    • Hi Steve,

      Content stream parsing is accomplished by it.stefanochizzolini.clown.documents.contents.tokens.Parser.

      Inside the 0.0.3 distribution you can find the ParsingSample class, which implements just what you want; see the included documentation for further details.

      Thank you!