    Chris Thielen - 2010-10-05


    I'm curious about the most recent entry on the blog. Are you working on stream-based manipulation of documents, without a full in-memory model? If so, is the trunk repository available for testing? I have a specific use case which would benefit greatly from stream processing.

    My use case involves concatenating (and bookmarking) many PDF files (possibly thousands) together, then streaming the resulting PDF to a browser. The in-memory model works, but is obviously very memory-intensive. In my simple testing, generating a 150 MB PDF requires approximately 320 MB of Java heap, which really isn't bad for an in-memory model, but can make serving multiple concurrent requests problematic.

    Thanks for your effort, I'm really impressed with PDFClown so far!

    I'm perfectly aware of your concerns about the library's scalability in a server context: you're absolutely right. My primary design focus has been on offering a rich, flexible and consistent document object model; however, a pure in-memory representation has the obvious drawback of a larger footprint that isn't suitable for heavy concurrent computing.

    To cope with this limitation I'm thinking about a hybrid solution that would integrate the current DOM with a stream manager responsible for progressively serializing and disposing of indirect objects as soon as they are complete. Users could then choose, according to their requirements and coding strategies, either to keep the whole model in memory or to stream it on both reading and writing.
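
    To make the idea a bit more concrete, here is a rough Java sketch of the writing side only. Everything in it is hypothetical: StreamingObjectWriter, writeObject, finish and offsetOf are names made up for illustration and are not part of the PDF Clown API. It assumes a classic cross-reference table and leaves out the file header and trailer dictionary, so it shows just the bookkeeping: each completed indirect object is written out at once, and only its byte offset is kept in memory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of a stream manager; not part of the PDF Clown API. */
class StreamingObjectWriter {
    private final OutputStream out;
    // Only the byte offset of each written object is retained, never its content.
    private final List<Long> offsets = new ArrayList<>();
    private long position = 0;

    StreamingObjectWriter(OutputStream out) {
        this.out = out;
    }

    /** Serializes one completed indirect object and returns its object number. */
    int writeObject(String body) throws IOException {
        int number = offsets.size() + 1;
        offsets.add(position);
        write(number + " 0 obj\n" + body + "\nendobj\n");
        // The caller can now drop its reference to the object.
        return number;
    }

    /** Returns the byte offset at which the given object number was written. */
    long offsetOf(int number) {
        return offsets.get(number - 1);
    }

    /** Writes a classic cross-reference table built from the retained offsets. */
    void finish() throws IOException {
        long xrefStart = position;
        StringBuilder xref = new StringBuilder();
        xref.append("xref\n0 ").append(offsets.size() + 1).append('\n');
        xref.append("0000000000 65535 f \n");
        for (long offset : offsets) {
            xref.append(String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }
        // Header and trailer dictionary are deliberately omitted in this sketch.
        xref.append("startxref\n").append(xrefStart).append("\n%%EOF\n");
        write(xref.toString());
        out.flush();
    }

    /** Writes ASCII text and advances the running byte position. */
    private void write(String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes);
        position += bytes.length;
    }
}
```

    With something along these lines, memory use on writing would grow with the number of objects (one offset each) rather than with the size of the document, which is the trade-off the hybrid approach is after.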

    By the way, the blog entry you referred to doesn't mention file serialization streaming: it's about cross-reference streams and object streams instead, which are a PDF 1.5 structure optimization for better data compression.

    Thank you

    Chris, thank you for your contribution; I'll try it in the next few days.


