This is a feature request to reduce memory consumption when extracting text from PDF files. A diagnosis of the problem and a possible solution follow.
I'm using jPod to index PDF files, i.e. I need only the text content. However, memory consumption grows enormously on files that contain images or other drawing-like content (e.g. AutoCAD drawings). I was able to track the problem down to CSDeviceBasedInterpreter. Basically, its rendering operation methods always load resources, even if the device (CSTextExtractor in my case, but any CSTextDevice will do) does nothing with them.
One possible fix is to extend the ICSDevice interface with methods like "boolean isInlineImageImplemented()". If it returns false, render_EI() in CSDeviceBasedInterpreter does nothing and returns immediately. Similar "implemented" methods could be added for other operations, at least for those that use expensive resources, such as doXObject().
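To make the idea concrete, here is a rough sketch of what I have in mind. It is not jPod code: SketchDevice, SketchTextDevice, SketchInterpreter and loadResource() are hypothetical stand-ins for ICSDevice, a text device, CSDeviceBasedInterpreter and its resource loading, and isXObjectImplemented() is just an assumed name for the second check. The default methods return true, so existing devices would keep their current behaviour.

```java
// Hypothetical stand-in for ICSDevice, reduced to the two operations discussed above.
interface SketchDevice {
    // Proposed capability checks; the defaults keep existing devices unchanged.
    default boolean isInlineImageImplemented() { return true; }
    default boolean isXObjectImplemented() { return true; }

    void inlineImage(byte[] imageData);
    void doXObject(byte[] xObjectData);
}

// Text-only device: declares that it ignores inline images and XObjects.
class SketchTextDevice implements SketchDevice {
    @Override public boolean isInlineImageImplemented() { return false; }
    @Override public boolean isXObjectImplemented() { return false; }
    @Override public void inlineImage(byte[] imageData) { /* ignored */ }
    @Override public void doXObject(byte[] xObjectData) { /* ignored */ }
}

// Hypothetical stand-in for CSDeviceBasedInterpreter.
class SketchInterpreter {
    private final SketchDevice device;
    int resourceLoads = 0; // counts expensive loads, for illustration only

    SketchInterpreter(SketchDevice device) { this.device = device; }

    // Stands in for decoding an image or resolving a resource.
    private byte[] loadResource(int size) {
        resourceLoads++;
        return new byte[size];
    }

    void renderInlineImage(int size) {
        // Ask the device first; skip the load if it would discard the data anyway.
        if (!device.isInlineImageImplemented()) return;
        device.inlineImage(loadResource(size));
    }

    void renderXObject(int size) {
        if (!device.isXObjectImplemented()) return;
        device.doXObject(loadResource(size));
    }
}

class CapabilityCheckSketch {
    public static void main(String[] args) {
        SketchInterpreter interpreter = new SketchInterpreter(new SketchTextDevice());
        interpreter.renderInlineImage(1 << 20);
        interpreter.renderXObject(1 << 20);
        System.out.println("resource loads: " + interpreter.resourceLoads);
    }
}
```

With the text device no resource is loaded at all. One caveat for a real implementation: form XObjects can themselves contain text, so a text device would probably want to skip only image XObjects and still have forms interpreted.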
BTW, thank you for the library!