This patch follows up on our short discussion here: https://sourceforge.net/projects/clown/forums/forum/607162/topic/3880971
I have implemented a proof of concept for disk-backed IBuffer(s). The implementation uses Java NIO MappedByteBuffer at its core. An overview of the changes is as follows:
1) New disk-backed IBuffer implementation
I created a new IBuffer implementation, NIOBuffer. Instead of a "byte[] data" member variable, it uses a java.nio.MappedByteBuffer, which is a portion of a memory-mapped file. The class was ported from the existing byte-array implementation. Because this is a proof of concept, I have done no optimization, and a lot of relatively expensive conversion to and from byte arrays still occurs. There are also now two IBuffer implementations with a great deal of duplicated code. The duplication can probably be removed by pulling the shared functionality up into an abstract class that both implementations extend, or by some similar refactoring.
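To illustrate the core idea, here is a minimal sketch of mapping a region of a temp file as a MappedByteBuffer and copying between two regions without going through a byte array. It is not the NIOBuffer code from the patch; the class name MappedRegionSketch and its methods are hypothetical and exist only for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not the NIOBuffer class from the patch.
final class MappedRegionSketch {

    /** Creates an empty temp file that is removed when the VM exits. */
    static Path newTempFile() {
        try {
            Path temp = Files.createTempFile("niobuffer-sketch", ".tmp");
            temp.toFile().deleteOnExit();
            return temp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps size bytes starting at offset of the file for reading and writing. */
    static MappedByteBuffer mapRegion(Path file, long offset, int size) {
        try (FileChannel channel = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // On OpenJDK, a READ_WRITE mapping past the end of the file extends the file.
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, offset, size);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Copies length bytes between mapped regions without a byte[] round trip. */
    static void copy(MappedByteBuffer source, int sourceOffset,
                     MappedByteBuffer target, int targetOffset, int length) {
        // duplicate() shares the content but has its own position and limit.
        ByteBuffer src = source.duplicate();
        src.position(sourceOffset);
        src.limit(sourceOffset + length);
        ByteBuffer dst = target.duplicate();
        dst.position(targetOffset);
        dst.put(src);
    }

    public static void main(String[] args) {
        Path temp = newTempFile();
        MappedByteBuffer a = mapRegion(temp, 0, 8);
        MappedByteBuffer b = mapRegion(temp, 8, 8);
        a.put(0, (byte) '%');           // absolute write into the first region
        copy(a, 0, b, 4, 1);            // buffer-to-buffer copy
        System.out.println((char) b.get(4));
    }
}
```

Bulk transfers of this kind are one way the remaining to/from byte array conversions could be reduced, though whether they fit the IBuffer interface is something I have not checked here.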
2) New class which manages Buffer allocation for NIOBuffer
The MappedByteBuffers inside NIOBuffer objects are allocated by a manager class called FileMappedBufferManager (FMBM). Each FMBM creates a temp file (a small 32 KB by default) and carves it into MappedByteBuffer chunks, sequentially, as allocation requests arrive. When the file is too small to satisfy a request, its size is doubled. There is currently no provision for returning allocated buffers to a pool, so many NIOBuffer operations are extremely wasteful, especially those such as appending single bytes to the buffer. Multiple FMBMs may be created; each one creates its own temp file and serves its own allocation requests.
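The allocation scheme described above can be sketched roughly as follows. This is a simplified stand-in, not the FileMappedBufferManager from the patch: the class name ChunkAllocatorSketch, its method names, and the choice of RandomAccessFile are assumptions made for the example. It hands out sequential chunks, doubles the file when a request does not fit, and never reclaims space.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical illustration only; not the FileMappedBufferManager from the patch.
final class ChunkAllocatorSketch {
    private final File tempFile;
    private final RandomAccessFile raf;
    private long fileSize;        // current size of the backing temp file
    private long nextOffset = 0;  // start of the next chunk to hand out

    ChunkAllocatorSketch(long initialSize) {
        if (initialSize <= 0) {
            throw new IllegalArgumentException("initialSize must be positive");
        }
        try {
            tempFile = File.createTempFile("fmbm-sketch", ".tmp");
            raf = new RandomAccessFile(tempFile, "rw");
            raf.setLength(initialSize);
            fileSize = initialSize;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the next size bytes of the file as a mapped chunk. */
    MappedByteBuffer allocate(int size) {
        try {
            long required = nextOffset + size;
            // Double the file until the request fits.
            while (fileSize < required) {
                fileSize *= 2;
            }
            if (raf.length() < fileSize) {
                raf.setLength(fileSize);
            }
            MappedByteBuffer chunk = raf.getChannel()
                    .map(FileChannel.MapMode.READ_WRITE, nextOffset, size);
            nextOffset = required;  // chunks are never returned to a pool
            return chunk;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long fileSize() { return fileSize; }

    long allocatedBytes() { return nextOffset; }

    /** Closes the file and tries to delete it. */
    void dispose() {
        try {
            raf.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Deletion can fail on platforms that lock mapped files (e.g. Windows).
        tempFile.delete();
    }

    public static void main(String[] args) {
        ChunkAllocatorSketch allocator = new ChunkAllocatorSketch(32 * 1024);
        allocator.allocate(20_000);
        allocator.allocate(20_000);  // does not fit in 32 KB, so the file doubles
        System.out.println(allocator.fileSize());
        allocator.dispose();
    }
}
```

Because nothing is ever reclaimed, a buffer that grows by reallocating a slightly larger chunk on every append leaves all of its earlier chunks behind in the file, which is the waste mentioned above.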
3) Context storing the desired IBuffer mode.
I chose to allow the IBuffer backing mode to be selected on demand rather than through a single system-wide setting, so I needed somewhere to store the desired backing mode. The File object now holds this allocation context in the form of an IBufferFactory. I have added new constructors to File that take the factory object as a parameter; when none is supplied, the default IBufferFactory type is taken from the system property "DiskBackedBuffers". IBufferFactory is currently rather crude, but it could be converted into an interface and supplied through dependency injection, or otherwise made much cleaner. One problem is that not every current use of "new Buffer()" has access to the File context. Where the fix was obvious (e.g., Parser.java), I added an IBufferFactory to the constructor. Elsewhere, I left a few "raw" "new Buffer()" instantiations scattered throughout the code.
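As a rough sketch of this arrangement, the example below shows a context object with two constructors: one that takes a factory explicitly and one that falls back to the "DiskBackedBuffers" system property. Everything except the property name is hypothetical: BackingMode, BufferFactorySketch, and FileContextSketch are stand-ins for the real IBufferFactory and File, java.nio.ByteBuffer stands in for IBuffer, and reading the property as a boolean flag is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical stand-ins; not the IBufferFactory or File classes from the patch.
enum BackingMode { MEMORY, DISK }

final class BufferFactorySketch {
    static final String PROPERTY = "DiskBackedBuffers";
    private final BackingMode mode;

    BufferFactorySketch(BackingMode mode) {
        this.mode = mode;
    }

    /** Assumption: the property is a boolean flag ("true" selects disk backing). */
    static BufferFactorySketch fromSystemProperty() {
        boolean diskBacked = Boolean.getBoolean(PROPERTY);
        return new BufferFactorySketch(diskBacked ? BackingMode.DISK : BackingMode.MEMORY);
    }

    BackingMode mode() { return mode; }

    /** Creates a heap buffer or a buffer mapped onto its own temp file. */
    ByteBuffer newBuffer(int capacity) {
        if (mode == BackingMode.MEMORY) {
            return ByteBuffer.allocate(capacity);
        }
        try {
            Path temp = Files.createTempFile("buffer-sketch", ".tmp");
            temp.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(
                    temp, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                return channel.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class FileContextSketch {
    private final BufferFactorySketch bufferFactory;

    /** Default: the backing mode comes from the system property. */
    FileContextSketch() {
        this(BufferFactorySketch.fromSystemProperty());
    }

    /** Explicit: the caller chooses the backing mode for this file only. */
    FileContextSketch(BufferFactorySketch bufferFactory) {
        this.bufferFactory = bufferFactory;
    }

    BufferFactorySketch bufferFactory() { return bufferFactory; }

    public static void main(String[] args) {
        FileContextSketch byDefault = new FileContextSketch();
        FileContextSketch onDisk =
                new FileContextSketch(new BufferFactorySketch(BackingMode.DISK));
        System.out.println(byDefault.bufferFactory().mode());
        System.out.println(onDisk.bufferFactory().mode());
    }
}
```

Code that holds the context asks its factory for buffers instead of calling a constructor directly, which is why call sites without access to the context are the awkward case.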
The FMBM has a dispose() method which cleans up the temporary file. This is in turn called by File.close().
Note that this is proof-of-concept code, and I wanted to run it by you ASAP.
In my testing, I am now able to concatenate 1425 PDF files, creating a 150 MB, 2500-page document, with only 130 MB of heap allocated to the VM. For my use case, this is a marked improvement over the 300 MB or more I required when the buffers were in memory.
I hope to hear your thoughts on this! Perhaps you can use it, or perhaps it can be a stopgap measure until you figure out something better.
Chris Thielen - Mediture