OutOfMemoryError with large files
Here is the code I use to create the delta file for patching two files:
Rdiff rdiff = new Rdiff();
...
List sigs = rdiff.makeSignatures(basisIn);    // basisIn: InputStream over the basis (old) file
List deltas = rdiff.makeDeltas(sigs, newIn);  // newIn: InputStream over the new file
rdiff.writeDeltas(deltas, deltaOut);          // deltaOut: OutputStream for the delta file
When I use a file over 450 MB in size, it reports:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Is this an environment-settings problem, or an actual problem in jarsync itself? Or am I supposed to set some options before doing the patching?
Regards
Rossouw
(not sure if you're still interested, in the intervening years...)
There were some bugs in the ChecksumPair code, which prevented the hash search from working properly.
The other issue is that the list of signatures may be very large, as may be the deltas (the list of deltas may comprise the entire file, for example). It's much better to work with temporary files if you are processing very large files. Take a look at the methods
Rdiff.makeSignatures(InputStream, OutputStream);
Rdiff.makeDeltas(List<ChecksumPair>, InputStream, OutputStream);
These use the "streaming" API in a simple fashion and should use little memory. You still need to keep the whole list of signatures in memory, of course, but you can mitigate the problems that this causes by setting the 'blockSize' value to something larger for larger files.
(I suggest getting the newest code from Subversion, by the way)
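For example, something along these lines should keep the delta list off the heap. This is only a rough sketch: the file names are placeholders, I'm assuming the org.metastatic.rsync package layout and that makeSignatures returns a List of ChecksumPair, and the 'blockSize' setting is only referenced in a comment since its exact field/setter name may differ in your version; check the Javadoc.

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;
import org.metastatic.rsync.ChecksumPair;
import org.metastatic.rsync.Rdiff;

public class LargeFileDelta {
    public static void main(String[] args) throws Exception {
        Rdiff rdiff = new Rdiff();

        // The signature list still has to live in memory; a larger
        // 'blockSize' (see above -- the exact field/setter name depends
        // on your jarsync version) keeps it small for very large files.
        List<ChecksumPair> sigs;
        InputStream basis = new FileInputStream("old.bin");
        try {
            sigs = rdiff.makeSignatures(basis);
        } finally {
            basis.close();
        }

        // The deltas, however, are streamed straight to a file instead
        // of being collected into a List first.
        InputStream newer = new FileInputStream("new.bin");
        OutputStream delta = new FileOutputStream("old-to-new.delta");
        try {
            rdiff.makeDeltas(sigs, newer, delta);
        } finally {
            newer.close();
            delta.close();
        }
    }
}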
The algorithm generates excessive GC pressure due to:
1) excessive array cloning,
2) excessive object and class construction (e.g. event objects, the TwoKeys inner classes),
3) storing results in intermediate lists.
Using the -pipe option can only mitigate the third problem. Mitigating all three would require a partial rewrite; see the generic sketch below.
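To illustrate points 1 and 3 in general terms (this is a generic sketch, not jarsync's actual internals): the difference is between cloning each block into a fresh array and buffering everything in a List, versus handing (buffer, offset, length) to a callback that writes straight through.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GcPressureSketch {
    // Allocation-heavy pattern: one array clone per block, and every
    // result buffered in an intermediate list before it is used.
    static List<byte[]> collectBlocks(byte[] buf, int len, int blockLen) {
        List<byte[]> out = new ArrayList<byte[]>();
        for (int off = 0; off + blockLen <= len; off += blockLen)
            out.add(Arrays.copyOfRange(buf, off, off + blockLen)); // clone
        return out;                                                // intermediate list
    }

    // GC-friendly pattern: the caller's buffer is passed through by
    // (offset, length), so nothing is cloned and nothing is buffered.
    interface BlockSink { void block(byte[] buf, int off, int len); }

    static void streamBlocks(byte[] buf, int len, int blockLen, BlockSink sink) {
        for (int off = 0; off + blockLen <= len; off += blockLen)
            sink.block(buf, off, blockLen);
    }
}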