Program dump on large dataset
I'm attempting to train CRF++ on a large dataset (>10M words), and crf_learn dies every time its memory usage exceeds 4 GB. It is compiled and running on a Solaris 5.9 system with 24 GB of memory, but each crf_learn process is capped at 4 GB. Is this a Solaris limitation, or a pointer-width/integer-overflow issue in CRF++? As soon as it reaches 4 GB of memory, the program terminates with the single word "Abort."
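For what it's worth, a quick way to tell an address-space ceiling apart from an OS quota is to check the pointer width the binary was built with: a 32-bit process cannot address more than 4 GB no matter how much RAM is installed. A minimal standalone diagnostic sketch (not CRF++ code):

    #include <cstdio>
    #include <climits>

    int main() {
        // A 32-bit build caps the address space at 4 GB,
        // which matches the point where crf_learn aborts.
        std::printf("pointer width: %lu bits\n",
                    (unsigned long)(sizeof(void*) * CHAR_BIT));
        std::printf("size_t width:  %lu bits\n",
                    (unsigned long)(sizeof(size_t) * CHAR_BIT));
        return 0;
    }

If this prints 32 bits when compiled the same way as crf_learn, the abort at 4 GB is expected on any operating system.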
I recompiled the executables on a system running FC5 on x86_64, and it handled an in-memory size of 6.4 GB without a problem. I don't know whether the 32-bit limit is surmountable within CRF++ or whether it would require massive changes to the library, but the problem *is* confirmed to be an issue on 32-bit builds.
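For anyone stuck on Solaris: SPARC Solaris 5.9 can run 64-bit processes, so a 64-bit rebuild may avoid the ceiling without any source changes. Roughly like this, assuming gcc and CRF++'s stock configure script (Sun Studio uses a different flag, e.g. -xarch=v9 on older releases):

    # Hypothetical 64-bit rebuild; -m64 assumes gcc.
    ./configure CXXFLAGS="-m64" LDFLAGS="-m64"
    make clean
    make

This only sidesteps the address-space limit; it does not answer whether CRF++ itself assumes 32-bit types anywhere internally.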