#1 Program dump on large dataset

Status: open
Assigned to: nobody
Labels: None
Priority: 5
Updated: 2014-08-09
Created: 2008-02-21
Private: No

I'm attempting to train CRF++ on a large dataset (>10M words), and it dies every time its memory usage exceeds 4 GB. It is compiled and running on a Solaris 5.9 system with 24 GB of RAM, but each crf_learn process is limited to 4 GB. Is this a Solaris limitation, or a pointer/int overflow issue inside CRF++? As soon as the process reaches 4 GB of memory, it terminates with the single word "Abort."

Discussion

  • Logged In: YES
    user_id=776341
    Originator: YES

    I recompiled the executables on a system running FC5 x86_64, and they handled an in-memory size of 6.4 GB without a problem. I don't know whether the 32-bit limit is surmountable within CRF++, or whether removing it would require massive changes to the library. However, the problem *is* confirmed to be an issue on 32-bit systems.
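    Since rebuilding on a 64-bit host resolved it, a 64-bit rebuild on the original machine may work too, provided the toolchain supports it. A hypothetical sketch, assuming GCC (the `-m64` flag) and CRF++'s autoconf-based build; flag names may differ for other compilers:

    ```shell
    # Force a 64-bit (LP64) target so the process can address more than 4 GB.
    ./configure CXXFLAGS="-m64" LDFLAGS="-m64"
    make clean && make
    ```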