#20 "bad allocation" error in cfm-train

Milestone: 1.0
Status: open
Owner: nobody
Labels: None
Updated: 2019-02-28
Created: 2019-02-28
Private: No

Hi Felicity,

I'm getting a persistent failure when trying to train a new model with cfm-train.exe running under MPI parallelization. The stdout is pasted below. The process stops during the fragmentation graph calculation, and about two-thirds of the way through the set of input molecules I start seeing error messages like:

Exception occurred computing fragment graph for PHVNLLCAQHGNKU
CC1=C(C)S(=O)(=O)CCS1(=O)=O
bad allocation

These become more and more frequent until, just before the process terminates, essentially all of the input molecules give exceptions. The program then ends without much more information. A quick Google search suggests this is a memory allocation issue. The odd thing is that I have 64 GB of memory available on this workstation, and cfm-train never seems to use more than 30 GB, even just before it fails. I'm running 10 of the 12 available processors under MPI, and I'm training on ~5,800 molecules.
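
For context on the numbers above: an allocation failure is raised inside a single process, so the per-rank share may matter more than the machine-wide total. The sketch below is only arithmetic under two assumptions that the post does not confirm: that the 30 GB figure is the sum over all 10 ranks, and that cfm-train.exe might be a 32-bit build. A 32-bit Windows process has 2 GB of user address space by default, or 4 GB if it is large-address-aware on 64-bit Windows. The helper name `per_rank_gb` is hypothetical.

```python
def per_rank_gb(total_gb: float, ranks: int) -> float:
    """Average memory per MPI rank, assuming total_gb is summed over all ranks."""
    return total_gb / ranks

# Figures taken from the post: ~30 GB peak usage, 10 MPI ranks.
average = per_rank_gb(30, 10)

# Per-process address-space ceilings for a 32-bit Windows executable.
# Whether cfm-train.exe is actually a 32-bit build is an assumption.
limits_gb = {
    "32-bit, default": 2,
    "32-bit, large-address-aware on 64-bit Windows": 4,
}

print(f"average per rank: {average:.1f} GB")
for name, limit in limits_gb.items():
    print(f"{name}: {limit} GB limit, {average / limit:.0%} of limit on average")
```

Individual ranks can sit well above the average, so checking the per-process peak in Task Manager, and whether the executable is 32-bit or 64-bit, would show whether this hypothesis applies.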

Any idea what I might be doing wrong here?

Thanks,
Lee

C:\Users\pf31\Documents\cfmTrain>mpiexec -n 10 cfm-train.exe inputStructures_pos.txt feature_config.txt param_config.txt spectraPos 1 status.log
Initialising Feature Calculator..Done
Initialising Parameter Configuration..Using Combined Energy CFM
Positive ESI Ionization Mode
Not including fragment isotopes
Using Random Parameter Initialisation
Using EM Convergence Threshold 10
Using Lambda 1
Using LBFGS package for gradient ascent
Using GA max iterations 20
Using GA Convergence Threshold 1
Using GA mini batch taking 1 in 1 of processor data
Using Fragmentation Graph Depth 2
Allowing fragmentation detours
Maximum Ring Breaks 2
Using Model Depth 6
Using Spectrum Depths and Weights: (2,1) (4,1) (6,1)
Using Absolute mass tolerance 0.01
Using PPM mass tolerance 10
Using IPFP with Oscillatory Adjustment
Using IPFP Converge Thresh 0.005
Using IPFP Oscillatory Converge Thresh 0.999
Using linear function for theta
Using normally distributed observation function
Done
Parsing input file...Done

job aborted:
[ranks] message

[0-3] terminated

[4] process exited without calling finalize

[5-9] terminated

---- error analysis -----

[4] on FERGUSONMS1
cfm-train.exe ended prematurely and may have crashed. exit code -529697949

---- error analysis -----
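
The exit code in the output above can be decoded a little further. Read as an unsigned 32-bit value, -529697949 is 0xE06D7363, the exception code that the Microsoft C++ runtime uses for a C++ exception; its low three bytes spell "msc" in ASCII. That is consistent with an uncaught exception such as std::bad_alloc on rank 4, though it does not by itself explain why the allocation failed. A minimal sketch of the conversion, where the function name `decode_exit_code` is hypothetical:

```python
def decode_exit_code(code: int) -> tuple[str, str]:
    """Reinterpret a signed 32-bit exit code as unsigned hex and decode its low 3 bytes."""
    unsigned = code & 0xFFFFFFFF                      # two's-complement reinterpretation
    tag = (unsigned & 0x00FFFFFF).to_bytes(3, "big").decode("ascii")
    return f"0x{unsigned:08X}", tag

# Exit code reported by mpiexec for rank 4.
hex_code, tag = decode_exit_code(-529697949)
print(hex_code, tag)  # prints: 0xE06D7363 msc
```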
