I want to train lemmatization models for some highly inflected languages (Dutch, German, and French), so I end up with a large number of possible classes (1,000-2,000). Training crashes with a segmentation fault after about 7 iterations. I checked my training data (about 100,000 words per language) and there is nothing wrong with it, so it looks like a memory usage problem, even though I'm running on a 64 GB machine. I also tried both even and odd numbers of threads, but that didn't change anything.

Has anyone encountered this problem before, and what can I do about it? Training works when I use a smaller number of features, but that lowers the lemmatization accuracy.
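To give a sense of scale, here is a rough back-of-envelope sketch of why the label count alone could exhaust memory. The feature count below is a hypothetical placeholder (my post doesn't give the exact number), but the arithmetic shows how fast a dense weight matrix grows:

```python
# Rough estimate of model size for a linear classifier that keeps
# one weight per (feature, label) pair. n_features is an assumed
# value for illustration; substitute your real feature count.

n_labels = 2000          # upper end of the label set described above
n_features = 5_000_000   # hypothetical feature count after extraction
bytes_per_weight = 8     # one double-precision float per weight

weights_gib = n_labels * n_features * bytes_per_weight / 1024**3
print(f"weight matrix alone: {weights_gib:.1f} GiB")
# 2000 * 5e6 * 8 B ~= 74.5 GiB, already past 64 GB before gradients.
# Many trainers also keep extra copies (gradient vector, momentum or
# L-BFGS history), which multiplies this footprint several times over.
```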