From: Kirill K. <kir...@sm...> - 2015-03-11 20:58:49
I am running out of RAM in arpa-to-const-arpa in librispeech/s5 for the 4-gram model. The input to arpa-to-const-arpa is the massaged data from data/local/lm/lm_fglarge.arpa.gz (an additional 61M 4-grams). The 3-gram file went through with a peak commit size of ~16G. The 4-gram run crashed with OOM overnight, exhausting the 32G available to it. What memory usage should I expect?

On the Windows port side: pipes are fixed, the build system works, and I have gotten as far as decoding the unigram GMM model through the recipe. The trouble I am running into comes from Cygwin file handling. First, absolute paths do not work, as files in Cygwin are essentially chrooted to a virtual root path. Second, links do not work. So far I am tweaking the scripts to get past these problems, but there is a general solution that handles the paths in code. I want to get my experiments through first, however, as I have already spent a lot of time on the technical side.

Progress is here <https://github.com/kkm000/kaldi/compare/winbuild>, but the history is messy; I'll itemize it. I fixed a weird bug that would not be caught with gcc because of constructor elision, and also added support for WAVEFORMATEXTENSIBLE in wave files (my flac 1.3.1 sends this format to stdout).

How can I send patches for these changes? Let's start with the Windows-unrelated patches now.

-kkm
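
For readers unfamiliar with WAVEFORMATEXTENSIBLE, here is a minimal sketch of what supporting it in a wave reader involves. It is not the patch mentioned above and is not Kaldi code. The function name parse_fmt_chunk and the returned dictionary keys are hypothetical, chosen only for illustration. The layout follows the documented structure: when the format tag in the "fmt " chunk is 0xFFFE, the real format code is carried in the first two bytes of the 16-byte SubFormat GUID that follows the basic fields.

```python
import struct

WAVE_FORMAT_PCM = 0x0001
WAVE_FORMAT_EXTENSIBLE = 0xFFFE


def parse_fmt_chunk(data: bytes) -> dict:
    """Parse the body of a RIFF 'fmt ' chunk (without its 8-byte chunk header).

    Hypothetical helper for illustration; not taken from Kaldi.
    """
    if len(data) < 16:
        raise ValueError("fmt chunk too short")
    # Basic 16-byte header shared by all WAVE formats (little-endian).
    (format_tag, channels, sample_rate, byte_rate,
     block_align, bits_per_sample) = struct.unpack_from("<HHIIHH", data, 0)
    info = {
        "format_tag": format_tag,
        "channels": channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,
        "block_align": block_align,
        "bits_per_sample": bits_per_sample,
        "extensible": False,
    }
    if format_tag == WAVE_FORMAT_EXTENSIBLE:
        # Extension: cbSize (2), valid bits (2), channel mask (4), SubFormat GUID (16).
        if len(data) < 40:
            raise ValueError("WAVE_FORMAT_EXTENSIBLE fmt chunk too short")
        _cb_size, valid_bits, channel_mask = struct.unpack_from("<HHI", data, 16)
        # The first two bytes of the SubFormat GUID hold the actual format code.
        (sub_format,) = struct.unpack_from("<H", data, 24)
        info.update(
            format_tag=sub_format,
            valid_bits=valid_bits,
            channel_mask=channel_mask,
            extensible=True,
        )
    return info
```

With this approach a reader that only understands plain PCM can treat an extensible header whose sub-format is 1 the same way as an ordinary PCM header, which is the case that matters when a decoder writes this format to stdout.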