When you report problems, please state the version of the software you are
using.
It looks like your language model is too big to be supported by sphinxbase.
There are either too many unigrams or too many trigrams. You can use a smaller
vocabulary or prune trigrams with SRILM (ngram -prune). You can also try the
DMP32 format supported by sphinx3 and sphinx3_lm_convert.
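
To see which of the two limits is the more likely culprit before pruning, you can read the n-gram counts straight from the \data\ header of the ARPA file. The snippet below is only a rough sketch: read_arpa_counts and SIXTEEN_BIT_MAX are names made up for this example, and applying the 65535 figure from the error message to the unigram count is an assumption, not something taken from the sphinxbase sources. The sample header reuses the counts printed in the log in this thread.

```python
import re

# Assumption: the 65535 in the error message is a 16-bit limit, and the
# same limit is applied here to the number of unigrams (word ids).
SIXTEEN_BIT_MAX = 65535


def read_arpa_counts(lines):
    """Return {order: count} parsed from the \\data\\ header of an ARPA file.

    Hypothetical helper for illustration; accepts any iterable of text lines.
    """
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header:
            continue  # skip any free-form text before the header
        match = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line.startswith("\\"):
            break  # the first "\1-grams:" section ends the header
    return counts


# Sample header built from the counts reported in the log in this thread.
sample = """\\data\\
ngram 1=37333
ngram 2=6966334
ngram 3=25286752

\\1-grams:
"""

counts = read_arpa_counts(sample.splitlines())
for order in sorted(counts):
    print(f"{order}-grams: {counts[order]}")

if counts.get(1, 0) > SIXTEEN_BIT_MAX:
    print("Unigram count is above 65535; try a smaller vocabulary.")
else:
    print("Unigram count is below 65535; the trigrams are the more likely problem.")
```

For a real model, pass an open file instead of the sample text, for example read_arpa_counts(open("de_wiki.arpa", encoding="utf8")). Only the header is read, so this stays fast even for a very large model.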
I have tried to use the sphinx_lm_convert tool to convert an .arpa language
model to .dmp, but got the following error messages:
sphinx_lm_convert -case upper -i de_wiki.arpa -ienc utf8 -ifmt arpa -mmap yes
-o de_wiki.dmp -oenc utf8
INFO: cmd_ln.c(512): Parsing command line:
sphinx_lm_convert \
-case upper \
-i de_wiki.arpa \
-ienc utf8 \
-ifmt arpa \
-mmap yes \
-o de_wiki.dmp \
-oenc utf8
Current configuration:
-case upper
-debug 0
-help no no
-i de_wiki.arpa
-ienc utf8
-ifmt arpa
-logbase 1.0001 1.000100e+00
-mmap no yes
-o de_wiki.dmp
-oenc utf8 utf8
-ofmt
INFO: ngram_model_arpa.c(477): ngrams 1=37333, 2=6966334, 3=25286752
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(516): 37333 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
..............................................................................
............................INFO: ngram_model_arpa.c(533): 6966334 = #bigrams created
INFO: ngram_model_arpa.c(534): 33180 = #prob2 entries
INFO: ngram_model_arpa.c(542): 22420 = #bo_wt2 entries
INFO: ngram_model_arpa.c(292): Reading trigrams
.ERROR: "ngram_model_arpa.c", line 396: Offset from tseg_base > 65535
Segmentation fault
Can anybody help me fix it? Thanks in advance!
Ian