Dear Sir,
I am using the latest sphinxbase on 64-bit Fedora 16. When I tried to convert
an ARPA model to a DMP file, I got the following error. Can you please let me
know what may be causing it?
INFO: ngram_model_arpa.c(477): ngrams 1=100001, 2=7196255, 3=10921486
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(516): 100001 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
.............................................................................................................
INFO: ngram_model_arpa.c(533): 7196255 = #bigrams created
INFO: ngram_model_arpa.c(534): 54584 = #prob2 entries
INFO: ngram_model_arpa.c(542): 12336 = #bo_wt2 entries
INFO: ngram_model_arpa.c(292): Reading trigrams
.ERROR: "ngram_model_arpa.c", line 396: Size of trigram segment is bigger than 65535, such a big language models are not supported, use smaller vocabulary
ERROR: "ngram_model_dmp.c", line 121: Wrong magic header size number 5c646174: sorted is not a dump file
FATAL_ERROR: "sphinx_lm_convert.c", line 170: Failed to read the model from the file 'sorted'
Thank you, sir.
It tells you in plain English; please read it before posting:
Size of trigram segment is bigger than 65535, such a big language models are not supported, use smaller vocabulary
Dear Sir,
Until recently I was able to convert models bigger than the one above to
DMP. About a week ago I upgraded to Fedora 17 beta and installed a new
sphinxbase from a snapshot, and since then it has been giving this error.
That is why I am puzzled. Earlier I was able to convert models with n-gram
counts of roughly 1=200000, 2=8996255, 3=24379143.
I think this topic also reports a problem recently introduced in sphinxbase
DMP conversion:
https://sourceforge.net/projects/cmusphinx/forums/forum/5471/topic/5230022
Below is the output of converting one of my earlier DMP files back to ARPA.
It could read a model with 1=100001, 2=8434032, 3=20227218.
Sorry if I am wrong, but I suspect a bug was recently introduced in
sphinxbase. Could that be the case?
$ sphinx_lm_convert -i telwordmodel1l-without.DMP -o junk.arpa
INFO: cmd_ln.c(691): Parsing command line:
sphinx_lm_convert \
-i telwordmodel1l-without.DMP \
-o junk.arpa
Current configuration:
-case
-debug 0
-help no no
-i telwordmodel1l-without.DMP
-ienc
-ifmt
-logbase 1.0001 1.000100e+00
-mmap no no
-o junk.arpa
-oenc utf8 utf8
-ofmt
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(196): ngrams 1=100001, 2=8434032, 3=20227218
INFO: ngram_model_dmp.c(242): 100001 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(291): 8434032 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(317): 20227218 = LM.trigrams read
INFO: ngram_model_dmp.c(342): 51816 = LM.prob2 entries read
INFO: ngram_model_dmp.c(362): 8629 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(382): 53247 = LM.prob3 entries read
INFO: ngram_model_dmp.c(410): 16473 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(466): 100001 = ascii word strings read
Again, I am sorry if I am not making sense.