sphinx_lm_convert giving problem

Forum: Help

Creator: vijayabharadwaj gsr

Created: 2012-05-01

Updated: 2012-09-22

vijayabharadwaj gsr - 2012-05-01

Dear Sir,

I am using the latest sphinxbase on 64-bit Fedora 16. When I tried to convert an
ARPA file to DMP, I got the following error. Could you please tell me what
might be causing it?

INFO: ngram_model_arpa.c(477): ngrams 1=100001, 2=7196255, 3=10921486
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(516): 100001 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
..............................................................................
...............................INFO: ngram_model_arpa.c(533): 7196255 = #bigrams created

INFO: ngram_model_arpa.c(534): 54584 = #prob2 entries
INFO: ngram_model_arpa.c(542): 12336 = #bo_wt2 entries
INFO: ngram_model_arpa.c(292): Reading trigrams
.ERROR: "ngram_model_arpa.c", line 396: Size of trigram segment is bigger than
65535, such a big language models are not supported, use smaller vocabulary
ERROR: "ngram_model_dmp.c", line 121: Wrong magic header size number 5c646174:
sorted is not a dump file
FATAL_ERROR: "sphinx_lm_convert.c", line 170: Failed to read the model from
the file 'sorted'

Thank you, sir.
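
A note on the second error in the log above: the value 5c646174 looks like four ASCII bytes rather than a real header size. Decoding it is a quick way to check. The sketch below assumes the bytes appear in the order the message prints them; the variable names are only illustrative.

```python
# Hex value copied from the "Wrong magic header size number" message in the log.
magic_hex = "5c646174"

# Turn the hex digits into raw bytes, then read those bytes as ASCII text.
raw = bytes.fromhex(magic_hex)
text = raw.decode("ascii")

print(text)  # prints \dat
```

If this reading is right, the four bytes are the start of the \data\ marker of an ARPA file. The DMP error would then be a follow-on effect of the tool retrying the same text file as a DMP dump after the ARPA reader gave up, which leaves the trigram segment error as the one to investigate.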

Nickolay V. Shmyrev - 2012-05-01

It tells you in plain English; please read before posting.

Size of trigram segment is bigger than 65535, such a big language models are not supported, use smaller vocabulary
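
For readers wondering why total ngram counts do not predict this error: the limit seems to apply per segment of bigrams, not to the model as a whole. The sketch below is an illustration, not sphinxbase code. It assumes that bigrams are grouped into segments of 512 and that each bigram stores the position of its first trigram as a 16-bit offset from the start of its segment. The 65535 figure comes from the error message; the segment size of 512 and the function name are assumptions.

```python
SEGMENT_SIZE = 512   # assumed number of bigrams per segment
MAX_OFFSET = 65535   # largest value a 16-bit offset can hold (from the error message)


def first_overflowing_segment(trigram_counts):
    """Return the index of the first segment that overflows, or None.

    trigram_counts[i] is the number of trigrams that follow bigram i.
    A segment overflows when the first trigram of any of its bigrams
    lies more than MAX_OFFSET entries past the start of the segment.
    """
    for start in range(0, len(trigram_counts), SEGMENT_SIZE):
        offset = 0  # offset of the current bigram's first trigram
        for count in trigram_counts[start:start + SEGMENT_SIZE]:
            if offset > MAX_OFFSET:
                return start // SEGMENT_SIZE
            offset += count
    return None


# A sparse model fits; a dense run of 512 bigrams does not.
print(first_overflowing_segment([2] * 1024))
print(first_overflowing_segment([0] * 512 + [200] * 512))
```

Under these assumptions, the model in the first post averages about 1.5 trigrams per bigram (10921486 / 7196255), or roughly 780 per segment, far below the limit. The error would therefore point to one unusually dense run of adjacent bigrams, or to the reader miscounting, rather than to the overall model size.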

vijayabharadwaj gsr - 2012-05-01

Dear Sir,

Until recently, I was able to convert models bigger than the one above to
DMP. About a week ago I upgraded to Fedora 17 beta and installed a new
sphinxbase from a snapshot, and since then I have been getting this error.
That is why I am puzzled. Earlier I was able to convert models with ngram
counts of roughly 1=200000, 2=8996255, 3=24379143.

vijayabharadwaj gsr - 2012-05-01

I think this topic also reports a problem introduced in the sphinxbase DMP
conversion:

https://sourceforge.net/projects/cmusphinx/forums/forum/5471/topic/5230022

vijayabharadwaj gsr - 2012-05-01

This is the output from converting a previously built DMP file back to ARPA.
The tool could read a model with 1=100001, 2=8434032, 3=20227218.

Sorry if I am wrong, but I suspect a bug was recently introduced in
sphinxbase.

$ sphinx_lm_convert -i telwordmodel1l-without.DMP -o junk.arpa
INFO: cmd_ln.c(691): Parsing command line:
sphinx_lm_convert \
-i telwordmodel1l-without.DMP \
-o junk.arpa

Current configuration:

-case
-debug 0
-help no no
-i telwordmodel1l-without.DMP
-ienc
-ifmt
-logbase 1.0001 1.000100e+00
-mmap no no
-o junk.arpa
-oenc utf8 utf8
-ofmt

INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(196): ngrams 1=100001, 2=8434032, 3=20227218
INFO: ngram_model_dmp.c(242): 100001 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(291): 8434032 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(317): 20227218 = LM.trigrams read
INFO: ngram_model_dmp.c(342): 51816 = LM.prob2 entries read
INFO: ngram_model_dmp.c(362): 8629 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(382): 53247 = LM.prob3 entries read
INFO: ngram_model_dmp.c(410): 16473 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(466): 100001 = ascii word strings read

Again, I am sorry if I am not making sense.
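
One detail in the log above can be checked by hand: the number of LM.tseg_base entries appears to follow from the bigram count if bigrams are grouped into segments of 512. The segment size and the formula are assumptions chosen to illustrate the relationship, not something the log states.

```python
SEGMENT_SIZE = 512     # assumed number of bigrams per segment
n_bigrams = 8434032    # bigram count reported in the log

# One entry per segment of bigrams (counting the trailer entry), plus one.
tseg_entries = (n_bigrams + 1) // SEGMENT_SIZE + 1

print(tseg_entries)  # prints 16473, the tseg_base count reported in the log
```

If that assumption holds, each tseg_base entry covers 512 bigrams, and the trigram segment limit in the earlier error applies to each such group separately.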