
sphinx_lm_convert giving problem

Help
2012-05-01
2012-09-22
  • vijayabharadwaj gsr

    Dear Sir,

    I am using the latest sphinxbase on 64-bit Fedora 16. When I tried to convert
    an ARPA file to DMP, I got the following error. Can you please let me know
    what may be causing it?

    INFO: ngram_model_arpa.c(477): ngrams 1=100001, 2=7196255, 3=10921486
    INFO: ngram_model_arpa.c(135): Reading unigrams
    INFO: ngram_model_arpa.c(516): 100001 = #unigrams created
    INFO: ngram_model_arpa.c(195): Reading bigrams
    ..............................................................................
    ...............................INFO: ngram_model_arpa.c(533): 7196255 = #bigrams created

    INFO: ngram_model_arpa.c(534): 54584 = #prob2 entries
    INFO: ngram_model_arpa.c(542): 12336 = #bo_wt2 entries
    INFO: ngram_model_arpa.c(292): Reading trigrams
    .ERROR: "ngram_model_arpa.c", line 396: Size of trigram segment is bigger than
    65535, such a big language models are not supported, use smaller vocabulary
    ERROR: "ngram_model_dmp.c", line 121: Wrong magic header size number 5c646174:
    sorted is not a dump file
    FATAL_ERROR: "sphinx_lm_convert.c", line 170: Failed to read the model from
    the file 'sorted'
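The second ERROR line looks like a side effect rather than a separate problem. Read as four big-endian bytes, the reported magic number 5c646174 spells "\dat", the start of the ARPA "\data\" header. That suggests that after the ARPA reader gave up, the tool retried the same text file as a binary DMP and failed on its header. A quick check, in Python, of the byte decoding only:

```python
# Decode the "magic header" value reported by ngram_model_dmp.c as ASCII bytes.
magic = 0x5C646174
text = magic.to_bytes(4, byteorder="big").decode("ascii")
print(text)  # prints \dat, i.e. the first four bytes of an ARPA "\data\" line
```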

    Thank you, sir.

     
  • Nickolay V. Shmyrev

    It tells you in plain English; please read it before posting.

    Size of trigram segment is bigger than 65535, such a big language models are not supported, use smaller vocabulary
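A minimal sketch of what this limit means, assuming the sphinxbase lm3g/DMP layout: bigrams grouped into segments of 512, one base index into the trigram table per segment, and each bigram's first-trigram offset from that base stored in an unsigned 16-bit field. Under that assumption the limit applies to how densely trigrams pile up inside a single 512-bigram segment, not to the total trigram count. The function name and the input list of per-bigram trigram counts are hypothetical:

```python
from typing import List

SEG_SIZE = 512       # assumed: bigrams per trigram segment (2**9)
MAX_OFFSET = 65535   # largest value an unsigned 16-bit offset can hold


def oversized_segments(trigrams_per_bigram: List[int],
                       seg_size: int = SEG_SIZE,
                       max_offset: int = MAX_OFFSET) -> List[int]:
    """Return indices of segments where a bigram's first-trigram offset,
    measured from the start of its segment, would not fit in max_offset."""
    bad: List[int] = []
    first = 0      # global index of the current bigram's first trigram
    seg_base = 0   # global index of the first trigram in the current segment
    for i, count in enumerate(trigrams_per_bigram):
        seg = i // seg_size
        if i % seg_size == 0:
            seg_base = first          # a new segment starts here
        if first - seg_base > max_offset and (not bad or bad[-1] != seg):
            bad.append(seg)
        first += count
    return bad


# Hypothetical counts: segment 0 stays under the limit, segment 1 is too dense.
print(oversized_segments([100] * 512 + [200] * 512))  # prints [1]
```

If the layout is as assumed, total model size alone does not decide whether conversion succeeds; the distribution of trigrams across bigrams does.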

     
  • vijayabharadwaj gsr

    Dear Sir,

    Until recently, I was able to convert models bigger than the one above to
    DMP. About a week ago I upgraded to Fedora 17 beta and installed a new
    sphinxbase from a snapshot, and since then it has been giving this error.
    That is why I am puzzled. Earlier I was able to convert models with roughly
    ngrams 1=200000, 2=8996255, 3=24379143.

     
  • vijayabharadwaj gsr

    Below is the output of converting a DMP I built earlier back to ARPA. It
    could read a model with 1=100001, 2=8434032, 3=20227218.

    Sorry if I am wrong, but could a bug have been introduced recently in
    sphinxbase?

    $ sphinx_lm_convert -i telwordmodel1l-without.DMP -o junk.arpa
    INFO: cmd_ln.c(691): Parsing command line:
    sphinx_lm_convert \
    -i telwordmodel1l-without.DMP \
    -o junk.arpa

    Current configuration:

    -case
    -debug 0
    -help no no
    -i telwordmodel1l-without.DMP
    -ienc
    -ifmt
    -logbase 1.0001 1.000100e+00
    -mmap no no
    -o junk.arpa
    -oenc utf8 utf8
    -ofmt

    INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
    INFO: ngram_model_dmp.c(196): ngrams 1=100001, 2=8434032, 3=20227218
    INFO: ngram_model_dmp.c(242): 100001 = LM.unigrams(+trailer) read
    INFO: ngram_model_dmp.c(291): 8434032 = LM.bigrams(+trailer) read
    INFO: ngram_model_dmp.c(317): 20227218 = LM.trigrams read
    INFO: ngram_model_dmp.c(342): 51816 = LM.prob2 entries read
    INFO: ngram_model_dmp.c(362): 8629 = LM.bo_wt2 entries read
    INFO: ngram_model_dmp.c(382): 53247 = LM.prob3 entries read
    INFO: ngram_model_dmp.c(410): 16473 = LM.tseg_base entries read
    INFO: ngram_model_dmp.c(466): 100001 = ascii word strings read
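Assuming the DMP format groups bigrams into segments of 512 (2^9) and keeps one trigram base entry per segment, covering the bigrams plus one trailer entry, the "16473 = LM.tseg_base entries read" line can be reproduced from the bigram count, and the average number of trigrams per segment can be estimated. Function names here are hypothetical:

```python
import math

BG_SEG_SZ = 512  # assumed: bigrams per trigram segment (2**9)


def tseg_base_entries(n_bigrams: int, seg_size: int = BG_SEG_SZ) -> int:
    """Segments needed to cover n_bigrams plus one trailer bigram (assumed)."""
    return math.ceil((n_bigrams + 1) / seg_size)


def mean_trigrams_per_segment(n_bigrams: int, n_trigrams: int,
                              seg_size: int = BG_SEG_SZ) -> float:
    """Average trigrams per segment; a single dense segment can still exceed it."""
    return n_trigrams / tseg_base_entries(n_bigrams, seg_size)


print(tseg_base_entries(8434032))                           # prints 16473
print(round(mean_trigrams_per_segment(8434032, 20227218)))  # prints 1228
```

For the failing model in the first post (2=7196255, 3=10921486) the same average comes out around 777, also far below 65535. If the limit works as assumed, the error would point to a few unusually dense segments rather than to the overall model size.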

    Again, I am sorry if this does not make sense.

     
