
Problem using sphinx_lm_convert

2010-09-21
2012-09-22
  • haojin yang

    haojin yang - 2010-09-21

    I tried to use the sphinx_lm_convert tool to convert an .arpa language model
    to .dmp, but got the following error messages:

    sphinx_lm_convert -case upper -i de_wiki.arpa -ienc utf8 -ifmt arpa -mmap yes
    -o de_wiki.dmp -oenc utf8
    INFO: cmd_ln.c(512): Parsing command line:
    sphinx_lm_convert \
    -case upper \
    -i de_wiki.arpa \
    -ienc utf8 \
    -ifmt arpa \
    -mmap yes \
    -o de_wiki.dmp \
    -oenc utf8

    Current configuration:

    -case                upper
    -debug               0
    -help     no         no
    -i                   de_wiki.arpa
    -ienc                utf8
    -ifmt                arpa
    -logbase  1.0001     1.000100e+00
    -mmap     no         yes
    -o                   de_wiki.dmp
    -oenc     utf8       utf8
    -ofmt

    INFO: ngram_model_arpa.c(477): ngrams 1=37333, 2=6966334, 3=25286752
    INFO: ngram_model_arpa.c(135): Reading unigrams
    INFO: ngram_model_arpa.c(516): 37333 = #unigrams created
    INFO: ngram_model_arpa.c(195): Reading bigrams
    ..............................................................................
    ............................INFO: ngram_model_arpa.c(533): 6966334 = #bigrams created
    INFO: ngram_model_arpa.c(534): 33180 = #prob2 entries
    INFO: ngram_model_arpa.c(542): 22420 = #bo_wt2 entries
    INFO: ngram_model_arpa.c(292): Reading trigrams
    .ERROR: "ngram_model_arpa.c", line 396: Offset from tseg_base > 65535
    Segmentation fault

    Can anybody help me fix it? Thanks in advance!

    Ian

     
  • Nickolay V. Shmyrev

    When you report a problem, please state the version of the software you are
    using.

    It looks like your language model is too big to be supported by sphinxbase:
    there are either too many unigrams or too many trigrams. You can use a
    smaller vocabulary or prune trigrams with SRILM (ngram -prune). You can also
    try the DMP32 format, which is supported by sphinx3 and sphinx3_lm_convert.
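
    To illustrate why a large trigram count can trigger "Offset from tseg_base >
    65535", here is a rough Python sketch. It is not sphinxbase code. It assumes
    that each bigram stores the position of its first trigram as a 16-bit offset
    from a base shared by a block of consecutive bigrams. The block size of 512
    and the function name first_overflowing_bigram are assumptions made for this
    example; only the 65535 limit comes from the error message itself.

```python
# Illustrative sketch only; not sphinxbase code. The segment size of 512
# bigrams is an assumption; the 65535 limit is taken from the error message.
from typing import Iterable, Optional

UINT16_MAX = 65535  # largest value an unsigned 16-bit offset can hold


def first_overflowing_bigram(
    trigrams_per_bigram: Iterable[int],
    segment_size: int = 512,  # assumed number of bigrams sharing one base
) -> Optional[int]:
    """Return the index of the first bigram whose trigram offset from its
    segment base exceeds UINT16_MAX, or None if every offset fits."""
    total = 0  # trigrams seen so far
    base = 0   # trigram index at the start of the current segment
    for i, count in enumerate(trigrams_per_bigram):
        if i % segment_size == 0:
            base = total  # a new segment starts: reset the shared base
        if total - base > UINT16_MAX:
            return i  # this bigram's offset no longer fits in 16 bits
        total += count
    return None


if __name__ == "__main__":
    # A model with few trigrams per bigram fits.
    print(first_overflowing_bigram([4] * 1000))
    # A dense model overflows inside the first segment.
    print(first_overflowing_bigram([200] * 1000))
```

    Under these assumptions, pruning helps because it lowers the number of
    trigrams that follow each bigram, so the offsets within a block stay small.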

     
