
Creating language model.

Vel
2015-12-09
2016-05-03
  • Vel

    Vel - 2015-12-09

    Hello.
    I try to create a language model, but I have some problems. What I do step by step is decribed below.

    I downloaded cmuclmtk-0.7-win32.zip, extracted and created a .lm model using the following commands:

    text2wfreq < file_name.txt | wfreq2vocab > file_name.vocab
    
    text2idngram -vocab file_name.vocab -idngram file_name.idngram < file_name.txt
    idngram2lm -vocab_type 0 -idngram file_name.idngram -vocab file_name.vocab -arpa file_name.lm
    

    The .lm file is created without any problems.
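    For anyone who wants to script these steps, here is a minimal sketch in Python, assuming the cmuclmtk executables are on the PATH. The helper names `build_pipeline` and `run_pipeline` are hypothetical, and so is the intermediate `.wfreq` file, which stands in for the pipe between `text2wfreq` and `wfreq2vocab` in the commands above. The tool names and flags are exactly the ones from the post.

    ```python
    import subprocess


    def build_pipeline(text_file, stem):
        """Return (argv, stdin_path, stdout_path) steps mirroring the post's commands.

        The .wfreq intermediate file is an assumption of this sketch; the
        original commands pipe text2wfreq straight into wfreq2vocab.
        """
        wfreq = f"{stem}.wfreq"
        vocab = f"{stem}.vocab"
        idngram = f"{stem}.idngram"
        lm = f"{stem}.lm"
        return [
            (["text2wfreq"], text_file, wfreq),
            (["wfreq2vocab"], wfreq, vocab),
            (["text2idngram", "-vocab", vocab, "-idngram", idngram], text_file, None),
            (["idngram2lm", "-vocab_type", "0", "-idngram", idngram,
              "-vocab", vocab, "-arpa", lm], None, None),
        ]


    def run_pipeline(text_file, stem):
        """Run each step in order and stop at the first non-zero exit code."""
        for argv, stdin_path, stdout_path in build_pipeline(text_file, stem):
            stdin = open(stdin_path, "rb") if stdin_path else None
            stdout = open(stdout_path, "wb") if stdout_path else None
            try:
                # check=True raises CalledProcessError if the tool fails
                subprocess.run(argv, stdin=stdin, stdout=stdout, check=True)
            finally:
                for handle in (stdin, stdout):
                    if handle:
                        handle.close()
    ```

    Calling `run_pipeline("file_name.txt", "file_name")` would then be expected to leave `file_name.lm` next to the input text, as the three commands above do.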

    After that, I downloaded sphinxbase-5prealpha-win32.zip, extracted it, and went to the sphinxbase-5prealpha-win32\bin\Release folder. I copied my .lm model into that folder and tried to convert it to binary format with this command:

    sphinx_lm_convert -i file_name.lm -o file_name.lm.bin
    

    But I got an error:

    C:\...\sphinxbase-5prealpha-win32\bin\Release>sphinx_lm_convert -i file_name.lm -o file_name.lm.bin
    INFO: cmd_ln.c(697): Parsing command line:
    sphinx_lm_convert \
            -i file_name.lm \
            -o file_name.lm.bin
    
    Current configuration:
    [NAME]          [DEFLT] [VALUE]
    -case
    -debug                  0
    -help           no      no
    -i                      file_name.lm
    -ifmt
    -logbase        1.0001  1.000100e+000
    -mmap           no      no
    -o                      file_name.lm.bin
    -ofmt
    
    INFO: ngram_model_arpa.c(477): ngrams 1=4354, 2=8703, 3=13053
    INFO: ngram_model_arpa.c(135): Reading unigrams
    INFO: ngram_model_arpa.c(516):     4354 = #unigrams created
    INFO: ngram_model_arpa.c(195): Reading bigrams
    INFO: ngram_model_arpa.c(534):     8703 = #bigrams created
    INFO: ngram_model_arpa.c(535):        4 = #prob2 entries
    INFO: ngram_model_arpa.c(543):        4 = #bo_wt2 entries
    INFO: ngram_model_arpa.c(292): Reading trigrams
    INFO: ngram_model_arpa.c(556):    13053 = #trigrams created
    INFO: ngram_model_arpa.c(557):        3 = #prob3 entries
    ERROR: "ngram_model.c", line 181: language model file type not supported
    ERROR: "sphinx_lm_convert.c", line 192: Failed to write language model in format (null) to file_name.lm.bin
    

    Where is my mistake?
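    The log shows that the unigram, bigram, and trigram sections were all read before the failure, and both errors concern writing the output. To confirm independently that the ARPA file itself is intact, the counts in its `\data\` header can be compared with the `ngrams 1=4354, 2=8703, 3=13053` line in the log. The following is a minimal sketch in Python; `read_arpa_counts` is a hypothetical helper, and it assumes the standard ARPA layout in which `ngram N=COUNT` lines follow the `\data\` marker.

    ```python
    import re


    def read_arpa_counts(lines):
        """Return {order: count} parsed from the \\data\\ header of an ARPA file."""
        counts = {}
        in_header = False
        for raw in lines:
            line = raw.strip()
            if line == "\\data\\":
                in_header = True
                continue
            if not in_header:
                continue  # skip any preamble before the header
            match = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line:
                # First non-empty line that is not a count (e.g. "\1-grams:")
                # marks the end of the header.
                break
        return counts
    ```

    Passing an open handle for `file_name.lm` to `read_arpa_counts` should return one entry per n-gram order. If those numbers match the log, the input model is fine and the problem is on the output side of `sphinx_lm_convert`.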

     

    Last edit: Vel 2015-12-10
  • Vel

    Vel - 2015-12-10

    Please don't ignore this post. Is this a bug, or have I made a mistake?

     
    • Nickolay V. Shmyrev

      The precompiled binaries are too old and do not support the bin trie format. You can compile the latest sources yourself or wait until I upload updated binaries.

       
  • Nickolay V. Shmyrev

    I have updated the precompiled version; it should work fine now.

     
    • Snehal Patel

      Snehal Patel - 2016-05-03

      Where can I get that precompiled version?

       
