Add to the en-us.lm.dmp language model?

Help
2015-04-13
2015-11-03
  • john absalon

    john absalon - 2015-04-13

    Is there a tool I can use to manually edit the .dmp language model, or to add to it? More specifically, I would like to combine my custom language model with the general en-us.lm.dmp language model.

     

    Last edit: john absalon 2015-04-14
    • bic-user

      bic-user - 2015-04-14

      .dmp is a binary language model format. To convert it to the editable ARPA format, use

      sphinx_lm_convert -i model.dmp -o model.lm

      from sphinxbase.

       
  • Daniel Wolf

    Daniel Wolf - 2015-10-08

    I just downloaded the latest US English generic language model from the SourceForge site. I tried converting the .dmp file using the sphinx_lm_convert.exe tool from the latest pocketsphinx release (5prealpha from 2015-08-05). However, the tool didn't accept the binary format:

    sphinx_lm_convert.exe -i cmusphinx-5.0-en-us.lm.dmp -o cmusphinx-5.0-en-us.lm
    
    Current configuration:
    [NAME]          [DEFLT] [VALUE]
    -case
    -debug                  0
    -help           no      no
    -i                      cmusphinx-5.0-en-us.lm.dmp
    -ifmt
    -lm_trie        no      no
    -logbase        1.0001  1.000100e+00
    -mmap           no      no
    -o                      cmusphinx-5.0-en-us.lm
    -ofmt
    
    INFO: ngram_model_trie.c(456): Trying to read LM in trie binary format
    INFO: ngram_model_trie.c(467): Header doesn't match
    INFO: ngram_model_trie.c(189): Trying to read LM in arpa format
    INFO: ngram_model_trie.c(70): No \data\ mark in LM file
    INFO: ngram_model_trie.c(548): Trying to read LM in DMP format
    INFO: ngram_model_trie.c(630): ngrams 1=19794, 2=1377200, 3=3178194
    INFO: lm_trie.c(399): Training quantizer
    INFO: lm_trie.c(407): Building LM trie
    ERROR: "ngram_model.c", line 194: language model file type not supported
    ERROR: "sphinx_lm_convert.c", line 197: Failed to write language model in format (null) to cmusphinx-5.0-en-us.lm
    

    Does pocketsphinx use different file formats?

     
  • Pedropablo

    Pedropablo - 2015-11-03

    Hi! I just ran into the same problem and managed to solve it. Try this:

    sphinx_lm_convert -i yourfile.lm.dmp -o yourOutputFile.lm -ofmt arpa

    You can check the available parameters and arguments by typing:

    sphinx_lm_convert -help
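
The fix can be wrapped in a small shell function, shown here as a minimal sketch rather than an official sphinxbase script. In the failing run above, the `-ofmt` value is empty and the error reports format `(null)`, which suggests the tool could not work out the output format on its own; the sketch therefore always passes `-ofmt arpa`. The function name `dmp_to_arpa` and the `LM_CONVERT` override variable are hypothetical names introduced for illustration. Only flags that appear in this thread are used.

```shell
#!/bin/sh
# Sketch: convert a binary DMP language model to editable ARPA text.
# dmp_to_arpa and LM_CONVERT are illustrative names, not part of sphinxbase.

dmp_to_arpa() {
    in_file="$1"
    out_file="$2"
    # Allow the converter binary to be overridden; defaults to the sphinxbase tool.
    converter="${LM_CONVERT:-sphinx_lm_convert}"

    # State the output format explicitly (-ofmt arpa), as in the working
    # command from this thread; the failing run above had -ofmt empty.
    "$converter" -i "$in_file" -o "$out_file" -ofmt arpa || return 1

    # An ARPA file contains a \data\ marker; treat its absence as a failed conversion.
    grep -q '^\\data\\' "$out_file"
}
```

With the function loaded, `dmp_to_arpa cmusphinx-5.0-en-us.lm.dmp cmusphinx-5.0-en-us.lm` should return success only if the converter exits cleanly and the output looks like ARPA text.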

     
