sphinx_lm_convert fails to convert 1-gram LMs?

Horia Cucu
2011-03-24
2012-09-22
  • Horia Cucu

    Horia Cucu - 2011-03-24

    Hi,

    I'm trying to create a very simple uni-gram LM with SRILM and then convert it
    to .dmp, but the sphinx_lm_convert tool fails...
    Here's a simple test case:

    1. the ARPA unigram LM created with SRILM (ana1Gram.lm):
      cucu@argos:~/projectHoriaRoot/languageModelling/anaLM$ cat ana1Gram.lm

    \data\
    ngram 1=5

    \1-grams:
    -0.552842 </s>
    -99 <s>
    -0.6197888 ana
    -0.552842 are
    -0.69897 mere

    \end\

    2. the execution of sphinx_lm_sort to sort the language model:
      cucu@argos:~/projectHoriaRoot/project/spk01-20_rec00-04$ ./bin/sphinx_lm_sort
      ../../languageModelling/anaLM/ana1Gram.lm >
      ../../languageModelling/anaLM/ana1Gram.lm.sorted

    3. the sorted LM (ana1Gram.lm.sorted):
      cucu@argos:~/projectHoriaRoot/languageModelling/anaLM$ cat ana1Gram.lm.sorted

    \data\
    ngram 1=5

    \1-grams:
    -0.5528 </s>
    -99.0000 <s>
    -0.6198 ana
    -0.5528 are
    -0.6990 mere

    \end\

    4. the conversion to .dmp fails:
      cucu@argos:~/projectHoriaRoot/project/spk01-20_rec00-04$
      ./bin/sphinx3_lm_convert -i ../../languageModelling/anaLM/ana1Gram.lm.sorted
      -o ../../languageModelling/anaLM/ana1Gram.lm.dmp
      INFO: info.c(65): Host: 'argos'
      INFO: info.c(66): Directory:
      '/home/cucu/projectHoriaRoot/project/spk01-20_rec00-04'
      INFO: info.c(70):
      /home/cucu/projectHoriaRoot/project/spk01-20_rec00-04/bin/.libs/lt-sphinx3_lm_convert
      Compiled on: Mar 18 2011, AT: 10:24:16

    INFO: cmd_ln.c(510): Parsing command line:
    /home/cucu/projectHoriaRoot/project/spk01-20_rec00-04/bin/.libs/lt-sphinx3_lm_convert \
    -i ../../languageModelling/anaLM/ana1Gram.lm.sorted \
    -o ../../languageModelling/anaLM/ana1Gram.lm.dmp

    Current configuration:

    -debug 0
    -i ../../languageModelling/anaLM/ana1Gram.lm.sorted
    -ienc iso8859-1 iso8859-1
    -ifmt TXT TXT
    -logfn
    -o ../../languageModelling/anaLM/ana1Gram.lm.dmp
    -odir . .
    -oenc iso8859-1 iso8859-1
    -ofmt DMP DMP

    INFO: lm.c(606): LM read('../../languageModelling/anaLM/ana1Gram.lm.sorted',
    lw= 1.00, wip= 0.10, uw= 1.00)
    INFO: lm.c(608): Reading LM file
    ../../languageModelling/anaLM/ana1Gram.lm.sorted (LM name "default")
    INFO: lm_3g.c(831): Reading LM file
    ../../languageModelling/anaLM/ana1Gram.lm.sorted
    WARNING: "lm_3g.c", line 261: Bad or missing ngram count
    WARNING: "lm_3g.c", line 842: Couldnt' read the ngram count
    INFO: lm.c(658): LM is not in TXT format
    FATAL_ERROR: "main_lm_convert.c", line 183: Fail to read inputfn
    ../../languageModelling/anaLM/ana1Gram.lm.sorted in inputfmt TXT

    Is this a bug in the sphinx_lm_convert tool, or am I doing something wrong?
    Note that for a trigram LM built with SRILM the above steps work
    perfectly.

    Horia
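
Before suspecting the converter, it can help to rule out a malformed input file. The following is a minimal sketch, not part of any Sphinx or SRILM tool: a hypothetical helper `check_arpa` that compares the counts declared in the `\data\` header of an ARPA file with the number of entries actually listed in each `\N-grams:` section. The sample text mirrors the unigram LM from the post and assumes the two sentence-marker entries are `</s>` and `<s>`, as SRILM conventionally writes them.

```python
import re
from collections import Counter


def check_arpa(text):
    """Return (declared, found) n-gram counts per order for ARPA-format text.

    Hypothetical helper for illustration only; it checks header/body
    consistency and does not validate probabilities or back-off weights.
    """
    declared = {}
    found = Counter()
    section = None  # None, "data", or an integer n-gram order
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "\\data\\":
            section = "data"
            continue
        if line == "\\end\\":
            break
        m = re.fullmatch(r"\\(\d+)-grams:", line)
        if m:
            section = int(m.group(1))
            continue
        if section == "data":
            m = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
            if m:
                declared[int(m.group(1))] = int(m.group(2))
        elif isinstance(section, int):
            # Every non-empty line inside an N-grams section is one entry.
            found[section] += 1
    return declared, dict(found)


# Sample mirroring the LM in the post (sentence markers are an assumption).
sample = """\\data\\
ngram 1=5

\\1-grams:
-0.552842 </s>
-99 <s>
-0.6197888 ana
-0.552842 are
-0.69897 mere

\\end\\
"""

declared, found = check_arpa(sample)
print("declared:", declared)
print("found:   ", found)
print("consistent:", declared == found)
```

If the declared and found counts agree, as they do for this sample, the file itself is well-formed in this respect, which points toward the reader in the tool rather than the LM.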

     
  • Nickolay V. Shmyrev

    This bug was fixed in trunk; you need to download and compile a snapshot.

     
  • Horia Cucu

    Horia Cucu - 2011-03-24

    Thank you!

    I downloaded sphinxbase-snapshot, compiled and installed it, and the
    conversion completed successfully.

    Horia

     
