Hi,
I'm trying to create a very simple uni-gram LM with SRILM and then convert it
to .dmp, but the sphinx_lm_convert tool fails...
Here's a simple test case.
The ARPA unigram LM created with SRILM (ana1Gram.lm):
cucu@argos:~/projectHoriaRoot/languageModelling/anaLM$ cat ana1Gram.lm
\data\
ngram 1=5
\1-grams:
-0.552842 </s>
-99 <s>
-0.6197888 ana
-0.552842 are
-0.69897 mere
\end\
Running sphinx_lm_sort to sort the language model:
cucu@argos:~/projectHoriaRoot/project/spk01-20_rec00-04$ ./bin/sphinx_lm_sort ../../languageModelling/anaLM/ana1Gram.lm > ../../languageModelling/anaLM/ana1Gram.lm.sorted
The sorted LM (ana1Gram.lm.sorted):
cucu@argos:~/projectHoriaRoot/languageModelling/anaLM$ cat ana1Gram.lm.sorted
\data\
ngram 1=5
\1-grams:
-0.5528 </s>
-99.0000 <s>
-0.6198 ana
-0.5528 are
-0.6990 mere
\end\
The conversion to .dmp fails:
cucu@argos:~/projectHoriaRoot/project/spk01-20_rec00-04$ ./bin/sphinx3_lm_convert -i ../../languageModelling/anaLM/ana1Gram.lm.sorted -o ../../languageModelling/anaLM/ana1Gram.lm.dmp
INFO: info.c(65): Host: 'argos'
INFO: info.c(66): Directory: '/home/cucu/projectHoriaRoot/project/spk01-20_rec00-04'
INFO: info.c(70): /home/cucu/projectHoriaRoot/project/spk01-20_rec00-04/bin/.libs/lt-sphinx3_lm_convert Compiled on: Mar 18 2011, AT: 10:24:16
INFO: cmd_ln.c(510): Parsing command line:
/home/cucu/projectHoriaRoot/project/spk01-20_rec00-04/bin/.libs/lt-sphinx3_lm_convert \
-i ../../languageModelling/anaLM/ana1Gram.lm.sorted \
-o ../../languageModelling/anaLM/ana1Gram.lm.dmp
Current configuration:
-debug 0
-i ../../languageModelling/anaLM/ana1Gram.lm.sorted
-ienc iso8859-1 iso8859-1
-ifmt TXT TXT
-logfn
-o ../../languageModelling/anaLM/ana1Gram.lm.dmp
-odir . .
-oenc iso8859-1 iso8859-1
-ofmt DMP DMP
INFO: lm.c(606): LM read('../../languageModelling/anaLM/ana1Gram.lm.sorted', lw= 1.00, wip= 0.10, uw= 1.00)
INFO: lm.c(608): Reading LM file ../../languageModelling/anaLM/ana1Gram.lm.sorted (LM name "default")
INFO: lm_3g.c(831): Reading LM file ../../languageModelling/anaLM/ana1Gram.lm.sorted
WARNING: "lm_3g.c", line 261: Bad or missing ngram count
WARNING: "lm_3g.c", line 842: Couldnt' read the ngram count
INFO: lm.c(658): LM is not in TXT format
FATAL_ERROR: "main_lm_convert.c", line 183: Fail to read inputfn ../../languageModelling/anaLM/ana1Gram.lm.sorted in inputfmt TXT
Is this a bug in the sphinx_lm_convert tool, or am I doing something wrong?
Note that for a tri-gram LM built with SRILM, the above steps work perfectly.
Horia
This bug was fixed in trunk; you need to download and compile a snapshot.
Thank you!
I downloaded sphinxbase-snapshot, compiled and installed it, and the
conversion completed successfully.
Horia
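For anyone who hits the same "Bad or missing ngram count" warning: before blaming the tool, it can help to rule out a malformed header. Below is a minimal shell sketch, assuming a plain-text ARPA file whose unigram section runs from the \1-grams: marker to the next line that starts with a backslash. check_arpa_unigrams is a hypothetical helper written for illustration; it is not part of SRILM or Sphinx.

```shell
# check_arpa_unigrams FILE   (hypothetical helper, for illustration only)
# Compares the unigram count declared in the header ("ngram 1=N") with the
# number of non-empty lines found in the \1-grams: section.
# Prints "declared=N found=M" and returns 0 only when the two agree.
check_arpa_unigrams() {
    awk '
        /^ngram 1=/      { split($0, kv, "="); declared = kv[2] + 0; next }
        /^\\1-grams:/    { in_sec = 1; next }   # start of the unigram section
        /^\\/            { in_sec = 0; next }   # any other backslash marker ends it
        in_sec && NF > 0 { found++ }            # count non-empty entry lines
        END {
            printf "declared=%d found=%d\n", declared, found
            exit (declared == found ? 0 : 1)
        }
    ' "$1"
}

# Example call:
# check_arpa_unigrams ana1Gram.lm && echo "header count matches"
```

The ana1Gram.lm listing above declares 5 unigrams and lists 5 entries, so a check like this would pass. That points at the converter rather than the LM file, which is consistent with the fix in trunk.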