Dear all,
While decoding with sphinx3, I always get an error when using an ARPA file with
n-grams of order higher than 3.
I tried "sphinx3_lm_convert", but it also fails for orders higher than 3.
Everything works fine with bigrams and trigrams, but anything beyond that does
not work.
The error I get in the log file while decoding says:
INFO: lm_3g_dmp.c(471): Bad magic number: 589505315(23232323), not an LM dumpfile??
INFO: lm.c(617): In lm_read, LM is not a DMP file. Trying to read it as a txt file
WARNING: "lm_3g.c", line 249: Unknown ngram (4)
WARNING: "lm_3g.c", line 842: Couldnt' read the ngram count
INFO: lm.c(636): Lm is both not DMP and TXT format
FATAL_ERROR: "lmset.c", line 295: lm_read_advance(C:/Users/m/_ph/my_w/am_trainer/Espk/etc/eca.lm.p.arpa, 1.000000e-009, 2.000000e-001, 7.000000e-001 44 , Weighted Apply) failed
Tue Apr 13 18:46:16 2010
Any suggestions?
Regards,
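For what it's worth, the "Unknown ngram (4)" warning suggests sphinx3's legacy LM reader stops at trigrams, so it chokes as soon as the ARPA header declares a 4-gram section. A minimal sketch (the function name and the sample data are my own, not from sphinx3) for checking what maximum order an ARPA file actually declares before handing it to the decoder:

```python
# Sketch: parse the \data\ header of an ARPA-format LM and report the highest
# n-gram order it declares. If this prints more than 3, sphinx3's lm_3g/DMP
# code path will reject the file, matching the log above.
import re

def arpa_max_order(lines):
    """Return the highest n-gram order declared in an ARPA \\data\\ section."""
    orders = []
    in_data = False
    for line in lines:
        line = line.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if in_data:
            m = re.match(r"ngram (\d+)=(\d+)", line)
            if m:
                orders.append(int(m.group(1)))
            elif line:  # first non-empty, non-"ngram N=M" line ends the header
                break
    return max(orders) if orders else 0

# Hypothetical header resembling a 4-gram model like eca.lm.p.arpa:
example = """\\data\\
ngram 1=10
ngram 2=40
ngram 3=30
ngram 4=5

\\1-grams:
""".splitlines()

print(arpa_max_order(example))  # prints 4 -> too high for sphinx3
```

If the order comes out above 3, one workaround is to rebuild or prune the model down to a trigram with your LM toolkit before conversion.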
Thank you nshmyrev for your reply.
Why is sphinx3 not encouraged? Is it only for speed and memory, or for accuracy
and capabilities as well?
Actually, I am using sphinx3 for research purposes. I mainly use continuous HMMs
and adaptation techniques like MAP and MLLR. Are these techniques supported in
pocketsphinx as well?
Mainly user support. It's also about higher accuracy and capabilities.
Those adaptation techniques have the same level of support as in sphinx3.
I tested pocketsphinx (nightly build, 15.4.2010) against sphinx3 with the same
decoder settings (language weight = 10, beam width = 1e-080, word beam = 1e-048)
and the same AN4 acoustic model (continuous, 1000 tied states, 8 Gaussians). I
used the decode\slave.pl script to decode 130 utterances of the AN4 corpus (as
in the tutorial).
The accuracy is much better with the sphinx3 decoder, but decoding is slower:
using sphinx3: WER = 17.5%
using pocketsphinx: WER = 29.1%
Could you please comment on this?
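For readers comparing these numbers: WER here is the standard word error rate. A rough sketch of how it is typically computed, as a word-level Levenshtein distance over the reference (purely illustrative; the actual scoring scripts also report alignment details):

```python
# Sketch: word error rate = (substitutions + insertions + deletions) / reference
# length, computed via dynamic-programming edit distance over words.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution in four reference words -> 0.25 (i.e. 25% WER):
print(wer("go forward ten meters", "go forward one meters"))  # prints 0.25
```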
1000 senones is overkill for AN4. Try 100; you'll get a WER around 16.5%.