Dear all,
While decoding with sphinx3, I always get an error when using an ARPA file with
n-grams of order higher than 3.
I tried "sphinx3_lm_convert", but it also fails for orders higher than 3.
Everything works fine with bigrams and trigrams, but anything beyond that does
not work.
The error I get in the log file while decoding says:
INFO: lm_3g_dmp.c(471): Bad magic number: 589505315(23232323), not an LM dumpfile??
INFO: lm.c(617): In lm_read, LM is not a DMP file. Trying to read it as a txt file
WARNING: "lm_3g.c", line 249: Unknown ngram (4)
WARNING: "lm_3g.c", line 842: Couldnt' read the ngram count
INFO: lm.c(636): Lm is both not DMP and TXT format
FATAL_ERROR: "lmset.c", line 295: lm_read_advance(C:/Users/m/_ph/my_w/am_trainer/Espk/etc/eca.lm.p.arpa, 1.000000e-009, 2.000000e-001, 7.000000e-001 44 , Weighted Apply) failed
Tue Apr 13 18:46:16 2010
Any suggestions?
Regards,
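For what it's worth, the "Unknown ngram (4)" warning suggests sphinx3's legacy LM reader stops at trigrams, so it chokes as soon as the ARPA header declares a 4-gram section. A minimal sketch (the function name and the sample data are my own, not from sphinx3) for checking what maximum order an ARPA file actually declares before handing it to the decoder:

```python
# Sketch: parse the \data\ header of an ARPA-format LM and report the highest
# n-gram order it declares. If this prints more than 3, sphinx3's lm_3g/DMP
# code path will reject the file, matching the log above.
import re

def arpa_max_order(lines):
    """Return the highest n-gram order declared in an ARPA \\data\\ section."""
    orders = []
    in_data = False
    for line in lines:
        line = line.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if in_data:
            m = re.match(r"ngram (\d+)=(\d+)", line)
            if m:
                orders.append(int(m.group(1)))
            elif line:  # first non-empty, non-"ngram N=M" line ends the header
                break
    return max(orders) if orders else 0

# Hypothetical header resembling a 4-gram model like eca.lm.p.arpa:
example = """\\data\\
ngram 1=10
ngram 2=40
ngram 3=30
ngram 4=5

\\1-grams:
""".splitlines()

print(arpa_max_order(example))  # prints 4 -> too high for sphinx3
```

If the order comes out above 3, one workaround is to rebuild or prune the model down to a trigram with your LM toolkit before conversion.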
Thank you nshmyrev for your reply.
Why is sphinx3 not encouraged? Is it only for speed and memory, or for accuracy
and capabilities as well?
Actually, I am using sphinx3 for research purposes. I mainly use continuous HMMs
and adaptation techniques like MAP and MLLR. Are these techniques supported in
pocketsphinx as well?
Mainly user support. It's also about higher accuracy and capabilities.
Those adaptation techniques have the same level of support as in sphinx3.
I tested pocketsphinx (nightly build, 15.4.2010) against sphinx3 with the same
decoder settings (language weight = 10, beam width = 1e-080, word beam = 1e-048)
and the same AN4 acoustic model (continuous, 1000 tied states, 8 Gaussians). I
used the decode\slave.pl script to decode 130 utterances of the AN4 corpus (as
in the tutorial).
The accuracy is much better with the sphinx3 decoder, but decoding is slower:
using sphinx3: WER = 17.5%
using pocketsphinx: WER = 29.1%
Could you please comment on this?
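For readers comparing these numbers: WER here is the standard word error rate. A rough sketch of how it is typically computed, as a word-level Levenshtein distance over the reference (purely illustrative; the actual scoring scripts also report alignment details):

```python
# Sketch: word error rate = (substitutions + insertions + deletions) / reference
# length, computed via dynamic-programming edit distance over words.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution in four reference words -> 0.25 (i.e. 25% WER):
print(wer("go forward ten meters", "go forward one meters"))  # prints 0.25
```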
1000 senones is overkill for AN4. Try 100; you'll get a WER around 16.5%.