n-gram larger than 3 in sphinx3

Help
asr2010
2010-04-13
2012-09-22
  • asr2010

    asr2010 - 2010-04-13

    Dear all,

    When decoding with sphinx3, I always get an error if I use an ARPA file with
    an n-gram order higher than 3.

    I also tried "sphinx3_lm_convert", but it does not work with n-gram orders
    higher than 3 either.

    Everything works fine with bigrams and trigrams, but anything of higher order
    fails.

    The decoding log shows the following error:
    INFO: lm_3g_dmp.c(471): Bad magic number: 589505315(23232323), not an LM dumpfile??
    INFO: lm.c(617): In lm_read, LM is not a DMP file. Trying to read it as a txt file
    WARNING: "lm_3g.c", line 249: Unknown ngram (4)
    WARNING: "lm_3g.c", line 842: Couldnt' read the ngram count
    INFO: lm.c(636): Lm is both not DMP and TXT format
    FATAL_ERROR: "lmset.c", line 295: lm_read_advance(C:/Users/m/_ph/my_w/am_trainer/Espk/etc/eca.lm.p.arpa, 1.000000e-009, 2.000000e-001, 7.000000e-001 44 , Weighted Apply) failed
    Tue Apr 13 18:46:16 2010

    Any suggestions?

    Regards,
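
Editorial sketch, not part of the original post: the magic number in the log above, 589505315 (hex 23232323), is the four ASCII bytes "####". That suggests the decoder read the first bytes of a text file where it expected a binary DMP header. The hypothetical helpers below (`decode_magic` and `arpa_orders` are illustration names, not sphinx3 functions) decode that value and list the n-gram orders declared in an ARPA header. They assume the usual `\data\` section with `ngram N=count` lines. The sample header and its counts are made up.

```python
import re
import struct


def decode_magic(value: int) -> bytes:
    """Return the four bytes that a 32-bit magic number was read from."""
    return struct.pack(">I", value)


def arpa_orders(lines):
    """Map n-gram order to declared count, read from an ARPA header."""
    orders = {}
    in_data = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if not in_data:
            continue  # skip comments or copyright text before the header
        match = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
        if match:
            orders[int(match.group(1))] = int(match.group(2))
        elif line.startswith("\\"):
            break  # the first n-gram section marks the end of the header
    return orders


# Made-up header for illustration only; the counts are not from a real model.
SAMPLE = """\
#### hypothetical comment line
\\data\\
ngram 1=3
ngram 2=4
ngram 3=2
ngram 4=1

\\1-grams:
"""

if __name__ == "__main__":
    print(decode_magic(589505315))
    orders = arpa_orders(SAMPLE.splitlines())
    print("highest declared order:", max(orders))
```

Running a check like this on the file before decoding shows whether it declares an order above 3, which is the case the "Unknown ngram (4)" warning complains about.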

     
  • Nickolay V. Shmyrev

    1. Using sphinx3 for application development is not recommended. Use pocketsphinx instead.
    2. Search with 4-grams is not supported. Dump lattices and rescore them with higher-order n-grams.
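
Editorial sketch of the idea behind point 2, under loud assumptions: the lattice layout, the function names and the toy scores below are all hypothetical and do not reflect the sphinx3 lattice file format or any Sphinx API. The decoder is assumed to have produced a small word lattice with acoustic log-scores. Every path is then rescored with a 4-gram language model and the best total is kept. Exhaustive path enumeration is only workable for tiny lattices. A real rescorer would use dynamic programming and a proper backoff model instead of the flat penalty used here.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Edge = Tuple[str, str, float]  # (next node, word, acoustic log-score)
Lattice = Dict[str, List[Edge]]


def paths(lattice: Lattice, node: str, end: str) -> Iterator[Tuple[List[str], float]]:
    """Yield (words, acoustic score) for every path from node to end."""
    if node == end:
        yield [], 0.0
        return
    for nxt, word, acoustic in lattice.get(node, []):
        for words, score in paths(lattice, nxt, end):
            yield [word] + words, acoustic + score


def lm_score(words: List[str], order: int,
             logprob: Callable[[Tuple[str, ...], str], float]) -> float:
    """Sum log P(word | previous order-1 words) over the padded sentence."""
    padded = ["<s>"] * (order - 1) + words + ["</s>"]
    total = 0.0
    for i in range(order - 1, len(padded)):
        history = tuple(padded[i - order + 1:i])
        total += logprob(history, padded[i])
    return total


def rescore(lattice: Lattice, start: str, end: str, order: int,
            logprob: Callable[[Tuple[str, ...], str], float],
            lm_weight: float = 1.0) -> Tuple[float, List[str]]:
    """Return (total score, words) of the best path after adding LM scores."""
    best = None
    for words, acoustic in paths(lattice, start, end):
        total = acoustic + lm_weight * lm_score(words, order, logprob)
        if best is None or total > best[0]:
            best = (total, words)
    if best is None:
        raise ValueError("no path from start to end")
    return best


# Toy lattice: the acoustically best path is "i won't two go".
LATTICE: Lattice = {
    "s0": [("s1", "i", -1.0)],
    "s1": [("s2", "want", -2.0), ("s2", "won't", -1.5)],
    "s2": [("s3", "to", -1.0), ("s3", "two", -0.8)],
    "s3": [("s4", "go", -1.0)],
}

# Toy 4-gram table; unseen events get a flat penalty (not a real backoff).
FOURGRAMS = {
    (("<s>", "i", "want"), "to"): -0.3,
    (("i", "want", "to"), "go"): -0.2,
}


def toy_logprob(history: Tuple[str, ...], word: str) -> float:
    return FOURGRAMS.get((history, word), -3.0)


if __name__ == "__main__":
    score, words = rescore(LATTICE, "s0", "s4", 4, toy_logprob)
    print(" ".join(words), round(score, 2))
```

In this toy setup the 4-gram scores move the result away from the acoustically best path and towards "i want to go", which is the effect rescoring with a higher-order model is meant to have.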
     
  • asr2010

    asr2010 - 2010-04-14

    Thank you nshmyrev for your reply.

    Why is sphinx3 not encouraged? Is it only because of speed and memory, or
    because of accuracy and capabilities as well?

    Actually, I am using sphinx3 for research purposes. I mainly use continuous
    HMMs and adaptation techniques like MAP and MLLR. Are these techniques
    supported in pocketsphinx as well?

     
  • Nickolay V. Shmyrev

    Why is sphinx3 not encouraged?

    Mainly because of user support. It's also about the higher accuracy and capabilities of pocketsphinx.

    like MAP and MLLR. are these techniques supported in pocketsphinx as well?

    They have the same level of support as in sphinx3.

     
  • asr2010

    asr2010 - 2010-04-15

    I tested pocketsphinx (nightly build of 15.4.2010) against sphinx3 with the
    same decoder settings (language weight = 10, beam width = 1e-080, word beam =
    1e-048) and the same AN4 acoustic model (continuous, 1000 tied states, 8
    Gaussians). I used the decode\slave.pl script to decode 130 utterances of the
    AN4 corpus (as in the tutorial).

    Accuracy is much better with the sphinx3 decoder, although it is slower:
    sphinx3: WER = 17.5%
    pocketsphinx: WER = 29.1%

    Could you please comment on this?
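
Editorial sketch for readers unfamiliar with the metric quoted above: word error rate is the word-level edit distance (substitutions, deletions and insertions) between the reference and the hypothesis, divided by the number of reference words. The function `wer` below is a hypothetical illustration and is not the scoring code used by the decode script. The example sentences are made up.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(wer("go to the store", "go to store"))
```

For a whole test set, the errors are usually summed over all utterances and divided by the total number of reference words, instead of averaging per-utterance rates.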

     
  • Nickolay V. Shmyrev

    1000 senones is overkill for AN4. Try 100 and you should get a WER of around 16.5%.

     
