
Self-trained model runs well in the sphinxtrain test but poorly with Pocketsphinx and Sphinx4

  • kyon

    kyon - 2017-11-03

    Hi, I am a newbie to CMUSphinx.

    I created the language model using http://www.speech.cs.cmu.edu/tools/lmtool-new.html,
    trained the acoustic model following https://cmusphinx.github.io/wiki/tutorialam/,
    and used the latest release from https://cmusphinx.github.io/wiki/download/.
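
    For reference, lmtool just takes a plain-text corpus with one sentence per line. A minimal sketch of such a corpus file (the digit strings here are made up, following the sentence pattern described in the next paragraph) would be:

    please take the characters one three eight seven two once again one three eight seven two
    please take the characters four zero six five nine once again four zero six five nine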

    The training data are very simple: every utterance has the form "Please take the characters <digits> once again <digits>".
    There are 2,000 wav files for training (5 hours in total) and 500 files for testing, which took me 2 days to prepare.
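
    For completeness, the etc/ files follow the layout from the acoustic model tutorial, roughly like this (the file names and utterance ids below are placeholders, not the actual db):

    etc/kyon2_db_train.fileids:
        apple/file_0001
        apple/file_0002

    etc/kyon2_db_train.transcription:
        <s> please take the characters one three eight seven two once again one three eight seven two </s> (file_0001)
        <s> please take the characters four zero six five nine once again four zero six five nine </s> (file_0002)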

    The training result is very good according to the align file:

    TOTAL Words: 8256 Correct: 7278 Errors: 1105
    TOTAL Percent correct = 88.15% Error = 13.38% Accuracy = 86.62%
    TOTAL Insertions: 127 Deletions: 281 Substitutions: 697
    

    However, I get very bad accuracy when running Pocketsphinx/Sphinx4 with my model.
    For example, file_2001.wav and file_2002.wav are recognized in the db.align file as:

    please take the characters one three eight seven two once again one three eight seven two  (apple-FILE_2001)
    please take the characters one three eight seven two once again one three eight seven two  (apple-FILE_2001)
    Words: 16 Correct: 16 Errors: 0 Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
    Insertions: 0 Deletions: 0 Substitutions: 0
    please take the characters one seven THREE nine one ONCE AGAIN one seven three nine ONE  (apple-FILE_2002)
    please take the characters one seven NINE  nine one ***  ***   one seven three nine ***  (apple-FILE_2002)
    Words: 16 Correct: 12 Errors: 4 Percent correct = 75.00% Error = 25.00% Accuracy = 75.00%
    

    Whereas pocketsphinx gives:

    file_2001.wav : NINE ZERO EIGHT NINE TWO NINE ONE
    file_2002.wav : NINE
    

    The pocketsphinx command is:

    pocketsphinx_continuous -infile wav/file_2002.wav -hmm model_parameters/kyon2_db.cd_cont_32 -dict etc/kyon2_db.dic -lm etc/kyon2_db.lm.DMP
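
    In case the audio format matters, pocketsphinx_continuous assumes 16 kHz, 16-bit, mono input by default. A quick sketch for checking one wav file and capturing the full decoder log (soxi comes from the sox package; -samprate 16000 is just the default, and the log file name is arbitrary) would be:

    soxi wav/file_2002.wav
    pocketsphinx_continuous -infile wav/file_2002.wav -samprate 16000 \
        -hmm model_parameters/kyon2_db.cd_cont_32 -dict etc/kyon2_db.dic \
        -lm etc/kyon2_db.lm.DMP -logfn file_2002.log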
    

    The training db, including etc/, model_parameters/ and result/, can be found here: https://drive.google.com/open?id=0ByvxjxiH6xq3SEotczdNNVhacUU
    I've also uploaded 10 wav files for testing.

    Am I missing something in training or in using the model?

     

    Last edit: kyon 2017-11-03
  • kyon

    kyon - 2017-11-03

    Running with pocketsphinx_batch gives a very good result.

    The command is:

    pocketsphinx_batch  -adcin yes  -cepdir wav  -cepext .wav  -ctl test.fileids  -lm etc/kyon2_db.lm.DMP -dict etc/kyon2_db.dic -hmm model_parameters/kyon2_db.cd_cont_32 -hyp test
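
    Here test.fileids is the control file for -ctl, with one utterance per line, relative to -cepdir and without the .wav extension (the -cepext value gets appended), along these lines:

    apple/file_2001
    apple/file_2002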
    

    Result:

    INFO: batch.c(761): apple/file_2001: 8.23 seconds speech, 0.11 seconds CPU, 0.11 seconds wall
    INFO: batch.c(763): apple/file_2001: 0.01 xRT (CPU), 0.01 xRT (elapsed)
    PLEASE TAKE THE CHARACTERS ONE THREE EIGHT SEVEN TWO ONCE AGAIN ONE THREE EIGHT SEVEN TWO (apple/file_2001 -11322)
    apple/file_2001 done --------------------------------------
    

    Any magic here?

     
    • Nickolay V. Shmyrev

      Our software has several hidden tweaks to reduce captcha cracking capabilities.

       
