
Self-trained model runs well in sphinxtrain test but poorly with Pocketsphinx and Sphinx4

Help
kyon
2017-11-03
2017-11-08
  • kyon

    kyon - 2017-11-03

    Hi, I am a newbie to CMUSphinx.

    I created the language model using http://www.speech.cs.cmu.edu/tools/lmtool-new.html
    and trained the acoustic model following https://cmusphinx.github.io/wiki/tutorialam/,
    using the latest release from https://cmusphinx.github.io/wiki/download/.
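
    To make the setup concrete: the corpus submitted to lmtool is just the transcript sentences, one per line, and the acoustic model was trained with the standard sphinxtrain steps from that tutorial. A rough sketch, assuming the database name kyon2_db used in the decoder commands below:

    sphinxtrain -t kyon2_db setup
    # edit etc/sphinx_train.cfg and add the transcripts/fileids as the tutorial describes
    sphinxtrain run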

    The training data is very simple. All utterances are of the form "Please take the characters <digits> once again <digits>".
    There are 2,000 wav files for training (5 hours in total) and 500 files for testing, which took me two days to prepare.

    The training result is very good according to the align file:

    TOTAL Words: 8256 Correct: 7278 Errors: 1105
    TOTAL Percent correct = 88.15% Error = 13.38% Accuracy = 86.62%
    TOTAL Insertions: 127 Deletions: 281 Substitutions: 697
    

    However, I get very poor accuracy when running Pocketsphinx/Sphinx4 with my model.
    For example, file_2001.wav and file_2002.wav are recognized in the db.align file as:

    please take the characters one three eight seven two once again one three eight seven two  (apple-FILE_2001)
    please take the characters one three eight seven two once again one three eight seven two  (apple-FILE_2001)
    Words: 16 Correct: 16 Errors: 0 Percent correct = 100.00% Error = 0.00% Accuracy = 100.00%
    Insertions: 0 Deletions: 0 Substitutions: 0
    please take the characters one seven THREE nine one ONCE AGAIN one seven three nine ONE  (apple-FILE_2002)
    please take the characters one seven NINE  nine one ***  ***   one seven three nine ***  (apple-FILE_2002)
    Words: 16 Correct: 12 Errors: 4 Percent correct = 75.00% Error = 25.00% Accuracy = 75.00%
    

    Whereas pocketsphinx outputs:

    file_2001.wav : NINE ZERO EIGHT NINE TWO NINE ONE
    file_2002.wav : NINE
    

    The pocketsphinx command is:

    pocketsphinx_continuous -infile wav/file_2002.wav -hmm model_parameters/kyon2_db.cd_cont_32 -dict etc/kyon2_db.dic -lm etc/kyon2_db.lm.DMP
    
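    For what it's worth, the tutorial expects recordings to be 16 kHz, 16-bit, mono wav for desktop models, so the test files should match the format the model was trained on. A quick check/conversion sketch, assuming sox is installed and the file sits under wav/:

    soxi wav/file_2002.wav                                              # show sample rate / channels / bit depth
    sox wav/file_2002.wav -r 16000 -c 1 -b 16 wav/file_2002_16k.wav    # convert to 16 kHz, 16-bit, mono if needed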

    The training db, including etc/, model_parameters/ and result/, can be found here: https://drive.google.com/open?id=0ByvxjxiH6xq3SEotczdNNVhacUU
    I've also uploaded 10 wav files for testing.

    Am I missing something in training or in using the model?

     

    Last edit: kyon 2017-11-03
  • kyon

    kyon - 2017-11-03

    Running with pocketsphinx_batch gives a very good result.

    The command is:

    pocketsphinx_batch  -adcin yes  -cepdir wav  -cepext .wav  -ctl test.fileids  -lm etc/kyon2_db.lm.DMP -dict etc/kyon2_db.dic -hmm model_parameters/kyon2_db.cd_cont_32 -hyp test
    
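    For reference, test.fileids just lists the utterance ids relative to -cepdir, one per line, without the .wav extension (the same ids that appear in the output below), e.g.:

    apple/file_2001
    apple/file_2002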

    Result:

    INFO: batch.c(761): apple/file_2001: 8.23 seconds speech, 0.11 seconds CPU, 0.11 seconds wall
    INFO: batch.c(763): apple/file_2001: 0.01 xRT (CPU), 0.01 xRT (elapsed)
    PLEASE TAKE THE CHARACTERS ONE THREE EIGHT SEVEN TWO ONCE AGAIN ONE THREE EIGHT SEVEN TWO (apple/file_2001 -11322)
    apple/file_2001 done --------------------------------------
    

    Any magic here?

     
    • Nickolay V. Shmyrev

      Our software has several hidden tweaks to reduce captcha cracking capabilities.

       
