TIMIT ASR HIGH WER

2016-03-15
2016-03-29
  • Senjam Shantirani

    I trained a model using Sphinx3 with the TIMIT corpus and tested it as below:

    a) Testing with training data

        Aligning results to find error rate
        SENTENCE ERROR: 28.3% (1308/4620)   WORD ERROR RATE: 4.8% (1895/39895)
    

    b) Testing with test data

        Aligning results to find error rate
        SENTENCE ERROR: 100.0% (10/10)   WORD ERROR RATE: 90.8% (78/87)
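
The word error rate in these reports is a count of word errors divided by the number of reference words. The following is a minimal sketch of the usual computation, assuming plain whitespace tokenization; it is not the alignment script that ships with sphinxtrain, and the function names are illustrative only.

```python
def word_errors(reference, hypothesis):
    """Return the word-level Levenshtein distance between two sentences."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """WER over (reference, hypothesis) pairs: total errors / total reference words."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / words
```

Because inserted words count as errors but do not add to the reference length, a WER computed this way can exceed 100% when the decoder outputs many extra words.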
    

    Can you please tell me whether my training is correct, based on result (a)? I used lmtool to create the language model.

    Also, I think sphinx_decode.cfg by default tests with the model parameter "timit.cd_cont_1000_8".
    Please comment.

    Thanking you,
    Shanti

     
    • Nickolay V. Shmyrev

      Most likely you didn't create the language model properly. Since you haven't provided any details, it is hard to give more precise advice.
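
One way to narrow down a language model problem is to measure how many words in the test transcripts are missing from the LM vocabulary. The sketch below assumes Sphinx-style transcription lines of the form `<s> words </s> (utterance_id)` and compares words case-insensitively; the function names are hypothetical and not part of any Sphinx tool.

```python
import re


def transcript_words(line):
    """Drop <s>, </s> and a trailing (utterance_id) from one transcription line."""
    line = re.sub(r"\([^()]*\)\s*$", "", line)
    return [w.lower() for w in line.split() if w not in ("<s>", "</s>")]


def oov_rate(vocabulary, transcript_lines):
    """Fraction of transcript words that are not in the given vocabulary."""
    vocab = {w.lower() for w in vocabulary}
    words = [w for line in transcript_lines for w in transcript_words(line)]
    if not words:
        return 0.0
    missing = [w for w in words if w not in vocab]
    return len(missing) / len(words)
```

A high rate on the test transcripts would suggest that the LM was not built from text that covers the test sentences.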

       
  • Senjam Shantirani

    1) Training ended like below:


    This step had 2 ERROR messages and 0 WARNING messages. Please check the log file for details.
    Normalization for iteration: 7
    Current Overall Likelihood Per Frame = 1.92801139707547
    Split Gaussians, increase by 0
    Training for 8 Gaussian(s) completed after 7 iterations
    MODULE: 90 deleted interpolation
    Skipped for continuous models
    MODULE: 99 Convert to Sphinx2 format models
    Can not create models used by Sphinx-II.
    If you intend to create models to use with Sphinx-II models, please rerun with:
    $ST::CFG_HMM_TYPE = '.semi.' or
    $ST::CFG_HMM_TYPE = '.cont' and $ST::CFG_FEATURE = '1s_12c_12d_3p_12dd' and $ST::CFG_STATESPERHMM = '5'
    root@shanti-Satellite-C650:/home/Phonemodel/workspace/hmm#


    I got the following files inside the model parameter folder:
    timit.cd_cont_1000
    timit.cd_cont_1000_1
    timit.cd_cont_1000_2
    timit.cd_cont_1000_4
    timit.cd_cont_1000_8
    timit.cd_cont_initial
    timit.cd_cont_untied
    timit.ci_cont
    timit.ci_cont_flatinitial

    2) I created the LM by listing the sentences in a .txt file with punctuation removed and uploading it to the LM tool site. Note that I created a new LM for the testing files and tested with that. The generated LM is at the link

    http://www.speech.cs.cmu.edu/tools/product/1458018745_29981/

    3) Feature extraction was done using sphinx_fe, and the wav files were 16 kHz, mono.

    4) The sphinx_decode.cfg and sphinx_train.cfg are attached for reference.
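
The audio format stated in item 3 can be verified per file. This is a small sketch using Python's standard `wave` module, which only reads RIFF WAV files; audio stored in another container (TIMIT is commonly distributed in NIST SPHERE format) would raise `wave.Error` and would need converting first. The helper names are illustrative.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, channels, sample_width_bytes) for a RIFF WAV file."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth()


def is_16k_mono(path):
    """True if the file is sampled at 16 kHz with a single channel."""
    rate, channels, _ = wav_format(path)
    return rate == 16000 and channels == 1
```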

    Thanking you for all the assistance given so far.

    Regards,
    Shanti

     

    Last edit: Senjam Shantirani 2016-03-15
    • Nickolay V. Shmyrev

      You need to provide the whole model training folder.

       
  • Senjam Shantirani

    Hi Nickolay,

    I have uploaded the working folder to my Google Drive. I tried GitHub, but the upload was taking too long.

    https://drive.google.com/file/d/0B5R7ajmu4w5iRHpPT2RvR0Q1TGc/view?usp=sharing

    Please comment on my error: why is testing with the training files good while testing with the testing files is bad?

    Regards,
    Shanti

     
    • Nickolay V. Shmyrev

      In your working folder I do not see a bad error rate of 90%, I only see a good decoding result with 5% word error rate. I do not see a test set you are talking about.

      Also, you are using an old sphinxtrain. You need to use the latest sphinxtrain, sphinxbase and pocketsphinx.

       
  • Senjam Shantirani

    Sorry, the timit_test.fileids and the corresponding transcription file there list the training wav files and their transcriptions... If I run with the testing data attached here, it shows the following:

    root@shanti-Satellite-C650:/home/PhoneModel/workspace/hmm# perl scripts_pl/decode/slave.pl
    MODULE: DECODE Decoding using models previously trained
    Decoding 20 segments starting at 0 (part 1 of 1)
    0%
    This step had 45 ERROR messages and 3 WARNING messages. Please check the log file for details.
    Aligning results to find error rate
    SENTENCE ERROR: 100.0% (20/20) WORD ERROR RATE: 111.1% (189/171)
    root@shanti-Satellite-C650:/home/PhoneModel/workspace/hmm#
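
A test list that accidentally contains training utterances, as happened with timit_test.fileids earlier in this thread, can be caught before decoding. The sketch below assumes each .fileids file holds one utterance ID per line; the name of the training list passed in is up to the caller, and the function names are hypothetical.

```python
def read_fileids(path):
    """Read a .fileids file into a set of utterance IDs, skipping blank lines."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def shared_utterances(train_path, test_path):
    """Return the sorted utterance IDs that appear in both lists."""
    return sorted(read_fileids(train_path) & read_fileids(test_path))
```

An empty result means the two lists are disjoint; any IDs returned are training utterances that would inflate the test accuracy.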

    Do you think something is wrong with the training, or should I increase the number of Gaussians and iterations and test again?

    The new working folder is at the link:

    https://drive.google.com/file/d/0B5R7ajmu4w5iT29uX2JPRVlwdXM/view?usp=sharing

    Regards,
    Senjam Shantirani

     

    Last edit: Senjam Shantirani 2016-03-24
    • Nickolay V. Shmyrev

      I do not see any critical issue with your training; you just do not have sufficient data to train acoustic and language models. For the acoustic model you need at least 50 hours of data; for the language model you need at least 1 GB of text. You can try the TED-LIUM corpus instead of TIMIT; it is a much more reasonable database to try for LVCSR.

      I also see you are using an old sphinxtrain; I recommend updating to the latest version.
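
To compare a training set against the 50-hour guideline mentioned in this reply, the total audio duration can be summed from the files themselves. This is a rough sketch that assumes RIFF WAV files with a lowercase .wav extension under one directory; `total_hours` is an illustrative name, not a Sphinx utility.

```python
import wave
from pathlib import Path


def total_hours(root):
    """Sum the duration of every .wav file under root, in hours."""
    seconds = 0.0
    for path in Path(root).rglob("*.wav"):
        with wave.open(str(path), "rb") as w:
            seconds += w.getnframes() / w.getframerate()
    return seconds / 3600
```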

       
  • Senjam Shantirani

    Thank you Nickolay for your kind advice.
    I will continue with it.

     
