CMU Sphinx / Forums / Help: TIMIT ASR HIGH WER

Senjam Shantirani - 2016-03-15

I trained a model on the TIMIT corpus using Sphinx3 and tested it as follows:

a) Testing with training data

Aligning results to find error rate SENTENCE ERROR: 28.3% (1308/4620) WORD ERROR RATE: 4.8% (1895/39895)

b) Testing with test data

Aligning results to find error rate SENTENCE ERROR: 100.0% (10/10) WORD ERROR RATE: 90.8% (78/87)

Could you please tell me whether my training looks correct, based on result (a)? I used lmtool to create the language model.

Also, I think sphinx_decode.cfg tests with the model parameter "timit.cd_cont_1000_8" by default.
Please comment.

Thanking you,
Shanti
- Nickolay V. Shmyrev - 2016-03-15
  
  Most likely you didn't create the language model properly. Since you haven't provided the details, it is hard to give more precise advice.
  

Senjam Shantirani - 2016-03-15

1) Training ended as follows:

This step had 2 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 7
Current Overall Likelihood Per Frame = 1.92801139707547
Split Gaussians, increase by 0
Training for 8 Gaussian(s) completed after 7 iterations
MODULE: 90 deleted interpolation
Skipped for continuous models
MODULE: 99 Convert to Sphinx2 format models
Can not create models used by Sphinx-II.
If you intend to create models to use with Sphinx-II models, please rerun with:
$ST::CFG_HMM_TYPE = '.semi.' or
$ST::CFG_HMM_TYPE = '.cont' and $ST::CFG_FEATURE = '1s_12c_12d_3p_12dd' and $ST::CFG_STATESPERHMM = '5'
root@shanti-Satellite-C650:/home/Phonemodel/workspace/hmm#

I got the following files inside the model parameter folder:
timit.cd_cont_1000
timit.cd_cont_1000_1
timit.cd_cont_1000_2
timit.cd_cont_1000_4
timit.cd_cont_1000_8
timit.cd_cont_initial
timit.cd_cont_untied
timit.ci_cont
timit.ci_cont_flatinitial

2) I created the LM by listing the sentences in a .txt file, with punctuation removed, and uploading it to the lmtool site. Note that I created a new LM for the test files and tested with that. The generated LM is at this link:

http://www.speech.cs.cmu.edu/tools/product/1458018745_29981/

3) Feature extraction was done using sphinx_fe, and the wav files were 16 kHz, mono.

4) The sphinx_decode.cfg and sphinx_train.cfg are attached for reference.

Thanking you for all the assistance given so far.

Regards,
Shanti

Last edit: Senjam Shantirani 2016-03-15

sphinx_decode.cfg

sphinx_train.cfg
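
One quick sanity check for step 2 is to confirm that every word in the test transcription also appears in the pronunciation dictionary used with the LM. The following is only a minimal sketch, assuming the usual Sphinx text formats (transcription lines like `<s> WORD WORD </s> (file_id)` and dictionary lines like `WORD PH PH`); the function names and sample lines are made up for illustration and are not taken from the attached files.

```python
def dictionary_words(dict_lines):
    """Collect the words from dictionary lines of the form 'WORD PH PH ...'."""
    words = set()
    for line in dict_lines:
        parts = line.split()
        if parts:
            # Alternate pronunciations are written WORD(2); strip the suffix.
            words.add(parts[0].split("(")[0].upper())
    return words


def transcript_words(trans_lines):
    """Yield the words of each transcription line, skipping markers and the file ID."""
    for line in trans_lines:
        for token in line.split():
            if token in ("<s>", "</s>"):
                continue
            if token.startswith("(") and token.endswith(")"):
                continue  # trailing "(file_id)"
            yield token.upper()


def missing_words(dict_lines, trans_lines):
    """Return the transcript words that have no dictionary entry, sorted."""
    known = dictionary_words(dict_lines)
    return sorted(set(transcript_words(trans_lines)) - known)


# Made-up sample data, only to show the call.
sample_dict = ["SHE SH IY", "HAD HH AE D", "HAD(2) HH AH D"]
sample_trans = ["<s> she had your </s> (sa1)"]
print(missing_words(sample_dict, sample_trans))
```

Words reported as missing cannot be recognized at all, so each one counts as at least one error in the alignment.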

- Nickolay V. Shmyrev - 2016-03-16
  
  You need to provide the whole model training folder
  

Senjam Shantirani - 2016-03-24

Hi Nickolay,

I have uploaded the working folder to my Google Drive. I tried GitHub, but the upload was taking too long.

https://drive.google.com/file/d/0B5R7ajmu4w5iRHpPT2RvR0Q1TGc/view?usp=sharing

Please comment on my error: why is testing with the training files good while testing with the test files is bad?

Regards,
Shanti

- Nickolay V. Shmyrev - 2016-03-24
  
  In your working folder I do not see a bad error rate of 90%; I only see a good decoding result with a 5% word error rate. I do not see the test set you are talking about.
  
  Also, you are using an old sphinxtrain. You need to use the latest sphinxtrain, sphinxbase and pocketsphinx.
  

Senjam Shantirani - 2016-03-24

Sorry, the timit_test.fileids and the corresponding transcription in that folder listed the training wav files and their transcriptions. When I run with the test data attached here, I get the following:

root@shanti-Satellite-C650:/home/PhoneModel/workspace/hmm# perl scripts_pl/decode/slave.pl
MODULE: DECODE Decoding using models previously trained
Decoding 20 segments starting at 0 (part 1 of 1)
0%
This step had 45 ERROR messages and 3 WARNING messages. Please check the log file for details.
Aligning results to find error rate
SENTENCE ERROR: 100.0% (20/20) WORD ERROR RATE: 111.1% (189/171)
root@shanti-Satellite-C650:/home/PhoneModel/workspace/hmm#
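
A word error rate above 100% is possible because inserted words count as errors while the denominator is only the number of reference words. The sketch below illustrates the standard word-level edit-distance definition; it is not the alignment script that sphinxtrain runs, and the function names and sample sentences are invented for the example.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two lists of tokens."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def word_error_rate(ref_text, hyp_text):
    """WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = ref_text.split()
    hyp = hyp_text.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Two reference words, four wrong hypothesis words:
# 2 substitutions + 2 insertions = 4 errors over 2 words.
print(word_error_rate("dark suit", "the dog is sick"))  # 200.0
```

So a figure such as 111.1% means the decoder produced more errors than there are words in the reference, which usually points to many inserted or completely mismatched words rather than a small degradation.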

Do you think something is wrong with the training, or should I increase the number of Gaussians and iterations and test again?

The new working folder is at the link:

https://drive.google.com/file/d/0B5R7ajmu4w5iT29uX2JPRVlwdXM/view?usp=sharing

Regards,
Senjam Shantirani

Last edit: Senjam Shantirani 2016-03-24

timit_test.fileids

timit_test.transcription
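
The mix-up described in this post, where timit_test.fileids listed training utterances, can be caught with a small overlap check before decoding. This is only a sketch: it assumes one file ID per line, and both the name timit_train.fileids and the sample IDs below are assumptions for illustration, not taken from the uploaded folder.

```python
def read_ids(lines):
    """Return the set of non-empty file IDs, one per line."""
    return {line.strip() for line in lines if line.strip()}


def overlapping_ids(train_lines, test_lines):
    """Return the file IDs that appear in both the train and test lists, sorted."""
    return sorted(read_ids(train_lines) & read_ids(test_lines))


# Made-up IDs, only to show the call. With real data the lines would come from
# the two .fileids files, for example:
#   with open("etc/timit_train.fileids") as f: train_lines = f.readlines()
train_lines = ["train/spk1/utt1\n", "train/spk1/utt2\n"]
test_lines = ["train/spk1/utt2\n", "test/spk9/utt1\n"]
print(overlapping_ids(train_lines, test_lines))
```

An empty result means the test list shares no utterances with the training list, so the reported error rate reflects unseen data.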

- Nickolay V. Shmyrev - 2016-03-27
  
  I do not see any critical issue with your training; you just do not have sufficient data to train acoustic and language models. For an acoustic model you need at least 50 hours of data; for a language model you need at least 1 GB of text. You can try the TED-LIUM corpus instead of TIMIT; it is a much more reasonable database to try for LVCSR.
  
  I also see you are using an old sphinxtrain; I recommend updating to the latest version.
  

Senjam Shantirani - 2016-03-29

Thank you Nickolay for your kind advice.
I will continue with it.

