What is an efficient way of choosing the audio files for training and testing?
For now I have 1267 audio files in a female voice and 1267 in a male voice with the same transcriptions. I have taken the same 1013 files from each speaker as training data (2026 training files in total) and the remaining files as test data. Please let me know if this is the right way (80% train and 20% test). The data amounts to roughly 5.8 hours.
Initially my audio was:
$ soxi text_1.wav
Input File : 'text_1.wav'
Channels : 1
Sample Rate : 48000
Precision : 16-bit
Duration : 00:00:12.39 = 594854 samples ~ 929.459 CDDA sectors
File Size : 1.19M
Bit Rate : 768k
Sample Encoding: 16-bit Signed Integer PCM
This is the command I used to downsample to 16kHz:
$ sox text_1.wav -b 16 text_1-f.wav rate 16k
After resampling
Input File : 'text_1-f.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:12.39 = 198285 samples ~ 929.461 CDDA sectors
File Size : 397k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
Though training was successful for the word model (apart from errors about some audio files not reaching a final state, which I removed), I'm getting a word error rate of 93%.
I assumed that as the data increases the WER should decrease, but instead I've seen the WER increase, which means poor accuracy.
Is it because I have chosen a wrong way of dividing train and test? (Should some sentences from the training set also appear in the test set?)
Or is it because I downsampled the audio to 16kHz that the recognition is so poor?
Or is it due to the feature extraction?
Last edit: Tania Mendonca 2017-03-22
Quite a few questions in one place. Still:
> Please let me know if this is the right way (80% train and 20% test).

This ratio is fine. You could even use only 10% for testing to leave more data for training.
> I'm getting a word error rate of 93%.

That is far too high; something has likely gone wrong. You should analyse the alignment files in the decoding directory. You can also share your working directory for further analysis.
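For reference, WER is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. Your toolkit's scoring script computes this for you; the sketch below is only to show the metric itself:

```python
def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance / number of reference words."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between the first i ref words and first j hyp words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(h) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])  # substitution or match
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / len(r)

print(wer("the cat sat on the mat", "the cat sat mat"))  # 2 deletions over 6 words
```

A 93% WER means almost every reference word was substituted, deleted, or replaced by an insertion, which usually points to a setup problem rather than a modelling one.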
> Should some sentences from the training set also appear in the test set?

No. Moreover, it is not recommended to have the same speakers in both train and test. It is good, however, for train and test to be phonetically balanced.
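With only two speakers a fully speaker-disjoint split is not possible here, but you can at least keep every sentence entirely on one side of the split. A minimal sketch, assuming utterances are keyed by a hypothetical speaker/sentence-id pair:

```python
import random

def split_by_sentence(sentence_ids, speakers, test_frac=0.2, seed=0):
    """Assign whole sentences to train or test, so no transcription appears
    in both sets; both speakers' recordings of a sentence stay together."""
    ids = sorted(sentence_ids)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(ids)
    n_test = int(len(ids) * test_frac)
    test_sents = set(ids[:n_test])
    train, test = [], []
    for sent in sentence_ids:
        for spk in speakers:
            utt = f"{spk}-{sent}"  # hypothetical utterance-id scheme
            (test if sent in test_sents else train).append(utt)
    return train, test

train, test = split_by_sentence(range(1267), ["female", "male"])
print(len(train), len(test))  # 2028 506
```

Shuffling sentence ids before splitting (rather than taking the first 1013) also avoids any ordering bias in how the corpus was recorded.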
> Or is it because I downsampled the audio to 16kHz that the recognition is so poor?

Unlikely.
> Or is it due to the feature extraction?

It could be, but it is difficult to say without your working directory to replicate the problem.
When I checked the perplexity of my language model on the test set, I got a perplexity of 5700, and the OOV rate is 53%.
Is it because of this that my WER is 93%?
Yes.
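To see why: a word that never occurs in the language model's training text cannot be output by the decoder at all, so the OOV rate is a hard floor on the achievable WER. A toy sketch of how the OOV rate is computed from transcripts (function name is illustrative):

```python
def oov_rate(train_transcripts, test_transcripts):
    """Fraction of test tokens whose word never occurs in the training text.
    The decoder cannot hypothesise these words, so this bounds WER from below."""
    vocab = {w for line in train_transcripts for w in line.split()}
    test_tokens = [w for line in test_transcripts for w in line.split()]
    oov = sum(1 for w in test_tokens if w not in vocab)
    return oov / len(test_tokens)

train = ["the cat sat", "the dog ran"]
test = ["the cat jumped", "a bird flew"]
print(oov_rate(train, test))  # 4 of 6 test tokens are unseen
```

With 53% of test tokens unseen, at least every second word is guaranteed wrong before acoustics are even considered.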
Is there a way I can reduce the perplexity of the language model?
Use more relevant data in language model training.
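By way of illustration, here is a toy add-one-smoothed unigram model (real LM toolkits use n-grams with far better smoothing) showing that training text closer to the test domain yields lower perplexity:

```python
import math

def unigram_perplexity(train_tokens, test_tokens):
    """Perplexity of an add-one-smoothed unigram model over test_tokens."""
    counts = {}
    for w in train_tokens:
        counts[w] = counts.get(w, 0) + 1
    vocab = set(train_tokens) | set(test_tokens)  # closed vocabulary for smoothing
    total = len(train_tokens)
    log_prob = 0.0
    for w in test_tokens:
        p = (counts.get(w, 0) + 1) / (total + len(vocab))
        log_prob += math.log(p)
    return math.exp(-log_prob / len(test_tokens))

test = "the cat sat on the mat".split()
unrelated = "the dog".split()
relevant = "the cat sat on the mat the cat ran".split()
print(unigram_perplexity(unrelated, test) > unigram_perplexity(relevant, test))  # True
```

The same principle holds for the real model: adding in-domain text (or interpolating an in-domain LM with a background one) both shrinks the OOV rate and lowers perplexity.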