100% error on training

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others



Forum: Help

Creator: Omer Zaman

Created: 2017-06-03

Updated: 2017-06-07

Omer Zaman - 2017-06-03

Hello,
This is my first time using Sphinx. I am using PocketSphinx on Ubuntu. I have an assignment in which I have to recognise digits using the tidigits demo, but for another language, Urdu.

The first thing I did was run the demo as is. I trained it on 132 audio utterances and tested it on 44 utterances. My training data was below the threshold, so I set the CFG Train command to no. The code now runs, but I get a strange error that I am unable to figure out. Despite this one error during training, the results are promising: the model was 84% accurate on the test data.

Next, I tried it for Urdu. I created new dictionary, phone, test transcription and train transcription files. I am using the same lm file from the tidigits template. When I execute the train command, I get the same error, but this time the error rate is 100%. I am not sure why this is happening. I am totally new to this and could really use some help. I am attaching the model for Urdu digits. The one error that I encountered:
"ngram_model_trie.c", line 323: Error reading word strings (1140850634 doesn't match n_unigrams 14)

Thank you!


Omer Zaman - 2017-06-03

Last edit: Omer Zaman 2017-06-05

- Nickolay V. Shmyrev - 2017-06-03
  
  Using an English LM for Urdu is not a good idea.
  
  - Omer Zaman - 2017-06-03
    
    I was told by the instructor that for single digits the lm should suffice. Is there a way around it? Can I maybe use a grammar?
    
    - Nickolay V. Shmyrev - 2017-06-07
      
      Your language model is built from English words; you need to build it from Urdu words. This has nothing to do with grammars.
      
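
To illustrate the last reply, here is a rough sketch of building a unigram language model in ARPA text format from the words that actually appear in the training transcripts. It is not taken from the thread: the function names, the transcript layout (`<s> words </s> (utterance_id)`) and the romanised Urdu digit words are assumptions made for the example. A real setup would normally use a dedicated language-modelling toolkit, and whether a given decoder version accepts a unigram-only file should be checked.

```python
import math
from collections import Counter


def transcript_words(lines):
    """Yield the words in transcript lines such as '<s> aik do </s> (utt_001)'.

    Assumed layout: sentence markers around the words, utterance ID in
    parentheses at the end. Markers and the ID are skipped.
    """
    for line in lines:
        for token in line.split():
            if token in ("<s>", "</s>") or token.startswith("("):
                continue
            yield token


def unigram_arpa(lines):
    """Return an ARPA-format unigram model estimated from transcript lines."""
    counts = Counter(transcript_words(lines))
    # One end-of-sentence event per non-empty transcript line.
    counts["</s>"] = sum(1 for line in lines if line.strip())
    total = sum(counts.values())

    out = ["\\data\\", f"ngram 1={len(counts) + 1}", "", "\\1-grams:"]
    # <s> is never predicted, so it gets a conventional very low log10 probability.
    out.append("-99.0000 <s>")
    for word in sorted(counts):
        out.append(f"{math.log10(counts[word] / total):.4f} {word}")
    out += ["", "\\end\\", ""]
    return "\n".join(out)


# Placeholder transcripts; the words stand in for the real Urdu vocabulary.
sample = [
    "<s> aik do </s> (utt_001)",
    "<s> do teen </s> (utt_002)",
]
arpa = unigram_arpa(sample)
print(arpa)
```

The point of the sketch is only that every word in the model comes from the Urdu transcripts, so the model, the dictionary and the transcripts share one vocabulary. The same words would also need entries in the pronunciation dictionary.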
