Hello,
This is my first time using Sphinx. I am using pocketsphinx on Ubuntu. For an assignment I have to recognise digits using the tidigits demo, but for another language, Urdu. The first thing I did was run the demo itself, training it on 132 audio utterances and testing on 44 utterances. My training data was below the threshold, so I set the CFG Train command to no. The code now executes, but I get a strange error that I am unable to figure out. Despite this one error during training, the results are promising: the model was 84% accurate on the test data. Next I tried the same thing for Urdu. I created new dictionary, phone, test transcription and train transcription files, and I am using the same lm file from the tidigits template. When I execute the train command I get the same error, but this time the error rate is 100%. I am not sure why this is happening. I am totally new to this and could really use some help. I am attaching the model for Urdu digits. The one error that I encountered:
"ngram_model_trie.c", line 323: Error reading word strings (1140850634 doesn't match n_unigrams 14)
Thank you!
Last edit: Omer Zaman 2017-06-05
Using an English LM for Urdu is not a good idea.
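One way to see the mismatch is to compare the words in the LM with the words in the pronunciation dictionary. The sketch below assumes the LM is a plain-text ARPA file and that the dictionary has one `WORD PHONE PHONE ...` entry per line. The function names are made up for illustration and are not part of pocketsphinx.

```python
def arpa_unigrams(lines):
    """Collect the words listed in the \\1-grams: section of an ARPA LM."""
    words = set()
    in_unigrams = False
    for line in lines:
        line = line.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if in_unigrams:
            # Any other backslash-led line (\2-grams:, \end\) closes the section.
            if line.startswith("\\"):
                break
            fields = line.split()
            # Each entry is: log10 probability, word, optional backoff weight.
            if len(fields) >= 2:
                words.add(fields[1])
    return words


def dict_words(lines):
    """Collect the headwords of a pronunciation dictionary."""
    words = set()
    for line in lines:
        fields = line.split()
        if fields:
            # Strip alternate-pronunciation markers such as WORD(2).
            words.add(fields[0].split("(")[0])
    return words


def missing_from_dict(lm_lines, dict_lines):
    """Return LM words (other than sentence markers) absent from the dictionary."""
    markers = {"<s>", "</s>", "<unk>"}
    return sorted(arpa_unigrams(lm_lines) - dict_words(dict_lines) - markers)
```

Calling `missing_from_dict` on the lines of the tidigits lm file and the lines of the new Urdu dictionary would list every LM word that has no pronunciation. If the English digit words all show up in that list, the decoder has no path from the LM to the Urdu dictionary entries.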
I was told by the instructor that for single digits the LM should suffice. Is there a way around it? Could I use a grammar instead?
Your language model is built from English words; you need to build it from Urdu words. It has nothing to do with grammars.
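For isolated digits, building the LM from the Urdu words can be as simple as a unigram model in which every word is equally likely. The following is a minimal sketch that writes such a model as ARPA text. The function name is hypothetical, the word list must be the exact spellings used in your own dictionary and transcripts, and whether your pocketsphinx build accepts a unigram-only file is an assumption you should verify. A dedicated LM toolkit run on your training transcripts is the more usual route.

```python
import math


def uniform_unigram_arpa(words):
    """Return the text of an ARPA LM that gives every word equal probability."""
    vocab = ["<s>", "</s>"] + sorted(set(words))
    # <s> is never predicted, so share the mass among </s> and the words.
    logp = math.log10(1.0 / (len(vocab) - 1))
    out = ["\\data\\", f"ngram 1={len(vocab)}", "", "\\1-grams:"]
    for w in vocab:
        # -99 is the conventional placeholder log-probability for <s>.
        p = -99.0 if w == "<s>" else logp
        out.append(f"{p:.4f}\t{w}")
    out += ["", "\\end\\", ""]
    return "\n".join(out)
```

Writing the returned string to a file and pointing the decoder at it replaces the English tidigits LM. Every word passed in must also appear as a headword in the pronunciation dictionary.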