I am realizing that the file mixture_weights doesn't come with the Android Demo App, and this file is required for the adaptation. Where can I get this file from?
Some models, like en-us, are distributed in a compressed version. Extra files required for adaptation are excluded to save space. For the en-us model from pocketsphinx, you can download the full version, suitable for adaptation, from the downloads section.
Hello everyone, I am having problems recognizing digits in my pocketsphinx app. I am trying to recognize speech that represents a telephone number; the number is spoken digit by digit until the user stops speaking. The length of the telephone number may vary. For instance, the telephone number 1 212 664 444 is spoken as "one two one two six six four four four four" (all at once).
My grammar is this:
#JSGF V1.0;
grammar call_number_digits_numeric;
<call_number_digits_numeric> =
0 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9
;
public <r_call_number_digits_numeric> = <call_number_digits_numeric>+ ;
I am using cmusphinx-en-us-ptm-5.2.tar.gz on Linux (Ubuntu 14.04.3 LTS).
The problem is that the final recognition result is filled with digits that were never spoken. For example, if I say the telephone number 1 212 664 444, pocketsphinx recognizes this:
21212166144441 (first attempt)
9911212166441414 (second attempt)
191212166541414141 (third attempt)
Could you please help me find out what I am doing wrong? Is there a special configuration I need for digits? I am setting setKeywordThreshold(1e-45f); is this correct?
Thanks a lot in advance for your help.
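The three attempts can be scored against the spoken number with a Levenshtein (edit) distance, the same measure behind word error rate. This is a minimal illustrative sketch, not a pocketsphinx feature: `edit_distance` and `digit_error_rate` are hypothetical helper names, and the strings are the reference number and the three results listed above.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two digit strings (each edit costs 1)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # delete r (a spoken digit was dropped)
                curr[j - 1] + 1,          # insert h (an extra digit was recognized)
                prev[j - 1] + (r != h),   # substitute, free when the digits match
            )
        prev = curr
    return prev[-1]


def digit_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalised by the number of spoken digits (like WER)."""
    return edit_distance(ref, hyp) / len(ref)


spoken = "1212664444"  # "one two one two six six four four four four"
attempts = ["21212166144441", "9911212166441414", "191212166541414141"]

for hyp in attempts:
    print(hyp, edit_distance(spoken, hyp), f"{digit_error_rate(spoken, hyp):.0%}")
```

For these three strings the distance equals the length difference between hypothesis and reference, which means every error is an extra (inserted) digit rather than a substituted or dropped one.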
You need to give us the means to reproduce your problem: provide the code you are using and the data you are using, and describe exactly what you are running.
It looks like the sample rate of your data does not match the required 16 kHz.
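A quick way to act on the sample-rate hint is to read the header of a recorded test file before decoding it. The sketch below is an illustration, not part of pocketsphinx: it assumes the audio is saved as WAV, uses only Python's standard `wave` module, and `check_wav` is a hypothetical helper name. Besides the 16 kHz rate, it also flags non-mono or non-16-bit audio, which is an assumption about what the en-us model expects.

```python
import wave

REQUIRED_RATE = 16000  # Hz, the rate the en-us model requires


def check_wav(path: str) -> list[str]:
    """Return a list of format problems found in a WAV file (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()  # bytes per sample
    if rate != REQUIRED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {REQUIRED_RATE} Hz")
    # Assumption: mono 16-bit PCM, the usual input format for the en-us model
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    if width != 2:
        problems.append(f"{8 * width}-bit samples, expected 16-bit")
    return problems


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        issues = check_wav(path)
        print(path, "OK" if not issues else "; ".join(issues))
```

Saved as, say, `check_wav.py` (a hypothetical name), it can be run on a recording with `python check_wav.py recording.wav`.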
Dear Nickolay, thanks for the comments. I got some improvement, but I still have some doubts. I'll post more info about my code. Thanks.
Hello Nickolay, this is what I have done: I downloaded the Android Demo App and tested its performance for digit recognition, and it is far better than my current app. Because of this result, I took all the acoustic model files from the Android Demo App and placed them in my current app, and my digit recognition results improved by around 60%. What I want to do now is adapt the acoustic model, because my app is going to work surrounded by highway noise. For this adaptation I am using the acoustic model files from the Android Demo App (feat.params, mdef, means, noisedict, sendump, transition_matrices and variances). The problem is that the file mixture_weights doesn't come with the Android Demo App, and this file is required for the adaptation. Where can I get this file from? Thanks for the help.
From http://cmusphinx.sourceforge.net/wiki/tutorialadapt
http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/cmusphinx-en-us-ptm-5.2.tar.gz/download
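To catch a missing-file problem like this before starting adaptation, one could check the model directory first. This is a hedged sketch, not a CMUSphinx tool: the expected file list is taken from the files named in this thread plus mixture_weights, sendump is omitted because the thread does not say whether the full download includes it, and both the helper `missing_model_files` and the default directory name `en-us` are hypothetical.

```python
from pathlib import Path

# Files named in this thread; mixture_weights is the one missing from the
# compressed model shipped with the Android demo. Not an exhaustive list.
EXPECTED = [
    "feat.params", "mdef", "means", "noisedict",
    "transition_matrices", "variances", "mixture_weights",
]


def missing_model_files(model_dir: str) -> list[str]:
    """Return the expected acoustic-model files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]


if __name__ == "__main__":
    import sys

    missing = missing_model_files(sys.argv[1] if len(sys.argv) > 1 else "en-us")
    if missing:
        print("Missing for adaptation:", ", ".join(missing))
    else:
        print("All expected files present")
```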
Thanks a lot Nickolay, best regards