Hi,
I'm starting a small Android app in which I would like to use speech recognition.
The use case for pocketsphinx is detecting several simple voice commands that direct the flow of the app.
Which recognition model do you suggest I use?
Is it possible to define more than one keyword?
I would like to find a mode in which pocketsphinx fires an event as soon as a keyword or phrase is detected. Is that possible?
I managed to get a modified demo app running. It listens for the words I listed in the file instead of digits, but only after the key phrase has been detected and this kind of recognition has been triggered. The problem is that the event does not fire right away, only once silence is detected or the timeout is reached.
Is it possible to continuously monitor for a list of words or phrases and fire an event as soon as one is recognized, without waiting for silence?
Thanks in advance for any pointers.
You can use the default model.
Yes, sure, that's possible:
http://stackoverflow.com/questions/25748113/recognizing-multiple-keywords-using-pocketsphinx
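A minimal sketch of the "fire an event as soon as a keyword is detected" idea, in plain Java so it runs without the library. `CommandDispatcher` and `onPartialHypothesis` are hypothetical names, not part of pocketsphinx. The assumption is that in the Android app this method would be called from the listener's `onPartialResult` callback with the hypothesis string, which is where the demo app checks for its key phrase, rather than from `onResult`, which arrives after the end of speech.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical helper, not part of pocketsphinx: maps a partial
// hypothesis string to an app command and fires a callback at once.
final class CommandDispatcher {
    private final Set<String> commands;
    private final Consumer<String> onCommand;

    CommandDispatcher(Set<String> commands, Consumer<String> onCommand) {
        this.commands = new HashSet<>(commands);
        this.onCommand = onCommand;
    }

    // Assumed to be called with the text of each partial hypothesis.
    // Returns true if the text matched a command and the callback fired.
    boolean onPartialHypothesis(String text) {
        if (text == null) {
            return false; // nothing detected yet
        }
        String phrase = text.trim().toLowerCase(Locale.ROOT);
        if (!commands.contains(phrase)) {
            return false; // not one of the app's commands
        }
        onCommand.accept(phrase);
        return true;
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher(
                new HashSet<>(Arrays.asList("up", "down", "left", "right")),
                command -> System.out.println("command: " + command));
        dispatcher.onPartialHypothesis(null);   // ignored
        dispatcher.onPartialHypothesis("left"); // fires the callback
    }
}
```

When the method returns true, the app would probably also want to stop or restart the search so that the same detection is not reported again; that part depends on the recognizer and is left out here.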
Hi,
So, I tried to implement what was described in the Stack Overflow thread and managed to do it, but the application constantly triggers recognition of the word "up".
This is what I did:
- shortened the dict file to contain only 4 words (for testing): up, down, left, right
- created a key-words.txt with this content: pastie_link
- changed the initialization method for the recognizer object: pastie_link
I tried playing with the threshold values, but I can't set them up so that all keywords get recognized independently.
The app either always triggers "up" or does not recognize "down".
Can you please give me some pointers on the correct values for those?
Last edit: mesh 2016-06-12
Thresholds for words are independent, so you can tune them independently too: "up" recognition does not affect "down" recognition. If you have too many false alarms, you can raise the threshold even higher than 1.0.
Overall, your keyphrases are too short to be detected reliably. You can use a longer activation keyphrase to avoid false alarms.
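To make the per-phrase thresholds concrete, here is a small sketch that writes out a keyword list with one phrase per line, each followed by its own threshold between slashes, which is the layout pocketsphinx keyword list files use. The phrases and threshold values are placeholders invented for illustration, not tuned recommendations and not the contents of the key-words.txt from the post above. `KeywordList` is a hypothetical helper, and every word in a phrase is assumed to be present in the dictionary.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that renders keyword list lines of the form
//   phrase /threshold/
final class KeywordList {
    // Thresholds are kept as strings so they are written exactly as given.
    static String render(Map<String, String> thresholds) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : thresholds.entrySet()) {
            sb.append(e.getKey())
              .append(" /").append(e.getValue()).append("/\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder values: each phrase gets its own threshold, so one
        // can be raised (fewer false alarms) without touching the others.
        Map<String, String> thresholds = new LinkedHashMap<>();
        thresholds.put("move up", "1e-20");
        thresholds.put("move down", "1e-15");
        thresholds.put("move left", "1e-20");
        thresholds.put("move right", "1e-20");
        System.out.print(render(thresholds));
    }
}
```

Longer phrases such as these follow the advice above; if one phrase still triggers too often, only its own line needs a higher value.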
OK, thanks Nickolay!
I'll try longer phrases. If I understand correctly, a higher number for a word means the "heard" sound must match the model describing that word more precisely. Right?
And if I wanted to train the recognition engine on the way I, or anyone else, pronounce these phrases, is that possible?
Yes, that's right.
No, that's not possible.