We use Sphinx on a university research project to control video playback/capture. We got it up and running, but it seems we have a few problems with the configuration. We want Sphinx to recognize just 8-10 words (like stop, play, record, skip, forward, switch, ...), and everything that is not in the grammar file (JSGF) should be treated as "out-of-grammar utterances" (as in the FAQ).
Two problems emerge here. First, it always takes a while until the recognizer has a hit, and second, the recognition rate is poor. Our configuration is based on the "Hello World" demo app, extended with the "out-of-grammar utterances" setup.
We tried a lot but haven't found the sweet spot yet.
Can somebody offer any suggestions on this?
Armin, I have the exact same problem as you when I run demos like HelloWorld and HelloDigits. At first, when I talk, it seems like it hasn't woken up yet. After a while, it seems to wake up and starts at least guessing at words. But even then it has a pretty poor recognition rate, perhaps 40% or so. It struggles with some digits, like six, two, and oh, while four, five, and seven seem fine.
To test my Motorola Bluetooth headset, I tried the Microsoft Office speech recognition and it worked fine. I also tried the 1-877-268-7526 test of Sphinx (book an office) THROUGH Skype with my headset, and it also worked fine, despite going through the Internet. So I concluded my microphone setup is fine.
I'm not sure where to go from here to improve things. Is Sphinx4 perhaps not really ready yet?
- The commands in your list are very short. Short words are usually highly confusable in speech recognition. Try using multi-word commands instead.
- Try creating a garbage grammar node that absorbs everything other than the commands. This could be a phone loop.
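
As a rough illustration of the first suggestion, a grammar with multi-word commands might look like the sketch below. The grammar name `commands`, the rule name `<command>`, and the specific phrases are made up for this example, so substitute whatever fits your application. The garbage node / phone loop is not shown here, since that is set up outside the JSGF file.

```
#JSGF V1.0;

// Hypothetical grammar name; match it to the name your configuration loads.
grammar commands;

// Each command is a two-word phrase, which gives the recognizer more
// acoustic evidence per command than a single short word would.
public <command> = stop playback
                 | start playback
                 | start recording
                 | skip forward
                 | skip backward
                 | switch camera;
```

The idea is only that longer phrases tend to be easier to tell apart than isolated short words; whether these particular phrases work well for you would have to be checked by testing.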
-tgj