Hello,
I just installed Sphinx from the module distribution tarball. I wonder what people are experiencing with regard to the speech recognition quality. I just want to make sure that I am not the only one whose "Hello" gets recognized as "Say Then."
Thanks.
Ha! Reminds me of a speech processing group I used to work with. TTS and SR bugs can be pretty humorous. I have always thought that TTS and SR should test each other. If you think about it, it's perfect: the TTS will always generate the same sounds, so the SR's improvement can be measured quantitatively, even during learning, and the TTS can be tested for accuracy given feedback from the SR. They could both learn from each other! Anyway, for an easy test of Sphinx, how about recording the numbers zero through nine, and then testing the SR on those?
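The digit test suggested above is easy to score with a short script. This is only a sketch: the way you collect hypotheses from Sphinx is up to you (nothing here is part of any Sphinx API), and the example hypotheses are made up to resemble the misrecognitions reported elsewhere in this thread.

```python
# Score a digit recognition test: compare the words that were spoken
# against what the recognizer reported, and print per-word accuracy.
EXPECTED = ["zero", "one", "two", "three", "four",
            "five", "six", "seven", "eight", "nine"]

def score(hypotheses):
    """hypotheses: list of recognizer outputs, one per recorded digit."""
    correct = sum(1 for ref, hyp in zip(EXPECTED, hypotheses)
                  if hyp.strip().lower() == ref)
    return correct / len(EXPECTED)

# Hypothetical outputs in the spirit of this thread's reports:
hyps = ["zero", "wander", "two", "three", "four",
        "five", "six", "seven", "lost", "nine"]
print(f"accuracy: {score(hyps):.0%}")  # 8 of 10 digits matched
```

Running the same ten recordings through each Sphinx release would give a crude but repeatable way to compare versions.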
I'm not a native English speaker, but even so it recognizes most numbers and proper nouns correctly, which is very good. It does, however, confuse 'Eighty' and 'Eighteen' and the like, and renders 'go' as 'say'.
Thanks.
I *am* a native speaker of American English, with a fairly normal accent compared to the rest of the students here. (We draw from all over the country, so that's probably a decent sample for comparison.) However, Sphinx doesn't recognize much for me. The following output from sphinx2-demo corresponds to me saying "one" once per line:
[silence] [audio] LOST
[silence] [audio] COLOR METER
[silence] [audio] WANDER
[silence] [audio] FOUR
[silence] [audio] ONE
[silence] [audio] LOST
[silence] [audio] LOST
[silence] [audio] ONE THE
That's better than usual for me, in that it got the right number of syllables more often than the wrong number, and even got the word I said a couple of times.
I'm nowhere near qualified to hack any of the recognition algorithms, but I can code a bit and I'm happy working with development software, so if there is anything I can do to help provide useful testing data, I'm willing. dingman at cs dot earlham dot edu
-Andrew Dingman
After racking my brain for months, I just found that I can only get 2.03 to work with things like the turtle model. I don't know if it is because I use 8 kHz telephone data or what. I **just** found out that the mfc files WILL work with 2.01, 2.01a, and 2.02.
I have not yet been able to feed it real 8 kHz data and have it recognized. Basically, a couple of things get recognized; the rest is junk. Sounds like what others are seeing.
I wish I could say that I have a more recent version working, but I don't. Now I guess I have to go in and find out why Sphinx is not generating the proper mfc internally.... Maybe I need a gas mask? ;-)
This is an old thread, but this example:
[silence] [audio] LOST
[silence] [audio] COLOR METER
[silence] [audio] WANDER
[silence] [audio] FOUR
[silence] [audio] ONE
[silence] [audio] LOST
[silence] [audio] LOST
[silence] [audio] ONE THE
would seem to imply that the user is running against the included turtle grammar. Speech recognition is only as good as the grammar, and the turtle grammar is, imho, too simple to provide a good testing environment. That's a lesson learned from my Nuance experience: when I test the app against a more tightly defined grammar, recognition quality improves radically.
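For what it's worth, a tightly constrained grammar of the kind described above can be written in JSGF. Whether your particular Sphinx build reads JSGF directly is an assumption worth checking against its documentation, but the idea looks like this for a digits-only test:

```
#JSGF V1.0;

grammar digits;

// Only the ten digit words are legal utterances, so the decoder
// cannot wander off into words like COLOR METER or LOST.
public <digit> = zero | one | two | three | four
               | five | six | seven | eight | nine ;
```

With the search space cut down this far, the recognizer only has to pick the closest of ten words rather than anything in the turtle vocabulary.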
Anonymous - 2001-05-10
When I run the demo script, it just says [silence][audio][silence][audio][silence][audio].
Anyone have any ideas?
Dan
I've been encountering similar problems moving from 0.2 to 0.3 on Solaris. The recognition appears to be much poorer using identical language models and phonetic models. I'm not sure where to begin looking for the source of the problem.
It's made more difficult by the way all the directories have been reorganized and filenames have been changed unnecessarily.
Non-feature in sphinx2-demo has been fixed. sphinx2-simple is more verbose.
kevin