Hello,
I just installed Sphinx from the module distribution tarball. I wonder what people are experiencing with regard to the speech recognition quality. I just want to make sure that I am not the only one whose "Hello" gets recognized as "Say Then."
Thanks.
Ha! Reminds me of a speech processing group I used to work with. TTS and SR bugs can be pretty humorous. I have always thought that TTS and SR should test each other. If you think about it, it's perfect: the TTS will always generate the same sounds, so the SR's improvement can be measured quantitatively, even during learning, and the TTS can be tested for accuracy given feedback from the SR. They could both learn from each other! Anyway, for an easy test of Sphinx, how about recording the numbers zero through nine, and then testing the SR on those?
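The digit test suggested above is easy to score with a short script. This is only a sketch: the way you collect hypotheses from Sphinx is up to you (nothing here is part of any Sphinx API), and the example hypotheses are made up to resemble the misrecognitions reported elsewhere in this thread.

```python
# Score a digit recognition test: compare the words that were spoken
# against what the recognizer reported, and print per-word accuracy.
EXPECTED = ["zero", "one", "two", "three", "four",
            "five", "six", "seven", "eight", "nine"]

def score(hypotheses):
    """hypotheses: list of recognizer outputs, one per recorded digit."""
    correct = sum(1 for ref, hyp in zip(EXPECTED, hypotheses)
                  if hyp.strip().lower() == ref)
    return correct / len(EXPECTED)

# Hypothetical outputs in the spirit of this thread's reports:
hyps = ["zero", "wander", "two", "three", "four",
        "five", "six", "seven", "lost", "nine"]
print(f"accuracy: {score(hyps):.0%}")  # 8 of 10 digits matched
```

Running the same ten recordings through each Sphinx release would give a crude but repeatable way to compare versions.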
I'm not a native English speaker, but even so it recognizes most numbers and proper nouns correctly, which is very good. It does, however, confuse 'Eighty' and 'Eighteen' and the like, and renders 'go' as 'say'.
Thanks.
I *am* a native speaker of American English, with a fairly normal accent compared to the rest of the students here. (We draw from all over the country, so that's probably a decent sample for comparison.) However, Sphinx doesn't recognize much for me. The following output from sphinx2-demo corresponds to me saying "one" once per line:
[silence] [audio] LOST
[silence] [audio] COLOR METER
[silence] [audio] WANDER
[silence] [audio] FOUR
[silence] [audio] ONE
[silence] [audio] LOST
[silence] [audio] LOST
[silence] [audio] ONE THE
That's better than usual for me, in that it got the right number of syllables more often than the wrong number, and even got the word I said a couple of times.
I'm nowhere near qualified to hack any of the recognition algorithms, but I can code a bit and I'm happy working with development software, so if there is anything I can do to help provide useful testing data, I'm willing. dingman at cs dot earlham dot edu
-Andrew Dingman
After racking my brain for months, I just found that I can only get 2.03 to work with things like the turtle model. I don't know if it is because I use 8 kHz telephone data or what. I **just** found out that the mfc files WILL work with 2.01, 2.01a, and 2.02.
I have not yet been able to feed it real 8 kHz data and have it recognized. Basically, a couple of things get recognized; the rest is junk. Sounds like what others are seeing.
I wish I could say that I have a more recent version working, but I don't. Now I guess I have to go in and find out why Sphinx is not generating the proper mfc internally.... Maybe I need a gas mask? ;-)
This is an old thread, but this example:
[silence] [audio] LOST
[silence] [audio] COLOR METER
[silence] [audio] WANDER
[silence] [audio] FOUR
[silence] [audio] ONE
[silence] [audio] LOST
[silence] [audio] LOST
[silence] [audio] ONE THE
would seem to imply that the user is running against the included turtle grammar. Speech recognition is only as good as the grammar, and the turtle grammar is, imho, too simple to provide a good testing environment. That's a lesson learned from my Nuance experience: when I test the app against a more tightly defined grammar, recognition quality improves radically.
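For what it's worth, a tightly constrained grammar of the kind described above can be written in JSGF. Whether your particular Sphinx build reads JSGF directly is an assumption worth checking against its documentation, but the idea looks like this for a digits-only test:

```
#JSGF V1.0;

grammar digits;

// Only the ten digit words are legal utterances, so the decoder
// cannot wander off into words like COLOR METER or LOST.
public <digit> = zero | one | two | three | four
               | five | six | seven | eight | nine ;
```

With the search space cut down this far, the recognizer only has to pick the closest of ten words rather than anything in the turtle vocabulary.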
Anonymous - 2001-05-10
When I run the demo script, it just says [silence][audio][silence][audio][silence][audio].
Anyone have any ideas?
Dan
I've been encountering similar problems moving from 0.2 to 0.3 on Solaris. The recognition appears to be much poorer using identical language models and phonetic models. I'm not sure where to begin looking for the source of the problem.
It's made more difficult by the way all the directories have been reorganized and filenames have been changed unnecessarily.
Non-feature in sphinx2-demo has been fixed. sphinx2-simple is more verbose.
kevin