CMU Sphinx / Forums / Speech Recognition Theory: phoneme recognition

t0gg1e4u - 2009-08-09

Hi

I am new to the speech recognition problem and I am actually looking for something less complicated (so I assume): I need to recognize phonemes (http://en.wikipedia.org/wiki/Phonem) in one of my projects, and I wonder if Sphinx as a framework would help me get this information. I am not interested in recognizing any meaning in the utterances.

Any hints are appreciated.

cheers

martin

- Nickolay V. Shmyrev - 2009-08-09
  
  > I am new to the speech recognition problem and I am actually looking for something less complicated (so I assume)
  
  Your assumption isn't correct. The easiest task is grammar-based recognition. The next easiest is medium-vocabulary text. Phone recognition is not an easy task.
  
  > I wonder if Sphinx as a framework would help me get this information.
  
  For example, sphinx3_decode -mode allphone can return the list of phones. Other decoders can do this as well.
  
  - t0gg1e4u - 2009-08-09
    
    Thank you for your speedy reply. I tried to install sphinx3_decode on my OS X machine, but I have been unsuccessful so far. However, I found out that sphinx4 is Java-based, and that's a language I understand. Is there a similar function implemented in sphinx4 as well?
    
    - t0gg1e4u - 2009-08-09
      
      After successfully installing sphinx4, playing with all the demo apps, and browsing the wiki, I now wonder:
      I found those dictionary files in which the words are mapped to phonemes, and I now assume that the decoder first tries to map the sound to the phonemes and then makes an educated guess which combination of words could match the best. If my assumption is right, is there a possibility to get this stream of phonemes out of the recognizer/decoder/linguist?
      
      - Nickolay V. Shmyrev - 2009-08-11
        
        > that the decoder first tries to map the sound to the phonemes and then makes an educated guess which combination of words could match the best.
        
        This assumption is wrong. Moreover, a phoneme is a completely different entity that is not related to speech sound at all. The decoder works with phones.
        
        > is there a possibility to get this stream of phonemes out of the recognizer/decoder/linguist?
        
        Once words are recognized, you can get the phone sequence with Result.getBestPronunciationResult
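
As a rough illustration of the last step, the sketch below flattens a pronunciation string into a plain phone sequence. It assumes that Result.getBestPronunciationResult returns a string in which each word is followed by its phones in square brackets, for example "one[W,AH,N] two[T,UW]"; that format is an assumption here and should be checked against the sphinx4 version in use. The class PronunciationParser and its phones method are hypothetical helpers, not part of sphinx4, and the sample string stands in for a real recognition result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of sphinx4: flattens a pronunciation
// string into a list of phones.
class PronunciationParser {
    // Assumed format: each word followed by its comma-separated phones
    // in square brackets, e.g. "one[W,AH,N] two[T,UW]".
    private static final Pattern WORD_WITH_PHONES =
            Pattern.compile("(\\S+?)\\[([^\\]]*)\\]");

    static List<String> phones(String pronunciationResult) {
        List<String> phones = new ArrayList<>();
        Matcher m = WORD_WITH_PHONES.matcher(pronunciationResult);
        while (m.find()) {
            // group(2) holds the bracketed phone list of one word
            for (String phone : m.group(2).split(",")) {
                String p = phone.trim();
                if (!p.isEmpty()) {
                    phones.add(p);
                }
            }
        }
        return phones;
    }

    public static void main(String[] args) {
        // In a real program this string would come from
        // result.getBestPronunciationResult() on a sphinx4 Result.
        String sample = "one[W,AH,N] two[T,UW]";
        System.out.println(phones(sample)); // prints [W, AH, N, T, UW]
    }
}
```

Note that this only recovers the phones of the words the decoder has already chosen from its dictionary; it is not free phone recognition of the kind the allphone mode mentioned earlier in the thread provides.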
        