
phoneme recognition

t0gg1e4u
2009-08-09
2012-09-22
  • t0gg1e4u

    t0gg1e4u - 2009-08-09

    Hi

    I am new to speech recognition, and I am looking for something that I assume is less complicated: one of my projects needs to recognize phonemes (http://en.wikipedia.org/wiki/Phonem), and I wonder whether Sphinx as a framework would help me get this information. I am not interested in recognizing any meaning in the utterances.

    Any hints are appreciated.

    cheers

    martin

     
    • Nickolay V. Shmyrev

      > I am new to the speech recognition problem and I am actually looking for something less complicated (so I assume)

      Your assumption isn't correct. The easiest task is grammar-based recognition. The next easiest is medium-vocabulary text. Phone recognition is not an easy task.

      > I wonder if sphinx as a framework would help me getting this information.

      For example, sphinx3_decode -mode allphone can return the list of phones. Other decoders can do this as well.

       
      • t0gg1e4u

        t0gg1e4u - 2009-08-09

        Thank you for your speedy reply. I tried to install sphinx3_decode on my OS X machine, but I have been unsuccessful so far. However, I found out that sphinx4 is Java based, and that's a language I understand. Is there a similar function implemented in sphinx4 as well?

         
        • t0gg1e4u

          t0gg1e4u - 2009-08-09

          After successfully installing sphinx4, playing with all the demo apps, and browsing the wiki, I now wonder:
          I found the dictionary files in which words are mapped to phonemes, and I now assume that the decoder first tries to map the sound to phonemes and then makes an educated guess about which combination of words matches best. If my assumption is right, is there a possibility to get this stream of phonemes out of the recognizer/decoder/linguist?

           
          • Nickolay V. Shmyrev

            > that the decoder first tries to map the sound to phonemes and then makes an educated guess about which combination of words matches best.

            This assumption is wrong. Moreover, a phoneme is a completely different entity that is not related to speech sounds at all. The decoder works with phones.

            > is there a possibility to get this stream of phonemes out of the recognizer/decoder/linguist?

            Once words are recognized, you can get the phone sequence with Result.getBestPronunciationResult
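
            To illustrate the direction of that lookup (words first, phones second), here is a minimal standalone sketch. It does not use sphinx4 at all: the class PhoneLookup and its methods are hypothetical names made up for this example, and the two dictionary entries are written in the plain "WORD PHONE PHONE ..." layout of the dictionary files mentioned above. The real decoder knows which pronunciation variant actually matched the audio, whereas this toy version simply takes the one pronunciation listed per word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of sphinx4: expands already-recognized words
// into a phone sequence using a word-to-phones pronunciation dictionary.
public class PhoneLookup {

    // Parse dictionary lines of the form "WORD PHONE1 PHONE2 ...".
    public static Map<String, List<String>> parseDictionary(List<String> lines) {
        Map<String, List<String>> dict = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) {
                continue; // skip blank lines and entries without phones
            }
            List<String> phones = new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
            // Keep the first pronunciation seen for each word.
            dict.putIfAbsent(parts[0].toLowerCase(), phones);
        }
        return dict;
    }

    // Concatenate the phones of each word, in order.
    public static List<String> toPhones(Map<String, List<String>> dict, List<String> words) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            List<String> phones = dict.get(word.toLowerCase());
            if (phones == null) {
                throw new IllegalArgumentException("Word not in dictionary: " + word);
            }
            result.addAll(phones);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dict = parseDictionary(Arrays.asList(
                "hello HH AH L OW",
                "world W ER L D"));
        List<String> phones = toPhones(dict, Arrays.asList("hello", "world"));
        System.out.println(String.join(" ", phones)); // prints "HH AH L OW W ER L D"
    }
}
```

            A word that is missing from the dictionary raises an exception here; that is a choice made for this sketch, not a statement about how sphinx4 behaves.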

             
