
Speech Recognition Overview

Vamsi
2015-07-03
2015-07-07
  • Vamsi

    Vamsi - 2015-07-03

    Hello,

    I am trying to piece together the logical steps by which speech recognition is implemented, based on what I have gathered from the sources below. I am hoping to get validation of my understanding so far.
    Sources:
    1. http://mi.eng.cam.ac.uk/reports/svr-ftp/auto-pdf/young_tr38.pdf
    2. Fundamentals of Speech Recognition, Lawrence Rabiner
    3. Spoken Language Processing, Xuedong Huang

    Let's take the case of isolated word recognition:

    1. Recognition starts with the creation of a phoneme sequence for each word, based on its phoneme representation in the dictionary.

    2. The phoneme representation is used to form a connected string of HMMs, based on the triphone-to-HMM mapping in the acoustic model (AM).

    3. The token passing algorithm is used to compute the alignment score of each observation frame against the states in the HMM.

    4. HMM states correspond to senones. Each senone is represented by a GMM.

    5. The state transition probabilities from the HMMs and the emission probabilities from each senone's GMM are used to compute the token score.

    6. The word with the highest token score at the end of the utterance is recognised as the spoken word.
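
    To make the steps above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two-word dictionary, the `PHONE_MEAN` table, and the helper names are hypothetical, each triphone gets a single emitting state (real acoustic models typically use several), each senone is a one-component, one-dimensional Gaussian rather than a full GMM, and the senone is picked by the centre phone only. It is not how any particular decoder is implemented.

```python
import math

# Hypothetical toy dictionary: word -> phoneme sequence (step 1).
DICTIONARY = {"yes": ["Y", "EH", "S"], "no": ["N", "OW"]}

# Hypothetical toy acoustic model: one senone per centre phone, each a
# single 1-D Gaussian with unit variance (stands in for a GMM, step 4).
PHONE_MEAN = {"Y": 1.0, "EH": 2.0, "S": 3.0, "N": -1.0, "OW": -2.0}

LOG_SELF = math.log(0.5)  # log probability of staying in a state
LOG_NEXT = math.log(0.5)  # log probability of moving to the next state


def to_triphones(phones):
    """Expand a phone sequence into left-centre+right triphones (step 2)."""
    padded = ["SIL"] + phones + ["SIL"]
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
            for i in range(1, len(padded) - 1)]


def log_gauss(x, mean, var=1.0):
    """Log emission probability of frame x under a 1-D Gaussian senone."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def word_score(word, frames):
    """Token passing over the word's left-to-right HMM (steps 3 and 5)."""
    # Map each triphone to its senone mean via the centre phone.
    means = [PHONE_MEAN[t.split("-")[1].split("+")[0]]
             for t in to_triphones(DICTIONARY[word])]
    neg_inf = -math.inf
    # One token per state; the path must start in the first state.
    tokens = [neg_inf] * len(means)
    tokens[0] = log_gauss(frames[0], means[0])
    for x in frames[1:]:
        new_tokens = []
        for j, mean in enumerate(means):
            stay = tokens[j] + LOG_SELF
            move = tokens[j - 1] + LOG_NEXT if j > 0 else neg_inf
            # Keep only the best incoming token, then add the emission score.
            new_tokens.append(max(stay, move) + log_gauss(x, mean))
        tokens = new_tokens
    # The path must end in the last state of the word model.
    return tokens[-1]


def recognise(frames):
    """Pick the word whose final token has the highest score (step 6)."""
    return max(DICTIONARY, key=lambda w: word_score(w, frames))
```

    With these toy parameters, `recognise([1.1, 0.9, 2.0, 2.1, 3.0, 2.9])` picks "yes", because those frames sit close to the means assumed for Y, EH and S. Scores are kept in the log domain so that transition and emission probabilities add instead of multiply, which avoids underflow on long utterances.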

    Can you please validate my understanding?

    Regards,
    Vamsi

     
    • Nickolay V. Shmyrev

      Your understanding is correct.

      I would reorder items 3, 4 and 5 in your list, though: first describe the structure of the model, then the token passing algorithm. The right sequence would be 1, 2, 4, 5, 3, 6.

       
  • Vamsi

    Vamsi - 2015-07-07

    Nickolay, thanks for clarifying!

     
