
Concept behind the acoustic and language model - Sphinx

Nayak BS
2013-12-13
2013-12-16
  • Nayak BS

    Nayak BS - 2013-12-13

    Hello,

    Thank you for developing the CMU Sphinx toolkit and this forum. It is fascinating to see how it works. I am currently trying to build a new acoustic model that should recognize three languages with a very limited vocabulary. Since I do not have in-depth knowledge of HMMs, I would like to ask a few things:

    1) The language model file should contain the probability of occurrence of each word or word sequence in the transcription file, estimated with the N-gram method, so I would expect to see one value per word or sequence. What are the two values in this example from a sphinx4 LM file?

    -4.6578 ABLE -1.136975
    -3.9874 ABNORMAL -0.53008

    2) When training a new model, do we also need to provide audio for every set of phonemes used in the dictionary?

    3) Is there anything to take care of in the audio when developing a model for a completely new language?

    4) Does the statistical representation of speech obtained from acoustic model training describe the characteristics of words or of phonemes?

    5) When creating a language model, there is a step that creates id n-grams for each word. What exactly does it do?

    I thank you for your efforts to share your knowledge with me.

    Best wishes

     
  • Nickolay V. Shmyrev

    so I would expect to see one value per word or sequence. What are the two values in this example from a sphinx4 LM file?

    Besides the probabilities of sequences, the language model contains smoothing factors (backoff weights) that help compute the probability of an unseen sequence by composing it from known shorter sequences. On each line, the first field is the probability in log scale (base 10), the second is the word, and the third is the backoff weight. You can learn more about the ARPA model format from the documentation:

    http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html
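
    As a rough illustration of how the backoff weight is used, here is a minimal Python sketch. The two unigram lines are the ones quoted above; the bigram entry and the helper names (`parse_unigram_line`, `bigram_logprob`) are made up for the example and are not part of any Sphinx API.

```python
# Toy ARPA-style model. All values are log10.
# Only the ABLE / ABNORMAL unigram lines come from the thread;
# the bigram entry is a hypothetical value for illustration.

def parse_unigram_line(line):
    """Split an ARPA unigram line into (word, logprob, backoff)."""
    fields = line.split()
    logprob = float(fields[0])
    word = fields[1]
    # The backoff weight is optional; treat a missing one as log10(1) = 0.
    backoff = float(fields[2]) if len(fields) > 2 else 0.0
    return word, logprob, backoff

unigrams = {}
for raw in ["-4.6578 ABLE -1.136975", "-3.9874 ABNORMAL -0.53008"]:
    w, lp, bo = parse_unigram_line(raw)
    unigrams[w] = (lp, bo)

bigrams = {("ABLE", "ABNORMAL"): -0.5}  # hypothetical entry

def bigram_logprob(prev, word):
    """log10 P(word | prev), backing off to the unigram if the bigram is unseen."""
    if (prev, word) in bigrams:
        return bigrams[(prev, word)]
    prev_backoff = unigrams[prev][1]
    word_logprob = unigrams[word][0]
    # Backoff: multiply probabilities, i.e. add their logs.
    return prev_backoff + word_logprob

print(bigram_logprob("ABLE", "ABNORMAL"))   # seen bigram, stored value is used
print(bigram_logprob("ABNORMAL", "ABLE"))   # unseen bigram, backoff(ABNORMAL) + logP(ABLE)
```

    So for a bigram that is not listed in the model, the decoder adds the backoff weight of the history word to the log probability of the shorter sequence.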

    When training a new model, do we also need to provide audio for every set of phonemes used in the dictionary?

    No

    Is there anything to take care of in the audio when developing a model for a completely new language?

    Audio must closely represent the audio you are going to recognize, that's all.

    Does the statistical representation of speech obtained from acoustic model training describe the characteristics of words or of phonemes?

    The representation contains information about context-dependent sounds (phonemes in a certain context).
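
    To make "phonemes in a certain context" concrete, here is a small Python sketch that expands a phone sequence into (left, phone, right) units, often called triphones. It is a simplification under stated assumptions: the function name `to_triphones`, the `SIL` padding at the edges, and the example pronunciation are illustrative, and real training additionally ties similar contexts together.

```python
def to_triphones(phones, boundary="SIL"):
    """Return (left context, phone, right context) for each phone.

    The boundary symbol used at the edges is an assumption for this sketch.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]

# Hypothetical dictionary pronunciation, for illustration only.
for left, phone, right in to_triphones(["HH", "AH", "L", "OW"]):
    print(f"{phone} (after {left}, before {right})")
```

    The same phone, for example `AH`, gets a separate model for each context it appears in, which is why the model describes sounds in context and not whole words.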

    When creating a language model, there is a step that creates id n-grams for each word. What exactly does it do?

    It counts word n-grams in a compact form (idngrams), which are later used to train the language model.
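
    The idea can be sketched in a few lines of Python: each vocabulary word is replaced by an integer id, and the n-grams of ids are counted. This only illustrates the concept; the function name `text_to_idngrams`, the id numbering, and the sample sentences are assumptions, and the actual tool writes its own file format.

```python
from collections import Counter

def text_to_idngrams(sentences, n=2):
    """Map words to integer ids and count n-grams of those ids."""
    words = sorted({w for s in sentences for w in s.split()})
    # Ids start at 1 here; this numbering is a choice made for the sketch.
    vocab = {w: i for i, w in enumerate(words, start=1)}
    counts = Counter()
    for sentence in sentences:
        ids = [vocab[w] for w in sentence.split()]
        for i in range(len(ids) - n + 1):
            counts[tuple(ids[i:i + n])] += 1
    return vocab, counts

vocab, counts = text_to_idngrams(["open the door", "close the door"])
print(vocab)
for idngram, count in sorted(counts.items()):
    print(idngram, count)
```

    Storing ids in place of strings keeps the counts small and fast to sort, and the probabilities and backoff weights of the language model are then estimated from these counts.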

     
  • Nayak BS

    Nayak BS - 2013-12-16

    Thank you very much Nickolay

     
