Concept behind the acoustic and language model - Sphinx

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others


Forum: Speech Recognition Theory

Creator: Nayak BS

Created: 2013-12-13

Updated: 2013-12-16

Nayak BS - 2013-12-13

Hello

Thank you for developing the CMU Sphinx toolkit and this forum. It is fascinating to see how it works for a new acoustic model, which I am trying to implement now and which should be able to recognize 3 languages with a very limited vocabulary. Since I don't have in-depth knowledge of HMM models yet, I would like to ask you a few things:

1) The output of the language model file should be the probabilities of occurrence of each word/sequence in the entire transcription file, estimated with the N-gram method.

So I would expect to see one value per word or sequence. What are these two values in this example of an LM file from Sphinx4, then?

-4.6578 ABLE -1.136975
-3.9874 ABNORMAL -0.53008

2) When training a new model, do we also need to provide audio for every phoneme used in the dictionary?

3) Is there anything to take care of in the audio if we develop a model for a completely new language?

4) Does the statistical representation of speech obtained after creating the acoustic model describe the characteristics of words or of phonemes?

5) When creating the language model there is a step that creates an idngram for each word. What exactly does it do?

Thank you for your efforts to share your knowledge with me.

Best wishes


Nickolay V. Shmyrev - 2013-12-13

So I should see one value per word or sequence. What are these two values in this example of an LM file from Sphinx4, then?

Besides the probabilities of word sequences, the language model contains smoothing factors (backoff weights) which help to calculate the probability of an unknown sequence by composing it from known sequences. In each line, the first number is the probability on a log10 scale, the second field is the word, and the third number is the backoff weight, also on a log10 scale. You can learn more about the ARPA model format from the documentation:

http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html
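To make the layout concrete, here is a minimal Python sketch, assuming a plain bigram model, that splits one unigram line into its three fields, turns the log10 values back into ordinary numbers, and applies the backoff rule for a word pair the model has never seen. The unigram numbers are the two lines quoted in the question; the bigram entry and the function names are hypothetical, and a real decoder applies the same rule recursively for higher-order n-grams.

```python
def parse_unigram_line(line):
    """Split one unigram line of an ARPA file into (log10 prob, word, log10 backoff)."""
    fields = line.split()
    log_prob = float(fields[0])
    word = fields[1]
    # The backoff weight column is optional; a missing one means log10(1) = 0.0.
    log_backoff = float(fields[2]) if len(fields) > 2 else 0.0
    return log_prob, word, log_backoff


# Hypothetical toy model (log10 values). The unigram numbers are the two lines
# quoted in the question; the bigram value is invented for illustration.
unigrams = {
    "ABLE": (-4.6578, -1.136975),      # (log10 P(w), log10 backoff(w))
    "ABNORMAL": (-3.9874, -0.53008),
}
bigrams = {("ABLE", "ABNORMAL"): -1.2}  # log10 P(ABNORMAL | ABLE)


def log10_bigram_prob(prev, word):
    """Backoff estimate of log10 P(word | prev) in a bigram model."""
    if (prev, word) in bigrams:
        return bigrams[(prev, word)]
    # Unseen pair: add the history's backoff weight to the unigram log-probability,
    # i.e. multiply bow(prev) * P(word) in the linear domain.
    return unigrams[prev][1] + unigrams[word][0]


lp, w, bo = parse_unigram_line("-4.6578 ABLE -1.136975")
print(w, 10 ** lp, 10 ** bo)                   # probability and backoff as plain numbers
print(log10_bigram_prob("ABLE", "ABNORMAL"))   # seen bigram: -1.2
print(log10_bigram_prob("ABNORMAL", "ABLE"))   # unseen: backoff(ABNORMAL) + log10 P(ABLE)
```

Working in the log domain is why the backoff weight is simply added rather than multiplied.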

When training a new model, do we also need to provide audio for every phoneme used in the dictionary?

No

Is there anything to take care of in the audio if we develop a model for a completely new language?

The training audio must closely match the audio you are going to recognize; that's all.

Does the statistical representation of speech obtained after creating the acoustic model describe the characteristics of words or of phonemes?

The representation contains information about context-dependent sounds (phonemes in a certain context, i.e. triphones).
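To illustrate "phonemes in a certain context", this small Python sketch expands a word's phone sequence into triphones, pairing each phone with its left and right neighbour. The `SIL` boundary label, the HTK-style `left-base+right` notation and the pronunciation are assumptions chosen for readability; Sphinx itself lists triphones in its model definition (mdef) file with separate columns for the base phone and its left and right context.

```python
def to_triphones(phones, boundary="SIL"):
    """Pair every phone with its left and right neighbour: (left, base, right).

    Phones at the start and end of the sequence get the `boundary` label
    (an assumption for this sketch) as their missing neighbour.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]


# Illustrative pronunciation of the word HELLO in a CMUdict-style phone set.
word_phones = ["HH", "AH", "L", "OW"]
for left, base, right in to_triphones(word_phones):
    # HTK-style label left-base+right, used here only for readability.
    print(f"{left}-{base}+{right}")
```

The same phone, for example AH, produces a different triphone in every context it appears in, so the acoustic model can describe it differently depending on its neighbours rather than storing statistics per word.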

When creating the language model there is a step that creates an idngram for each word. What exactly does it do?

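It counts word n-grams and stores the counts in a compact form (idngrams), in which words are replaced by numeric IDs, so that a language model can be trained from them later.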
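As a conceptual illustration only, this Python sketch mimics the idea behind that step: it gives each vocabulary word an integer ID and counts the trigrams of IDs. The function names and the toy corpus are hypothetical, and the real CMUCLMTK `text2idngram` tool writes its own file format, which this sketch does not reproduce.

```python
from collections import Counter


def build_vocab(sentences):
    """Give every distinct word an integer ID (sorted order, so IDs are reproducible)."""
    words = sorted({w for s in sentences for w in s.split()})
    return {w: i for i, w in enumerate(words)}


def count_id_ngrams(sentences, vocab, n=3):
    """Count n-grams after replacing each word by its integer ID."""
    counts = Counter()
    for s in sentences:
        ids = [vocab[w] for w in s.split()]
        for i in range(len(ids) - n + 1):
            counts[tuple(ids[i:i + n])] += 1
    return counts


# Tiny hypothetical training corpus with sentence markers.
corpus = [
    "<s> turn on the light </s>",
    "<s> turn off the light </s>",
]
vocab = build_vocab(corpus)
counts = count_id_ngrams(corpus, vocab, n=3)
for ids, count in sorted(counts.items()):
    print(ids, count)
```

Counts like these are what the next step (`idngram2lm` in CMUCLMTK) turns into the log10 probabilities and backoff weights of an ARPA model.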


Nayak BS - 2013-12-16

Thank you very much, Nickolay.
