Training GMMs of each phoneme

Forum: Speech Recognition Theory

Creator: JAMES ATTARD

Created: 2017-08-29

Updated: 2017-08-30

JAMES ATTARD - 2017-08-29

Hi All,

I am currently doing an MSc that involves designing a speech recognition (SR) system using HMMs in order to get familiar with the process. So far I have developed a system that uses one HMM for each word. I am using k-means to obtain the discrete observations from the MFCC features extracted from the audio signal.

The next step I want to take is to use GMMs instead of k-means and to have a model for each phoneme instead of a model for each word. I have used the Sphinx toolkit before, and when I train the system it automatically trains each phoneme's GMM and HMM without my having to provide any information on the timing of each phoneme. In some cases I have been amazed at how accurate this can be.
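To make the k-means versus GMM difference concrete, here is a minimal sketch in plain Python. It is not taken from Sphinx or HTK; the function names, the diagonal-covariance assumption, and the toy numbers are all illustrative assumptions. With k-means, each MFCC frame is reduced to the index of its nearest centroid (a discrete symbol). With a GMM, the frame is scored directly with a continuous log-likelihood.

```python
import math

def nearest_centroid(frame, centroids):
    """k-means style vector quantisation: index of the closest centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda k: sq_dist(frame, centroids[k]))

def log_gaussian_diag(frame, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at `frame`."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )

def gmm_log_likelihood(frame, weights, means, variances):
    """log sum_k w_k * N(frame; mean_k, var_k), computed with log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(frame, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Toy 2-dimensional "MFCC" frame and a 2-component model (made-up numbers).
frame = [0.9, 2.1]
centroids = [[0.0, 0.0], [1.0, 2.0]]
weights = [0.5, 0.5]
variances = [[1.0, 1.0], [1.0, 1.0]]

symbol = nearest_centroid(frame, centroids)  # discrete observation for a discrete HMM
score = gmm_log_likelihood(frame, weights, centroids, variances)  # continuous emission score
```

In a continuous-density HMM each state would carry its own weights, means, and variances, and `gmm_log_likelihood` would take the place of the discrete emission table lookup.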

I cannot find clear information on how this is done. I am currently using the Baum-Welch training method for the HMMs, but I don't know whether extra steps are needed to train phoneme HMMs.

How does the Sphinx toolkit do this, and is there any reading material you would suggest for getting more familiar with this process?

Thanks.

JAMES ATTARD - 2017-08-30

Hi All,

It seems that I may have found the information needed to answer my own question. Looking at the HTK book, I found that training the subword GMMs and HMMs consists of the following steps:

1. Make a list of all the phonemes in your recordings (including silence, pause, etc.). Each phoneme will have a 3-5 state HMM.
2. Make phoneme transcriptions of all the words in the recordings.
3. Using the transcription of the training corpus and the lists generated in steps 1 and 2, concatenate the phoneme models to create one larger model for each word (and, word by word, for each utterance).
4. Initialise all the parameters equally (a "flat start").
5. Train and tune the parameters of the model using the Baum-Welch method.

I have yet to test it myself, but that should mostly be it. Will see how it goes.
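As a rough illustration of the concatenation and equal-initialisation steps above, here is a minimal sketch in plain Python. It is not HTK or Sphinx code: the `lexicon`, the function names, the 3 states per phoneme, and the 0.5/0.5 transition values are all assumptions made for the example. Each phoneme gets a left-to-right HMM, the models are chained according to the transcription, and every state starts from the same global mean and variance.

```python
def build_composite(phonemes, states_per_phone=3):
    """Chain left-to-right phoneme HMMs into one composite model.

    Returns (labels, trans). labels[i] is the (phoneme, state index) of
    composite state i. trans is a row-stochastic matrix with one extra
    final column that represents leaving the model.
    """
    labels = [(p, s) for p in phonemes for s in range(states_per_phone)]
    n = len(labels)
    trans = [[0.0] * (n + 1) for _ in range(n)]
    for i in range(n):
        trans[i][i] = 0.5      # stay in the same state
        trans[i][i + 1] = 0.5  # advance (or exit, from the last state)
    return labels, trans

def flat_start_gaussian(frames):
    """Global mean and variance over all frames, shared by every state at first."""
    dim = len(frames[0])
    count = len(frames)
    mean = [sum(f[d] for f in frames) / count for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / count for d in range(dim)]
    return mean, var

# Hypothetical pronunciation dictionary and word-level transcription.
lexicon = {"one": ["w", "ah", "n"], "two": ["t", "uw"]}
transcript = ["one", "two"]
phones = ["sil"] + [p for word in transcript for p in lexicon[word]] + ["sil"]

labels, trans = build_composite(phones)
mean, var = flat_start_gaussian([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
```

Running Baum-Welch over a composite model like this is what removes the need for phoneme timing: the forward-backward pass assigns each frame softly to the states along the chain, and the statistics gathered for each `(phoneme, state)` label are pooled across all utterances in which that phoneme occurs.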
- Nickolay V. Shmyrev - 2017-08-30
  
  The HTK book is a good source of information. You can also read the "Spoken Language Processing" textbook recommended in our tutorial.
  