
Training GMMs of each phoneme

2017-08-29
2017-08-30
  • JAMES ATTARD

    JAMES ATTARD - 2017-08-29

    Hi All,

    I am currently doing an MSc that involves designing a speech recognition (SR) system using HMMs in order to get familiar with the process. So far I have developed a system that uses one HMM per word. I am using k-means to quantise the MFCC features extracted from the audio signal into discrete observations.

    The next step I want to take is to use GMMs instead of k-means and to have a model for each phoneme instead of a model for each word. I have used the Sphinx toolkit before, and when I train the system it automatically trains each phoneme's GMM and HMM without my having to provide any information on the timing of each phoneme. In some cases I have been amazed at how accurate this can be.
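To illustrate the difference from k-means: instead of mapping a frame to the index of its nearest centroid, a GMM gives each HMM state a continuous likelihood for the frame. The following is only a minimal sketch assuming diagonal covariances; the function name `gmm_log_likelihood` and the toy parameters are made up for illustration and are not taken from Sphinx or HTK.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one feature frame x under a diagonal-covariance GMM.

    weights:   (K,)   mixture weights summing to 1
    means:     (K, D) component means
    variances: (K, D) per-dimension component variances
    """
    # Log of the Gaussian normalisation term for each component.
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    # Log of the exponential term for each component.
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    log_comp = np.log(weights) + log_norm + log_exp
    # Log-sum-exp over components for numerical stability.
    m = log_comp.max()
    return m + np.log(np.sum(np.exp(log_comp - m)))

# Hypothetical toy example: a 2-component GMM over 2-dimensional frames.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
variances = np.ones((2, 2))
near = gmm_log_likelihood(np.array([0.1, -0.1]), weights, means, variances)
far = gmm_log_likelihood(np.array([2.5, 2.5]), weights, means, variances)
```

In this toy setup `near` comes out larger than `far`, since the first frame lies close to one of the component means. In an HMM, this value would take the place of the discrete emission probability looked up from the k-means symbol.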

    I cannot find clear information on how this is done. I am currently using the Baum-Welch training method for the HMMs, but I don't know whether extra steps are needed to train phoneme HMMs.

    How does the Sphinx toolkit do this, and is there any reading material you would suggest in order to get more familiar with this process?

    Thanks.

     
  • JAMES ATTARD

    JAMES ATTARD - 2017-08-30

    Hi All,

    It seems that I might have found the information required to answer my own question. Looking at the HTK book I found that training of the subword GMM and HMM consists of the following:

    1. Make a list of all the phonemes in your recordings (including silence, pause, etc.). Each phoneme will have a 3-5 state HMM.
    2. Make phoneme transcriptions of all the words in the recordings.
    3. Using the transcription of the training corpus together with the lists from steps 1 and 2, concatenate the phoneme models to create one larger composite model for each word sequence in the transcription.
    4. Initialise all the models with identical parameters (a flat start).
    5. Train and tune the parameters of the models using the Baum-Welch method.
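Steps 1 to 4 above can be sketched roughly as follows. This is only an illustration under assumptions of my own: the lexicon, the `sil` padding, the 0.5 starting transition probabilities, and the function names are all hypothetical and are not taken from HTK or Sphinx. The Baum-Welch re-estimation of step 5 is not shown.

```python
import numpy as np

# Hypothetical pronunciation dictionary (step 2): word -> phoneme sequence.
LEXICON = {"one": ["w", "ah", "n"], "two": ["t", "uw"]}
STATES_PER_PHONE = 3  # assumed; the post mentions 3-5 states per phoneme

def expand_to_phones(words, lexicon):
    """Turn a word-level transcription into a phoneme sequence (step 3)."""
    phones = ["sil"]  # assumed leading silence
    for word in words:
        phones.extend(lexicon[word])
    phones.append("sil")  # assumed trailing silence
    return phones

def composite_transitions(n_phones, states_per_phone=STATES_PER_PHONE):
    """Left-to-right transition matrix for the concatenated phoneme HMMs."""
    n = n_phones * states_per_phone
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i] = 0.5      # stay in the current state
        A[i, i + 1] = 0.5  # move on to the next state
    A[n - 1, n - 1] = 1.0  # final state; a separate exit state is omitted here
    return A

def flat_start(features, n_states):
    """Give every state the global mean and variance of the data (step 4)."""
    mean = features.mean(axis=0)
    var = features.var(axis=0)
    return np.tile(mean, (n_states, 1)), np.tile(var, (n_states, 1))

phones = expand_to_phones(["one", "two"], LEXICON)
A = composite_transitions(len(phones))
# Random numbers standing in for real 13-dimensional MFCC frames.
features = np.random.default_rng(0).normal(size=(50, 13))
means, variances = flat_start(features, A.shape[0])
```

Because no phoneme timings are supplied, it is the Baum-Welch pass over this composite model that softly assigns frames to states. In a real system the statistics for every occurrence of the same phoneme would be pooled, so that each phoneme keeps a single shared set of parameters; the sketch above does not implement that sharing.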

    I have yet to test it myself, but that should mostly be it. I will see how it goes.

     
    • Nickolay V. Shmyrev

      The HTK book is a good source of information. You can also read the "Spoken Language Processing" textbook recommended in our tutorial.

       
