
HMM - forward procedure problem

Anonymous
2010-12-07
2012-09-22
  • Anonymous

    Anonymous - 2010-12-07

    Hi,

    I am working on a basic speech recognition system that uses discrete Hidden
    Markov models to recognize isolated words. I am using MFCC coefficients as
    features of the sound samples, and 50 centroids to which I assign these MFCC
    coefficient vectors.

    I have written the Baum-Welch algorithm to train the model, and I have tested
    it on a basic HMM to check its functionality.

    I have this problem: in some cases, when I am training a new model using 10
    sound samples (the same word spoken 10 times), the emission probability of
    some symbol is ZERO in every state (that symbol wasn't emitted during
    training). That wouldn't be a problem by itself, I think. But afterwards,
    when I try to recognize this word, one of these symbols is generated. That
    means something that hadn't been generated during the training process is
    then generated during recognition (the word is somehow spoken another way).
    Well, the speaker is not a machine, so I think this can happen. But when this
    happens, the total probability of observing this sequence of symbols on this
    model is equal to zero (computed with the forward procedure).
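
    For reference, here is a simplified sketch (in Python with NumPy) of the kind
    of forward procedure I am using; it is illustrative only, not my exact code.
    It shows why a single symbol with zero emission probability in every state
    forces the total probability to zero:

    ```python
    import numpy as np

    def forward(pi, A, B, obs):
        # pi:  initial state probabilities, shape (N,)
        # A:   state transition matrix, shape (N, N)
        # B:   emission matrix, shape (N, M), B[i, k] = P(symbol k | state i)
        # obs: list of observed symbol indices
        alpha = pi * B[:, obs[0]]
        for o in obs[1:]:
            # if column B[:, o] is all zeros (symbol o was never emitted in
            # training), alpha becomes all zeros and stays zero from here on
            alpha = (alpha @ A) * B[:, o]
        return alpha.sum()  # P(obs | model)
    ```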

    My question is: how can I handle this situation? I haven't found any
    discussion about this topic. Or do I have to correct the parameters of my
    model somehow? (Right now I am using 24 states per word and 50 centroids.)

    I hope I have explained my problem clearly.

    Thanks in advance,
    Peter

     
  • Nickolay V. Shmyrev

    Hello Peter

    That means something that hadn't been generated during the training process
    is then generated during recognition

    Not sure how your "generated" terminology applies here. Did you have a chance
    to read Rabiner's tutorial?

    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.131.2084

    Probably you'll find it useful.

    That means something that hadn't been generated during the training process
    is then generated during recognition (the word is somehow spoken another
    way). Well, the speaker is not a machine, so I think this can happen. But
    when this happens, the total probability of observing this sequence of
    symbols on this model is equal to zero (computed with the forward procedure).

    Not sure why you see a problem here. If your training set is not sufficient,
    it is no wonder the model fails on symbols it never saw. You probably need
    more training data.

     
  • Anonymous

    Anonymous - 2010-12-07

    Hi, and thanks for the reply.

    Not sure how your "generated" terminology applies here

    Maybe I didn't phrase it well. I mean: a symbol which has NOT been recorded
    during training is then recorded during recognition. For example, during the
    training process I record these symbol sequences (after vector quantization,
    of course):

    sequence 1: 0 1 2 2 4 4 6 7 7 7
    sequence 2: 0 2 2 1 4 4 6 4 4 4 
    sequence 3: 2 2 4 4 4 6 0 0 1 1
    

    So after training I have some emission probabilities for symbols 0, 1, 2, 4,
    6, 7 that are greater than 0. But symbols 3, 5, 8, 9 have emission
    probability equal to 0 in all states in this case.
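
    Just to make the numbers concrete, here is the pooled symbol count over those
    toy sequences (only an illustration; the real Baum-Welch estimates are
    per-state, but the unseen symbols still end up with probability 0 everywhere):

    ```python
    from collections import Counter

    training = [
        [0, 1, 2, 2, 4, 4, 6, 7, 7, 7],
        [0, 2, 2, 1, 4, 4, 6, 4, 4, 4],
        [2, 2, 4, 4, 4, 6, 0, 0, 1, 1],
    ]
    counts = Counter(sym for seq in training for sym in seq)
    total = sum(counts.values())
    for sym in range(10):
        print(sym, counts[sym] / total)
    # symbols 3, 5, 8 and 9 never occur, so any estimate based purely on
    # these counts assigns them probability 0
    ```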

    Then during recognition I record this sequence of symbols:

    0 1 2 2 3 4 4 7 0
    

    And because of symbol 3 in this sequence, the total probability of the
    sequence on this model is equal to zero, even though this symbol was recorded
    only once (which can happen because of some noise during recording). So I am
    wondering whether there is some way to get a non-zero total probability even
    when such a symbol appears in the sequence. Or is it only a matter of a
    bigger training set, so that it covers all (or most of) the possible ways the
    word can be spoken?

    I have read the Rabiner tutorial, more than once :) but I haven't found any
    note about this problem.

    Peter

     
  • Nickolay V. Shmyrev

    And because of symbol 3 in this sequence, the total probability of the
    sequence on this model is equal to zero, even though this symbol was recorded
    only once (which can happen because of some noise during recording). So I am
    wondering whether there is some way to get a non-zero total probability even
    when such a symbol appears in the sequence. Or is it only a matter of a
    bigger training set, so that it covers all (or most of) the possible ways the
    word can be spoken?

    This issue is covered in Rabiner, page 274, section D, "Effects of
    Insufficient Training Data". Solutions: add more data, or use deleted
    interpolation.
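
    For illustration, one very simple form of smoothing looks roughly like this
    (only a sketch: it mixes the trained emission matrix with a uniform
    distribution using a fixed weight, whereas deleted interpolation as described
    by Rabiner estimates that weight from held-out data):

    ```python
    import numpy as np

    def smooth_emissions(B, eps=0.01):
        # B: trained emission matrix, shape (n_states, n_symbols), rows sum to 1.
        # Mixing in a uniform distribution guarantees every symbol has non-zero
        # probability in every state, so the forward probability can no longer
        # collapse to exactly zero because of a single unseen symbol.
        n_states, n_symbols = B.shape
        uniform = np.full((n_states, n_symbols), 1.0 / n_symbols)
        return (1.0 - eps) * B + eps * uniform  # rows still sum to 1
    ```

    Applied after training, an occasional unseen symbol then only lowers the
    score instead of forcing it to zero.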

     
  • Anonymous

    Anonymous - 2010-12-08

    Well, that could be very useful for me. Thank you very much. I must have had
    some shorter version of this HMM tutorial without that section.

    Especially the fourth solution for insufficient training data seems very
    interesting. I will try to implement the constraints solution or the deleted
    interpolation technique.

    **Thank you again for the help.**
    Have a nice day,
    Peter

     
