Hi,
I am working on a basic speech recognition system that uses discrete Hidden Markov models to recognize isolated words. I use MFCC coefficients as features of the sound samples, and I use 50 centroids to which I assign these MFCC coefficient vectors.
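Just to make clear what I mean by assigning the vectors to centroids, the vector quantization step looks roughly like this (only a sketch, not my real code; the function and variable names are made up, and the centroids themselves come from clustering my training data):

```python
import numpy as np

def quantize(mfcc_frames, centroids):
    """Map each MFCC frame to the index of its nearest centroid.

    mfcc_frames: (T, D) array with one MFCC vector per frame
    centroids:   (K, D) array of codebook centroids (K = 50 here)
    Returns T discrete symbols in the range 0..K-1.
    """
    # squared Euclidean distance from every frame to every centroid
    dists = ((mfcc_frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)
```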
I have written the Baum-Welch algorithm to train the models and have tested it on some basic HMMs to check its functionality.
My problem is this: in some cases, when I train a new model using 10 sound samples (the same word spoken 10 times), the emission probability of some symbol is ZERO in every state (that symbol was never emitted during training). That wouldn't be a problem by itself, I think. But afterwards, when I try to recognize this word, one of these symbols appears. That means something that was never emitted during the training process shows up during recognition (the word is somehow spoken another way). Well, the speaker is not a machine, so I think this can happen. But when it does, the total probability of observing this sequence of symbols on this model is equal to zero (computed with the forward procedure).
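To show what I mean, here is roughly how the total probability comes out as zero in the forward procedure (a simplified sketch, not my actual code; A, B and pi stand for my transition matrix, emission matrix and initial distribution):

```python
import numpy as np

def forward_probability(A, B, pi, observations):
    """Total probability P(O | model) via the standard forward procedure.

    A:  (N, N) state transition matrix
    B:  (N, M) emission matrix, B[j, k] = P(symbol k | state j)
    pi: (N,)   initial state distribution
    observations: sequence of discrete symbols in 0..M-1
    """
    alpha = pi * B[:, observations[0]]      # initialization
    for symbol in observations[1:]:
        alpha = (alpha @ A) * B[:, symbol]  # induction step
    return alpha.sum()                      # termination

# If some symbol k has B[j, k] == 0 for every state j, the factor
# B[:, k] wipes out alpha completely and the result is exactly 0.
```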
My question is: how can I handle this situation? I haven't found any discussion of this topic. Or do I have to adjust the parameters of my model somehow? (Right now I am using 24 states for every word and 50 centroids.)
I hope I have explained my problem clearly.
Thanks in advance,
Peter
Hello Peter,
Not sure how your "generated" terminology applies here. Did you have a chance to read Rabiner's tutorial?
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.131.2084
You'll probably find it useful.
Not sure why you see a problem here. If your training set is not sufficient, it is no wonder the model cannot handle unseen symbols. You probably need more training data.
Anonymous - 2010-12-07
Hi, and thanks for the reply.

> Not sure how your "generated" terminology applies here

Maybe I didn't put it well. I mean: a symbol that was NOT recorded during training is then recorded during recognition. For example, during the training process I record symbol sequences (after vector quantization, of course) in which only the symbols 0, 1, 2, 4, 6 and 7 occur. So after training, the emission probabilities of symbols 0, 1, 2, 4, 6 and 7 are greater than 0, but symbols 3, 5, 8 and 9 have an emission probability of 0 in every state.
Then, during recognition, I record this sequence of symbols:
012234470
Because of symbol 3 in this sequence, the total probability of the sequence on this model is zero, even though that symbol was recorded only once (possibly because of some noise during recording). So I am wondering: is there some way to get a non-zero total probability even when such a symbol appears in the sequence? Or is it only a matter of a bigger training set, so it can cover all (or most) of the ways the word can be spoken?
I have read the Rabiner tutorial, more than once :), but I haven't found any note about this problem.
Peter
This issue is covered in Rabiner, page 274, section D, "Effects of Insufficient Training Data." Solutions: add more data, or use deleted interpolation.
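Roughly, deleted interpolation here means smoothing the emission probabilities estimated from your training data with a less specific distribution (for example, a uniform one over all 50 symbols). A minimal sketch with a fixed interpolation weight; in the actual technique the weight is estimated on held-out ("deleted") data:

```python
import numpy as np

def interpolate_emissions(B_trained, lam=0.9):
    """Smooth a trained emission matrix with a uniform distribution.

    B_trained: (N, M) emission matrix estimated by Baum-Welch
    lam:       interpolation weight; ideally estimated on held-out
               ("deleted") data instead of being fixed like here
    Returns a matrix with no zero entries, rows still summing to 1.
    """
    N, M = B_trained.shape
    uniform = np.full((N, M), 1.0 / M)
    return lam * B_trained + (1.0 - lam) * uniform
```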
Anonymous - 2010-12-08
Well, that could be very useful for me. Thank you very much. I must have had some shorter version of this HMM tutorial without that section.
The fourth solution for insufficient training data in particular seems very interesting. I will try to implement this constraints solution or the deleted interpolation technique.
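For the constraints solution, I imagine something like this: after each Baum-Welch re-estimation, keep every emission probability above a small floor and renormalize the rows (just my sketch of it; the value of eps is a guess):

```python
import numpy as np

def floor_emissions(B, eps=1e-4):
    """Floor the emission probabilities at roughly eps and renormalize.

    B:   (N, M) emission matrix after a Baum-Welch re-estimation step
    eps: minimum probability any symbol should keep in any state
    """
    B = np.maximum(B, eps)
    return B / B.sum(axis=1, keepdims=True)  # rows sum to 1 again
```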
**Thank you again for the help.**
Have a nice day,
Peter