Hi All,
Does anyone know if there is an alternative to CMS (or CMN) for short
utterances? I have been doing some testing, and it works great for
sections 2+ seconds in length, but for shorter 0.5-1 second utterances it
really distorts the signal.
Thanks
Ray
Actually, I'm starting to wonder now if I'm doing my CMN correctly. I have the
mean calculated individually for each coefficient over time; is that correct?
So, my (pseudo) code reads...
loop through each mfcc (1-12)
    mean = 0
    loop through each frame
        mean = mean + {mfcc value for frame}
    end loop
    mean = mean / {frame count}
    loop through each frame
        {mfcc value for frame} = {mfcc value for frame} - mean
    end loop
end loop
should it really be...
mean = 0
loop through each mfcc (1-12)
    loop through each frame
        mean = mean + {mfcc value for frame}
    end loop
end loop
mean = mean / ({frame count} * {mfcc count})
loop through each mfcc (1-12)
    loop through each frame
        {mfcc value for frame} = {mfcc value for frame} - mean
    end loop
end loop
Otherwise it doesn't seem to make sense that I'm losing the relative strength
of each MFCC value compared to the others within the same frame.
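In NumPy terms, the first variant can be sketched like this (just a sketch; the layout of frames as rows and coefficients as columns is an assumption for illustration):

```python
import numpy as np

def cmn(mfcc):
    """Cepstral mean normalization: subtract each coefficient's
    mean over time. Rows are frames, columns are coefficients."""
    mfcc = np.asarray(mfcc, dtype=float)
    return mfcc - mfcc.mean(axis=0)   # one mean per coefficient

# Two frames, three coefficients: every column ends up zero-mean.
frames = np.array([[28.0, -10.0, 2.0],
                   [30.0,  -8.0, 0.0]])
normalized = cmn(frames)
```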
Thanks
Ray
This is correct.
No.
You'll be interested to read the following papers:
Reducing the Effects of Linear Channel Distortion on Continuous Speech
Recognition (1996)
by Rebecca Anne Bates, Mari Ostendorf, J. Robin Rohlicek, and William C. Karl
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.9450
and
The Use Of Cepstral Means In Conversational Speech Recognition (1997)
by Martin Westphal
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.2342
Thank you very much for your reply! So here's my problem. If I am working on a
simple utterance of the word 'ah!', it's very heavy in one MFCC coefficient (a
value of around 28 in all frames), but quite low (between -10 and +2) in all
the other coefficients. If I use this technique on each MFCC coefficient
individually, I lose the information about the relative strength of the
coefficients to each other.
Is this a common problem, or am I going crazy? Is there another technique to
use on this kind of short, single-vowel phrase?
Thank you!
This is not a problem. The problem is to estimate the cepstral mean for a
speaker/channel using just a short sample. Please read the papers first.
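One way to make that estimate more robust for a short sample (a hypothetical sketch, not taken verbatim from the papers; the `tau` parameter and the weighting scheme are illustrative) is to interpolate between a global mean and the utterance's own mean according to how many frames were observed:

```python
import numpy as np

def smoothed_mean(utterance, global_mean, tau=100.0):
    """Blend the utterance's cepstral mean with a global mean.
    With few frames the global mean dominates; as the frame
    count n grows past tau, the utterance mean takes over."""
    utterance = np.asarray(utterance, dtype=float)
    n = utterance.shape[0]                 # number of frames
    w = n / (n + tau)                      # weight on the utterance mean
    return w * utterance.mean(axis=0) + (1.0 - w) * np.asarray(global_mean)

# A 2-frame utterance barely moves the estimate off the global mean.
utt = np.array([[28.0, -10.0],
                [30.0,  -8.0]])
m = smoothed_mean(utt, global_mean=np.zeros(2))
```

Subtracting `m` instead of the raw 2-frame mean avoids the heavy distortion on very short utterances.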
Aaahh... now I see the quote in the second paper. So by that design, couldn't
I keep a running average of the cepstral means for the last 'n' utterances
until 'n' reaches a point of diminishing returns?
This is just one of the methods; it's better than the default, though.
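A running average like the one proposed above could be sketched as an exponential moving average of per-utterance means (the function name and the `alpha` value are illustrative, not tuned):

```python
import numpy as np

def update_running_mean(running_mean, utterance, alpha=0.9):
    """Exponentially weighted running cepstral mean: keep most of
    the previous estimate and blend in this utterance's mean."""
    utt_mean = np.asarray(utterance, dtype=float).mean(axis=0)
    if running_mean is None:               # first utterance seen
        return utt_mean
    return alpha * np.asarray(running_mean) + (1.0 - alpha) * utt_mean

# Feed two single-frame utterances; the second only nudges the mean.
m = None
for utt in (np.array([[28.0, -10.0]]), np.array([[30.0, -8.0]])):
    m = update_running_mean(m, utt)
```

Each new utterance would then be normalized by subtracting the current running estimate rather than its own short-sample mean.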