Alternative to CMS for Short Utterances

Raeldor
2011-01-28
2012-09-22
  • Raeldor

    Raeldor - 2011-01-28

    Hi All,

    Does anyone know of an alternative to CMS (or CMN) for short utterances? I
    have been doing some testing, and it seems to work great for sections of 2+
    seconds in length, but for shorter 0.5-1 second utterances it really
    distorts the signal a lot.

    Thanks
    Ray

     
  • Raeldor

    Raeldor - 2011-01-30

    Actually, I'm starting to wonder now if I'm doing my CMN correctly. I have the
    mean calculated individually for each coefficient over time. Is that correct?
    So, my (pseudo) code reads...

    loop through each mfcc (1-12)
        mean = 0
        loop through each frame
            mean = mean + {mfcc value for frame}
        end loop
        mean = mean / {frame count}
        loop through each frame
            {mfcc value for frame} = {mfcc value for frame} - mean
        end loop
    end loop

    should it really be...

    mean = 0
    loop through each mfcc (1-12)
        loop through each frame
            mean = mean + {mfcc value for frame}
        end loop
    end loop
    loop through each mfcc (1-12)
        loop through each frame
            {mfcc value for frame} = {mfcc value for frame} - mean
        end loop
    end loop

    Otherwise it doesn't seem to make sense, because I'm losing the relative
    strength of each MFCC value compared to the others within the same frame.

    Thanks
    Ray

     
  • Nickolay V. Shmyrev

    So, my (pseudo) code reads...

    This is correct

    should it really be...

    No
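
    For reference, the first (per-coefficient) version can be sketched in
    Python roughly as follows. This is only a minimal sketch: the function name
    cmn and the list-of-frames layout are assumptions made for illustration,
    not code from any particular toolkit.

```python
def cmn(frames):
    """Cepstral mean normalization over one utterance.

    frames: list of frames, each a list of MFCC values of equal length
    (an assumed layout). Returns (normalized_frames, means).
    """
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # One mean per coefficient, taken over time (all frames).
    means = [sum(frame[k] for frame in frames) / n_frames
             for k in range(n_coeffs)]
    # Subtract each coefficient's own mean from that coefficient in every frame.
    normalized = [[frame[k] - means[k] for k in range(n_coeffs)]
                  for frame in frames]
    return normalized, means


if __name__ == "__main__":
    # Two made-up frames with two coefficients each.
    normalized, means = cmn([[1.0, 10.0], [3.0, 20.0]])
    print(means)
    print(normalized)
```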

    You'll be interested to read the following papers:

    Reducing The Effects Of Linear Channel Distortion On Continuous Speech
    Recognition (1996)
    by Rebecca Anne Bates, Dr. Mari Ostendorf, Dr. J. Robin Rohlicek,
    Dr. William C. Karl

    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.9450

    and

    The Use Of Cepstral Means In Conversational Speech Recognition (1997)
    by Martin Westphal

    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.2342

     
  • Raeldor

    Raeldor - 2011-01-30

    Thank you very much for your reply! So here's my problem. If I am working on a
    simple utterance of the word 'ah!', it's very heavy in one MFCC (a value of
    around 28 in all frames), but quite low (between -10 and +2) in all the other
    coefficients. If I use this technique on each MFCC individually, I lose the
    information about the strength of each MFCC relative to the others.

    Is this a common problem, or am I going crazy? Is there another technique to
    use on this kind of short, single-vowel phrase?

    Thank you!

     
  • Nickolay V. Shmyrev

    If I use this technique on each MFCC individually, I lose the information
    about the relative strength of each MFCC to each other.

    This is not a problem. The problem is estimating the cepstral mean for a
    speaker/channel from just a short sample. Please read the papers first.
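
    One way to see why subtracting a separate mean per coefficient is the
    intended behaviour: a fixed channel shows up as a constant offset on each
    cepstral coefficient, and the per-coefficient mean removes exactly that
    offset. The sketch below uses made-up numbers and a hypothetical cmn
    helper. It says nothing about how reliable the mean is when the sample is
    short, which is the harder problem.

```python
def cmn(frames):
    """Subtract each coefficient's mean over time (assumed list-of-frames layout)."""
    n_frames = len(frames)
    dims = range(len(frames[0]))
    means = [sum(frame[k] for frame in frames) / n_frames for k in dims]
    return [[frame[k] - means[k] for k in dims] for frame in frames]


# Made-up frames: first coefficient large, second small.
clean = [[28.0, -4.0], [30.0, 0.0], [26.0, 1.0]]

# Assumed channel effect: a constant offset per coefficient in every frame.
channel = [5.0, -3.0]
shifted = [[value + offset for value, offset in zip(frame, channel)]
           for frame in clean]

# After CMN the two versions are identical: the channel offset is gone.
print(cmn(clean) == cmn(shifted))  # True
```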

     
  • Raeldor

    Raeldor - 2011-01-30

    Aaahh... now I see the quote in the second paper. So by that design, couldn't
    I keep a running average of the cepstral means for the last 'n' utterances
    until 'n' reaches a point of diminishing returns?

     
  • Nickolay V. Shmyrev

    So by that design, couldn't I keep a running average of the cepstral means
    for the last 'n' utterances until 'n' reaches a point of diminishing returns?

    This is just one of the methods, though it is better than the default.
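
    A rough sketch of that running-average idea is below. The class name
    RunningCMN, the unweighted averaging of per-utterance means, and the choice
    to include the current utterance in the pool are all assumptions for
    illustration, not the method from the papers or from any toolkit.

```python
from collections import deque


class RunningCMN:
    """Hypothetical helper: CMN with means pooled over the last n utterances."""

    def __init__(self, n_utterances):
        # Keeps only the most recent n per-utterance mean vectors.
        self.history = deque(maxlen=n_utterances)

    def normalize(self, frames):
        n_frames = len(frames)
        dims = range(len(frames[0]))
        # Per-coefficient mean of the current utterance.
        utt_mean = [sum(frame[k] for frame in frames) / n_frames for k in dims]
        self.history.append(utt_mean)
        # Unweighted average of the stored utterance means (an assumption;
        # weighting by frame count is another option).
        pooled = [sum(mean[k] for mean in self.history) / len(self.history)
                  for k in dims]
        return [[frame[k] - pooled[k] for k in dims] for frame in frames]


if __name__ == "__main__":
    normalizer = RunningCMN(n_utterances=2)
    print(normalizer.normalize([[2.0], [4.0]]))  # pooled mean is 3.0
    print(normalizer.normalize([[7.0]]))         # pooled mean is (3.0 + 7.0) / 2
```

    With a short utterance, the pooled mean leans on earlier utterances instead
    of the few frames available, which is the point of the suggestion above.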

     
