
RASTA Processing

Raeldor
2011-02-19
2012-09-22
  • Raeldor

    Raeldor - 2011-02-19

    Hi All,

    I'm trying to write a 'kind of' speech recognizer. It works great if I'm
    right up on the microphone during the recognition phase, but if I move away,
    recognition gets gradually worse.

    I've tried CMN, but because the utterances are short (sometimes a single
    vowel), it seems to absolutely kill the recognition, and when I view a visual
    representation of the data, it destroys the relationship between the MFCC
    values in a single frame.
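    A minimal sketch of per-utterance CMN in plain Python, illustrating why it
    hurts very short utterances: when every frame is (nearly) the same, as in
    a single steady vowel, subtracting the utterance's own mean leaves almost
    nothing. The function name and the MFCC values below are made up for
    illustration.

```python
def cmn_per_utterance(frames):
    """Subtract the utterance's own mean from every cepstral coefficient.

    frames: list of MFCC vectors (lists of floats), one per frame.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[i] for frame in frames) / n for i in range(dim)]
    return [[frame[i] - mean[i] for i in range(dim)] for frame in frames]


# Hypothetical steady vowel: all 20 frames share the same MFCC vector.
vowel = [[12.0, -3.5, 1.25, 0.5]] * 20
normalized = cmn_per_utterance(vowel)
print(normalized[0])  # → [0.0, 0.0, 0.0, 0.0]: the vowel's spectral shape is gone
```

    On a long utterance the mean mostly reflects the channel, but on a single
    vowel it mostly reflects the vowel itself, so CMN removes the very
    information the recognizer needs.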

    I have been reading about RASTA processing, but I can't find any easy-to-read
    documentation; it's all pretty heavy. Does RASTA work on the spectral
    information from the FFT of the original sample, or is it done on the MFCCs?
    Could this be a better approach? Does anyone have any source code or a
    layman's explanation of how to implement this?

    Thanks
    Ray

     
  • Nickolay V. Shmyrev

    > I move away the recognition gets gradually worse.

    Simply adapting the model could work here. There are also channel
    compensation schemes for long-distance (far-field) recognition.

    > I've tried CMN, but because the utterances are short (sometimes a single
    > vowel), it seems to absolutely kill the recognition, and when I view a
    > visual representation of the data

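    You can try sharing CMN values across utterances, i.e. carrying the cepstral
    mean estimate over from previous utterances instead of recomputing it from
    each short utterance alone.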
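    One way to read this suggestion, as a minimal Python sketch: keep a single
    mean estimate that starts from a prior and is updated with an exponential
    moving average as frames arrive, so it persists from one utterance to the
    next. RunningCMN, prior_mean and alpha are hypothetical names, the 0.995
    smoothing factor is an arbitrary choice, and the prior values in the usage
    lines are invented; in practice the prior could be the average MFCC vector
    of earlier speech recorded through the same microphone.

```python
class RunningCMN:
    """CMN whose mean estimate is carried over from one utterance to the next.

    Hypothetical helper, not part of any toolkit. The mean starts from a prior
    and is nudged toward each new frame with an exponential moving average, so
    one short vowel cannot pull the mean all the way onto itself.
    """

    def __init__(self, prior_mean, alpha=0.995):
        self.mean = list(prior_mean)
        self.alpha = alpha  # closer to 1.0 means slower adaptation

    def process(self, frames):
        """Normalize one utterance; the updated mean is kept for the next call."""
        out = []
        for frame in frames:
            # Normalize with the current shared estimate...
            out.append([x - m for x, m in zip(frame, self.mean)])
            # ...then move the estimate slightly toward this frame.
            self.mean = [self.alpha * m + (1.0 - self.alpha) * x
                         for m, x in zip(self.mean, frame)]
        return out


cmn = RunningCMN(prior_mean=[10.0, -2.0, 1.0, 0.0])  # invented prior
vowel = [[12.0, -3.5, 1.25, 0.5]] * 20  # hypothetical steady vowel, identical frames
normalized = cmn.process(vowel)
print(normalized[0])  # → [2.0, -1.5, 0.25, 0.5]
```

    Unlike per-utterance CMN, the frames keep their shape relative to the
    long-term mean, and calling process() again on the next utterance reuses
    the adapted estimate.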

    > or is it done on the MFCCs?

    There are different types. The core idea is to apply a band-pass filter to
    the time trajectory of each feature, and that can be done in various domains.
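    A minimal Python sketch of one such filter, assuming the classic RASTA
    transfer function H(z) = 0.1 * z^4 * (2 + z^-1 - z^-3 - 2z^-4) / (1 - 0.98z^-1)
    applied along time to every column of a frames x features matrix. The pole
    value differs between implementations, and rasta_filter / rasta_frames are
    hypothetical names, not functions from the ICSI package. In RASTA-PLP the
    filter runs on log critical-band energies; because it is linear and applied
    identically to every band, and MFCCs are a fixed linear transform (DCT) of
    log mel energies, filtering the MFCCs gives the same result as filtering
    the log mel energies before the DCT.

```python
import math


def rasta_filter(trajectory, pole=0.98):
    """Band-pass filter one feature trajectory over time (one value per frame).

    Causal version of H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1).
    The z^4 advance of the textbook form is dropped, so the output lags the
    input by four frames.
    """
    b = [0.2, 0.1, 0.0, -0.1, -0.2]  # FIR part; coefficients sum to zero, so DC is blocked
    past_x = [0.0, 0.0, 0.0, 0.0]    # x[n-1] .. x[n-4], assumed zero before the first frame
    prev_y = 0.0
    out = []
    for x in trajectory:
        y = b[0] * x + sum(bk * xk for bk, xk in zip(b[1:], past_x)) + pole * prev_y
        out.append(y)
        past_x = [x] + past_x[:3]
        prev_y = y
    return out


def rasta_frames(frames, pole=0.98):
    """Filter every column of a (frames x features) matrix along time."""
    columns = [rasta_filter(list(col), pole) for col in zip(*frames)]
    return [list(row) for row in zip(*columns)]


# Hypothetical log-energy (or MFCC) frames, plus the same frames with a fixed
# offset, which is what a constant channel gain looks like in the log domain.
speech = [[math.sin(0.3 * t), math.cos(0.2 * t)] for t in range(1000)]
shifted = [[v + 5.0 for v in frame] for frame in speech]

clean_out = rasta_frames(speech)
shifted_out = rasta_frames(shifted)
# The offset only causes a decaying start-up transient; by the end it is gone.
print(max(abs(x - y) for x, y in zip(clean_out[-1], shifted_out[-1])))
```

    Like CMN, this filter also suppresses anything that stays constant for a
    long time, so a sustained vowel fades after its onset (with a pole of 0.98
    the decay takes on the order of 50 frames); what it keeps are the changes
    between frames.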

    > Does anyone have any source code or a layman's explanation of how to
    > implement this?

    RASTA sources can be found here:

    http://www.icsi.berkeley.edu/~dpwe/projects/sprach/sprachcore.html

     
  • Raeldor

    Raeldor - 2011-02-21

    Thank you for the quick reply. I'll take a look at the RASTA source code...
    code is always easier for me to understand than formulas. ;-)

     
