General question about MFCC

Peterh123
2010-09-28
2012-09-22
  • Peterh123

    Peterh123 - 2010-09-28

    Hi all,
    I have a general question about the calculation of MFCCs.
    I would like to summarize my understanding and ask you to judge whether it
    is correct.

    For example, assume audio samples are captured at 16 kHz.

    1. All captured samples are pre-emphasized with a simple FIR filter to amplify the high-frequency parts of the signal.

    2. The audio stream is split into buffers such that the first 160 samples (10 ms) of each buffer are the last 160 samples of the previous buffer.

    3. A Hamming window is applied to each buffer.

    4. The samples are transformed to the frequency domain with a 512-point FFT. Because the spectrum of a real signal is symmetric, only 256 bins are needed for further processing. The DC offset can be discarded.

    5. The squared magnitude of the FFT output is calculated (the power spectrum).

    6. Mel filtering: e.g. 40 Mel filters are available. Each filter is a band-pass with a specific bandwidth and position. Because we are in the frequency domain, filtering is done by multiplying the power spectrum by the filter weights and summing the products. This leads to just one result for every filter. In total, 40 values are calculated by the Mel filter bank.

    7. The Mel-filter outputs are compressed with the natural logarithm.

    8. The values are transformed with a discrete cosine transform, but only the first 13 coefficients of the output are used in further steps.

    9. The 1st and 2nd derivatives are calculated from the previous and “future” coefficients.

    10. The MFCC vector consists of 39 values in total, of which the last 26 are the results of the derivatives.
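
The ten steps above can be sketched in plain Python as follows. This is only a rough illustration of the list, not the code of any particular front end. Several details are assumptions because the post does not state them: the pre-emphasis coefficient (0.97), zero-padding the 480-sample buffer to 512 points before the FFT, triangular filters spaced evenly on the Mel scale, an unnormalized DCT-II, and a simple central difference for the derivatives. All function and constant names are made up for this sketch.

```python
import cmath
import math

SAMPLE_RATE = 16000  # Hz, from the post
FRAME_LEN = 480      # samples per buffer, from the post
OVERLAP = 160        # samples shared with the previous buffer, from the post
NFFT = 512
N_FILTERS = 40
N_CEPS = 13
PREEMPH = 0.97       # assumed value; the post gives no coefficient


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def mel_filterbank():
    """40 triangular filters over FFT bins 1..256 (DC bin 0 is skipped)."""
    top = hz_to_mel(SAMPLE_RATE / 2)
    edges = [mel_to_hz(top * i / (N_FILTERS + 1)) for i in range(N_FILTERS + 2)]
    bank = []
    for j in range(N_FILTERS):
        lo, mid, hi = edges[j], edges[j + 1], edges[j + 2]
        row = []
        for b in range(1, NFFT // 2 + 1):
            f = b * SAMPLE_RATE / NFFT
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_frame(frame, bank):
    """Steps 3-8 for one pre-emphasized buffer; returns 13 cepstra."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]               # step 3
    spectrum = fft(windowed + [0.0] * (NFFT - n))           # step 4, zero-padded
    power = [abs(c) ** 2 for c in spectrum[1:NFFT // 2 + 1]]  # step 5, no DC
    # Steps 6-7: weighted sum per filter, floored to avoid log(0), then ln.
    logmel = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
              for row in bank]
    # Step 8: DCT-II, keeping only the first 13 coefficients.
    return [sum(v * math.cos(math.pi * k * (m + 0.5) / N_FILTERS)
                for m, v in enumerate(logmel))
            for k in range(N_CEPS)]


def deltas(seq):
    """Central difference over neighbouring frames, edges clamped."""
    last = len(seq) - 1
    return [[(seq[min(t + 1, last)][i] - seq[max(t - 1, 0)][i]) / 2.0
             for i in range(len(seq[t]))]
            for t in range(len(seq))]


def mfcc(samples):
    """Steps 1-10: returns one 39-value vector per buffer."""
    emph = [samples[0]] + [samples[i] - PREEMPH * samples[i - 1]
                           for i in range(1, len(samples))]  # step 1
    hop = FRAME_LEN - OVERLAP                                # step 2
    bank = mel_filterbank()
    ceps = [mfcc_frame(emph[s:s + FRAME_LEN], bank)
            for s in range(0, len(emph) - FRAME_LEN + 1, hop)]
    d1 = deltas(ceps)                                        # step 9
    d2 = deltas(d1)
    return [c + a + b for c, a, b in zip(ceps, d1, d2)]      # step 10
```

Calling `mfcc(samples)` on a list of floats returns one list of 39 values per buffer: the 13 cepstra first, then the 13 first and 13 second derivatives. Real front ends usually use a regression over several frames for the derivatives rather than this two-point difference.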

    This MFCC vector represents a compressed and transformed version of the
    480 audio samples, and it makes no sense to play it through a
    digital-to-analogue converter.

    Is this correct?

    Thank you for your effort.

     
  • Nickolay V. Shmyrev

    This (MFCC) coefficient represents a compressed and transformed version of
    the 480 audio samples and it makes no sense to play it via digital to
    analogue converter.

    You are correct about the process. As for playing the cepstrum, you can
    convert it back to audio with an MLSA filter, but the quality degrades.

    http://onlinelibrary.wiley.com/doi/10.1002/ecja.4400660203/abstract
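
One reason the quality degrades can be illustrated with a small sketch. This is not an MLSA filter and does not resynthesize audio. It only shows that keeping 13 of 40 DCT coefficients (step 8 in the question) discards fine spectral detail that no inverse can recover. The log-Mel values below are made up for the illustration (a smooth envelope plus a fine ripple), and the function names are hypothetical.

```python
import math

N_FILTERS = 40
N_CEPS = 13


def dct2(values, n_out):
    """Unnormalized DCT-II, returning the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (m + 0.5) / n)
                for m, v in enumerate(values))
            for k in range(n_out)]


def idct2(coeffs, n):
    """Inverse of dct2 for length n; coefficients not supplied count as zero."""
    return [coeffs[0] / n
            + (2.0 / n) * sum(c * math.cos(math.pi * k * (m + 0.5) / n)
                              for k, c in enumerate(coeffs) if k > 0)
            for m in range(n)]


# Made-up log-Mel spectrum: constant level, slow envelope, fast ripple.
logmel = [5.0
          + 2.0 * math.cos(math.pi * 2 * (m + 0.5) / N_FILTERS)
          + 0.5 * math.cos(math.pi * 30 * (m + 0.5) / N_FILTERS)
          for m in range(N_FILTERS)]

full = idct2(dct2(logmel, N_FILTERS), N_FILTERS)     # all 40 coefficients
truncated = idct2(dct2(logmel, N_CEPS), N_FILTERS)   # only the first 13

err_full = max(abs(a - b) for a, b in zip(logmel, full))
err_trunc = max(abs(a - b) for a, b in zip(logmel, truncated))
print(err_full, err_trunc)
```

With all 40 coefficients the round trip is exact up to rounding, while with 13 the fast ripple is gone and only the smooth envelope remains. The magnitude-squared step and the Mel filter sums lose further information (phase and per-bin detail), so any resynthesis has to make up for what is missing.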

     
  • Peterh123

    Peterh123 - 2010-09-28

    Thank you very much for the reply and the interesting link!

     
