Importance of reconstructability of spectrogram

Forum: Speech Recognition Theory

Creator: dovark

Created: 2014-04-04

Updated: 2014-04-04

dovark - 2014-04-04

First, compare the original spectrogram with the one reconstructed from MFCCs at http://www.ee.columbia.edu/~dpwe/resources/matlab/rastamat/mfccs.html

The reconstructed spectrogram seems to have lost some information. If, instead of MFCCs, we used some other method that can reconstruct the spectrogram more reliably, would that make a better feature for recognition?
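To make the loss concrete, here is a minimal NumPy sketch of an MFCC round trip on a synthetic power spectrogram. It is not the rastamat code from the page linked above; the filterbank layout, the 16 kHz sample rate, the 26 filters, the 13 coefficients, and the function names are all assumptions chosen for illustration. It shows the two places where information is discarded: the mel filterbank (257 bins collapsed to 26 energies) and the truncation of the DCT.

```python
import numpy as np

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular mel-spaced filters, shape (n_filters, n_bins). Illustrative layout."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    bin_hz = np.linspace(0.0, sample_rate / 2.0, n_bins)
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct_matrix(n_out, n_in):
    """First n_out rows of the orthonormal DCT-II matrix, shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    d = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (n + 0.5) / n_in)
    d[0] /= np.sqrt(2.0)
    return d

def mfcc_round_trip(power_spec, fb, n_ceps, eps=1e-10):
    """Compute cepstra from a power spectrogram, then invert each step approximately."""
    log_mel = np.log(power_spec @ fb.T + eps)       # mel energies, log-compressed
    dct = dct_matrix(n_ceps, fb.shape[0])
    ceps = log_mel @ dct.T                          # keep only n_ceps coefficients
    log_mel_hat = ceps @ dct                        # inverse DCT of the truncated cepstrum
    # Least-squares inversion of the filterbank; negative values are clipped to zero.
    spec_hat = np.clip(np.exp(log_mel_hat) @ np.linalg.pinv(fb.T), 0.0, None)
    return ceps, log_mel, log_mel_hat, spec_hat

# Synthetic stand-in for a real spectrogram: 50 frames, 257 frequency bins.
rng = np.random.default_rng(0)
power_spec = rng.random((50, 257)) + 0.1
fb = mel_filterbank(26, 257, 16000)

for n_ceps in (13, 26):
    _, log_mel, log_mel_hat, spec_hat = mfcc_round_trip(power_spec, fb, n_ceps)
    mel_err = np.linalg.norm(log_mel_hat - log_mel) / np.linalg.norm(log_mel)
    spec_err = np.linalg.norm(spec_hat - power_spec) / np.linalg.norm(power_spec)
    print(f"n_ceps={n_ceps}: log-mel error {mel_err:.3g}, spectrum error {spec_err:.3g}")
```

Keeping all 26 coefficients makes the DCT step invertible, so the log-mel error vanishes up to rounding, but the spectrum error remains because the filterbank itself is not invertible. A feature that reconstructs better would have to keep the fine spectral detail that both steps smooth away.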

Nickolay V. Shmyrev - 2014-04-04

The question is whether this additional information is useful for recognition or just introduces noise. Good features must not only keep information but also remove unnecessary information. They should:

1) Discriminate sounds
2) Be robust to additive noise
3) Be robust to channel noise
4) Be robust to delayed echo noise
5) Be robust to speaker differences
6) Be robust to speaking style

MFCC is good enough for 1) and, to some degree, for 3). Spectral subtraction applied to the features can partially help with 2). There is no good solution for 4). For 5), VTLN (vocal tract length normalization) is effective. 6) is a mostly unsolved problem.
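As an illustration of the spectral subtraction mentioned for 2), here is a minimal sketch operating on a power spectrogram. It assumes the first few frames contain only noise, which is a simplification; the function name, the `over_subtraction` factor, and the relative `floor` are hypothetical parameters chosen for this example, not taken from any particular toolkit.

```python
import numpy as np

def spectral_subtraction(power_spec, n_noise_frames=10, over_subtraction=1.0, floor=0.01):
    """Subtract a stationary noise estimate from each frame of a power spectrogram.

    power_spec: array of shape (n_frames, n_bins).
    The noise spectrum is estimated as the mean of the first n_noise_frames frames,
    which are assumed (for this sketch) to contain no speech.
    """
    noise = power_spec[:n_noise_frames].mean(axis=0)
    cleaned = power_spec - over_subtraction * noise
    # Floor the result at a fraction of the noisy power so it never goes negative.
    return np.maximum(cleaned, floor * power_spec)

# Toy example: 10 silent frames followed by 20 "speech" frames, plus constant noise.
clean = np.vstack([np.zeros((10, 4)), np.full((20, 4), 5.0)])
noisy = clean + 1.0
enhanced = spectral_subtraction(noisy)
print(enhanced[0], enhanced[-1])
```

With real recordings the noise is neither constant nor perfectly estimated, so the subtraction only helps partially, which matches the caveat above.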
