The reconstructed spectrogram seems to have lost some information. If, instead of MFCC, we used some other method that can reconstruct the spectrogram more reliably, would that be a better feature for recognition?
The question is whether this additional information is useful for recognition or just introduces noise. Good features must not only keep the relevant information but also remove unnecessary information. They should:
1) Discriminate sounds
2) Be robust to additive noise
3) Be robust to channel noise
4) Be robust to delayed echo noise
5) Be robust to speaker differences
6) Be robust to speaking style
MFCC is good enough for 1) and, to some degree, for 3). Spectral subtraction in the feature domain can partially help with 2). There is no good solution for 4). For 5), VTLN (vocal tract length normalization) is effective. 6) is a mostly unsolved problem.
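To make point 2) a bit more concrete, here is a minimal sketch of power spectral subtraction. It is only an illustration, not how any particular recognizer does it. The function name, the `noise_frames` and `floor` parameters, and the assumption that the first few frames contain only noise are all mine.

```python
def spectral_subtraction(power_frames, noise_frames=5, floor=0.01):
    """Subtract an average noise power estimate from every frame.

    power_frames: list of frames, each a list of per-bin power values.
    noise_frames: number of leading frames ASSUMED to contain only noise.
    floor: fraction of the original power kept as a minimum, so that the
           result never becomes zero or negative.
    """
    n_bins = len(power_frames[0])
    lead = power_frames[:noise_frames]
    # Average power per bin over the assumed noise-only frames.
    noise = [sum(frame[k] for frame in lead) / len(lead) for k in range(n_bins)]
    return [
        [max(frame[k] - noise[k], floor * frame[k]) for k in range(n_bins)]
        for frame in power_frames
    ]


# Made-up example: five noise-only frames followed by one frame with signal in bin 0.
frames = [[1.0, 2.0] for _ in range(5)] + [[5.0, 2.0]]
cleaned = spectral_subtraction(frames)
```

The floor is there because a plain subtraction can go negative whenever the noise estimate exceeds the frame power, and a log taken afterwards would then fail. This only helps partially, as said above, because the noise estimate is an average and real noise changes from frame to frame.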
For reference, compare the original spectrogram with the one reconstructed from MFCCs at http://www.ee.columbia.edu/~dpwe/resources/matlab/rastamat/mfccs.html
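One reason the reconstruction looks smoother than the original is that MFCC keeps only the first few DCT coefficients of the log mel spectrum. The sketch below illustrates just that truncation step on a single frame. The 26 mel bands, the 13 kept coefficients, and the frame values are made-up assumptions for illustration, and the helper names are mine. It does not cover the detail already lost in the mel filterbank itself.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def truncate_and_reconstruct(log_mel, n_keep):
    """Keep the first n_keep cepstral coefficients, zero the rest, and invert."""
    coeffs = dct2(log_mel)
    kept = coeffs[:n_keep] + [0.0] * (len(coeffs) - n_keep)
    return idct2(kept)


# Hypothetical log mel energies for one frame: a slow trend plus fast ripple.
log_mel = [math.sin(0.3 * i) + 0.5 * math.sin(2.1 * i) for i in range(26)]

full = truncate_and_reconstruct(log_mel, 26)    # all coefficients kept
smooth = truncate_and_reconstruct(log_mel, 13)  # MFCC-style truncation

err_full = max(abs(a - b) for a, b in zip(log_mel, full))
err_smooth = max(abs(a - b) for a, b in zip(log_mel, smooth))
print(err_full, err_smooth)
```

With all coefficients kept, the frame comes back up to rounding error. With only 13 kept, the fast ripple across bands is gone. Whether that discarded detail would help recognition or just add variability is exactly the question discussed in this thread.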