Balaji - 2019-02-05

Speech recognition theory defines a frame size of 25 ms and a frame shift of 10 ms. This means that every 10 ms, I get a frame of 25 ms and run it through MFCC extraction to create the .mfc files. But in the sphinx_fe.exe configuration, I found only the following parameter:
-frate 100 Frame rate

Also, in the tutorial, I found the following:

for each frame, typically of 10 milliseconds length, we extract 39 numbers that represent the speech.

As per the tutorial, -frate would mean "number of frames per second", which is 100, so each frame would be 10 ms long.
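To make the two readings concrete, here is a minimal sketch of the framing as I understand it from the theory. It assumes a 16 kHz sampling rate (my assumption, not something stated above), and `frame_bounds` is a hypothetical helper written only for illustration. It is not part of sphinx_fe and may not match what sphinx_fe does internally.

```python
def frame_bounds(num_samples, sample_rate=16000, frame_rate=100, window_ms=25.0):
    """Return (shift, window, frames) for overlapping analysis frames.

    frame_rate is the number of frames per second, so the shift between
    frame starts is sample_rate / frame_rate samples (10 ms at 100 fps).
    window_ms is the length of each frame, taken here as the 25 ms from
    the theory; it is independent of the shift.
    """
    shift = sample_rate // frame_rate              # samples between frame starts
    window = int(sample_rate * window_ms / 1000)   # samples inside one frame
    frames = []
    start = 0
    # Keep only frames that fit completely inside the signal.
    while start + window <= num_samples:
        frames.append((start, start + window))
        start += shift
    return shift, window, frames


# One second of audio at the assumed 16 kHz sampling rate.
shift, window, frames = frame_bounds(16000)
print(shift, window, len(frames))
print(frames[:3])
```

Under this reading, a frame rate of 100 fixes only the 10 ms step between frames, while each frame still spans 25 ms, so consecutive frames overlap by 15 ms.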

I am confused as to which one is correct: is a frame 25 ms long with a 10 ms shift, or is it 10 ms long? Can someone please clarify? Thank you.

 

Last edit: Balaji 2019-02-05