
MFCC extraction using sphinx_fe.exe

Help
Balaji
2020-09-16
2020-12-14
  • Balaji

    Balaji - 2020-09-16

    Hello ,

    I extracted the MFCC features of a WAV file using Sphinx 4, then converted them to a text format to view them.
    To learn the extraction method, I wrote a Python script that performs framing, windowing, FFT, etc., following these MFCC tutorials: http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/ and https://www.kaggle.com/ilyamich/mfcc-implementation-and-tutorial

    I am using the following parameters: a 10 ms frame shift and a 25.6 ms window length.
    Now, my question is about the number of feature vectors generated:

    1. My first .wav file is of length = 1.9934375s = 1993.4375 ms - generated 104 feature vectors (13 coeffs in each row)
    2. My second .wav file is of length = 2.12125s = 2121.25 ms - generated 124 feature vectors (13 coeffs in each row)

    My doubt is this: if there are 100 frames per second and each frame is converted into one 13-element feature vector, then one second of audio should produce 100 vectors (13 coeffs each). But the examples above do not match this.
    Can you explain the arithmetic, please? I am not sure whether I failed to configure a parameter correctly.

    Thank you.

    Balaji.

     
    • Nickolay V. Shmyrev

      Voice activity detection removes frames. You can add -remove_silence no to see the remaining ones.
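The arithmetic asked about above can be sketched as follows. This is only a rough estimate of how many frames to expect when no frames are removed. It assumes a 16 kHz sample rate (not stated in the first post), the frate and wlen values quoted later in this thread, and that only full windows are counted; sphinx_fe may treat the last partial frame differently, so its count could differ slightly. The function name `expected_frames` is made up for this illustration.

```python
def expected_frames(duration_s, samprate=16000, frate=100, wlen=0.025625):
    """Estimate the frame count when no frames are dropped as silence.

    Assumption: only windows that fit entirely inside the signal are counted.
    """
    n_samples = round(duration_s * samprate)   # total samples in the file
    frame_len = round(wlen * samprate)         # window length in samples
    hop = samprate // frate                    # frame shift in samples
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // hop


# Durations of the two files from the first post.
for duration in (1.9934375, 2.12125):
    print(duration, "s ->", expected_frames(duration), "frames")
```

Under these assumptions the two files would give roughly 197 and 210 frames, so the reported 104 and 124 vectors are consistent with a large share of the frames having been dropped as silence.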

       
  • Balaji

    Balaji - 2020-09-17

    Thank you. I got it.

     
  • Balaji

    Balaji - 2020-09-17

    How are silent frames identified? Is there a threshold?
    I inspected the frames but could not tell from the values.

     
  • Balaji

    Balaji - 2020-11-30

    Hello,

    I am performing MFCC extraction using sphinx_fe. I am separately performing each step of the feature extraction procedure and comparing the results with the output of sphinx_fe.

    I have questions on these parameters for feature extraction:
    1. frate: the default is 100. This means the hop_length is 10 milliseconds, so frames start at 0, 10, 20, ..., 990 ms, right? With -samprate = 16000, the hop_length is 160 samples. Is this correct?
    2. What is the number of samples in each frame? The parameter wlen = 0.025625. I interpret this as framesize = 0.025625 seconds, that is, 25.625 milliseconds = 410 samples (at a 16 kHz sampling rate). Is this correct?
    Or is it the nfft=512 parameter that defines the framesize as 512 samples?
    3. lowerf = 133.33334 and upperf = 6855.4976. Why these values? Can we set the range to 0 to 8000 (the Nyquist frequency for a 16000 Hz sampling rate)?

    Thanks for your help.

     

    Last edit: Balaji 2020-11-30
  • Nickolay V. Shmyrev

    1. frate: the default is 100. This means the hop_length is 10 milliseconds, so frames start at 0, 10, 20, ..., 990 ms, right? With -samprate = 16000, the hop_length is 160 samples. Is this correct?

    Yes

    2. What is the number of samples in each frame? The parameter wlen = 0.025625. I interpret this as framesize = 0.025625 seconds, that is, 25.625 milliseconds = 410 samples (at a 16 kHz sampling rate). Is this correct?

    Yes

    Or is it the nfft=512 parameter that defines the framesize as 512 samples?

    No

    3. lowerf = 133.33334 and upperf = 6855.4976. Why these values? Can we set the range to 0 to 8000 (the Nyquist frequency for a 16000 Hz sampling rate)?

    Most of the training audio doesn't contain very high or very low frequencies anyway, so those bands would only add useless noise for recognition and training. The default values showed good results in experiments, though in modern ASR the range is usually from 20 to 7600 Hz.
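The sample counts confirmed above can be reproduced with a few lines of arithmetic. The sketch below also illustrates one common way a 410-sample frame and a 512-point FFT can coexist: the windowed frame is zero-padded up to the FFT size. That zero-padding step is an assumption about how the two parameters relate, not something stated in this thread, and the helpers `next_pow2` and `pad_frame` are made up for this illustration.

```python
samprate = 16000    # -samprate
frate = 100         # frames per second
wlen = 0.025625     # window length in seconds
nfft = 512          # FFT size

hop = samprate // frate             # frame shift in samples
frame_len = round(wlen * samprate)  # window length in samples


def next_pow2(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def pad_frame(frame, size):
    """Append zeros so the frame has exactly `size` samples."""
    if len(frame) > size:
        raise ValueError("frame is longer than the FFT size")
    return list(frame) + [0.0] * (size - len(frame))


frame = [1.0] * frame_len           # stand-in for one windowed frame
padded = pad_frame(frame, nfft)

print("hop:", hop, "samples")
print("frame:", frame_len, "samples")
print("next power of two:", next_pow2(frame_len))
print("padded length:", len(padded))
```

With these values the hop is 160 samples, the frame is 410 samples, and 512 is the smallest power of two that can hold the frame. The frame size is set by the time-based wlen parameter, while the FFT size only needs to be at least as large as the frame.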

     
  • Balaji

    Balaji - 2020-12-14

    Thank you very much Nickolay.
    Is there any reason for these odd sizes (framesize: 410 samples and hop_length: 160 samples)?

    The examples I worked through used only power-of-two sizes, such as framesize: 1024 and hop_length: 512.

     
