CMU Sphinxtrain feat files details

2015-08-31
2015-09-02
  • Azhar Abdulaziz

    Azhar Abdulaziz - 2015-08-31

    Hi everyone,
    I am trying to train with my own features (other than MFCC) using CMU SphinxTrain. Using the an4 sph audio, I bypassed the comp_feat stage and started from the verify stage. This approach led to too many errors, so I computed my own MFCCs again and plugged them in starting from the verify stage, to check the format of the feat files, as there is no clear documentation on how the features are stored.
    The best results I have got so far are SER 90.8% and WER 58%, whereas with the features computed by sphinxtrain's comp_feat, SER was 46.3% and WER was 15.5%.
    Besides, with my own MATLAB MFCC feature files, I got errors in Phase 3 (Forward-Backward) of MODULE 20,
    while there were no errors when sphinxtrain computed the MFCC features.
    My MFCC files (computed in MATLAB) have different values and, even worse, a different size, although I am using the same configuration. The Sphinx feature files are bigger than my MATLAB MFCC files for 13 coefficients.

    To make things clear, I will be grateful if someone can answer the following questions about the feat files:

    • How are features stored in feat files?
    • How many coefficients are stored? (I know it is set to 13 in sphinx_train.cfg.)
    • When the audio is read, should it be scaled to ±1? (I know it shouldn't matter as long as the same scaling is applied to both training and test data.)

    P.S.
    • The only documentation I found is at http://cmusphinx.sourceforge.net/wiki/mfcformat

    • Also, Nickolay V. Shmyrev said on 2008-03-05 that the data is stored in big-endian byte order (see
      http://sourceforge.net/p/cmusphinx/discussion/speech-recognition/thread/9d48ac3e/ ). For me, it has no effect on the results.

    • The attached file is the configuration for my experiments. Note that I do not start training from comp_feat; I train my acoustic model using the following command:
      sphinxtrain -f verify an4 run.
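
    For the storage question above, here is a minimal sketch of the layout as I understand it from the mfcformat wiki page cited in the P.S.: a 4-byte integer header giving the number of 4-byte floats that follow, then the float values frame by frame (13 per frame by default). The function names read_mfc and write_mfc are hypothetical. Picking the byte order by checking the header against the file size is an assumption of this sketch, not something confirmed in this thread.

```python
import struct


def write_mfc(path, frames, byteorder=">"):
    """Write frames (a list of equal-length float lists) as int32 count + float32 data.

    byteorder is a struct prefix: ">" for big-endian, "<" for little-endian.
    """
    values = [v for frame in frames for v in frame]
    with open(path, "wb") as f:
        # Header: total number of float32 values in the file (not the number of frames).
        f.write(struct.pack(byteorder + "i", len(values)))
        f.write(struct.pack(byteorder + "%df" % len(values), *values))


def read_mfc(path, ncep=13):
    """Read a feature file and return a list of frames, each with ncep floats."""
    with open(path, "rb") as f:
        data = f.read()
    payload_floats = (len(data) - 4) // 4
    # Assumption: the correct byte order is the one for which the header
    # count agrees with the number of floats actually present in the file.
    for byteorder in (">", "<"):
        (count,) = struct.unpack(byteorder + "i", data[:4])
        if count == payload_floats:
            break
    else:
        raise ValueError("header count does not match file size")
    if count % ncep != 0:
        raise ValueError("float count is not a multiple of ncep")
    values = struct.unpack(byteorder + "%df" % count, data[4:4 + 4 * count])
    return [list(values[i:i + ncep]) for i in range(0, count, ncep)]
```

    If a file written from MATLAB fails the header check in a reader like this, the header is probably holding the frame count or the byte count instead of the float count.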

    Thank you

     

    Last edit: Azhar Abdulaziz 2015-08-31
  • Azhar Abdulaziz

    Azhar Abdulaziz - 2015-09-02

    I found that the missing piece in my configuration was the overlap between successive frames. I thought the default was 50%, but I have now discovered that it is 40%. I found this out through my experiments to work out how the mfc binary files are stored.

    This parameter affects the mfc file size and definitely affects recognition.
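
    To show why the frame shift changes the file size, here is a rough sketch of the arithmetic. It assumes a 16 kHz sample rate, a window length of 0.025625 s and 100 frames per second, which I believe are the sphinxbase front-end defaults but which are not stated in this thread. It also assumes the file layout of a 4-byte header plus 13 floats per frame. The function names are hypothetical, and the real front end may treat the last partial frame differently.

```python
def frame_layout(samprate=16000, wlen=0.025625, frate=100):
    """Return (frame_size, frame_shift) in samples for the assumed settings."""
    frame_size = int(wlen * samprate + 0.5)    # window length in samples
    frame_shift = int(samprate / frate + 0.5)  # hop between successive frames
    return frame_size, frame_shift


def num_frames(nsamples, frame_size, frame_shift):
    """Count the full frames that fit in nsamples (no padding of the tail)."""
    if nsamples < frame_size:
        return 0
    return 1 + (nsamples - frame_size) // frame_shift


def mfc_file_bytes(nframes, ncep=13):
    """Expected file size: 4-byte header plus ncep float32 values per frame."""
    return 4 + 4 * ncep * nframes


if __name__ == "__main__":
    size, shift = frame_layout()
    print("frame size:", size, "frame shift:", shift)
    print("shift / window:", round(shift / size, 2))
    for hop in (shift, size // 2):  # assumed default hop vs. a 50% overlap hop
        n = num_frames(16000, size, hop)
        print("hop", hop, "->", n, "frames,", mfc_file_bytes(n), "bytes")
```

    Under these assumptions, one second of audio gives 98 frames with a 160-sample shift but only 77 frames with a 205-sample (50% overlap) shift, so the Sphinx files come out larger. The 160-sample shift is about 39% of the 410-sample window, so the 40% figure may describe the shift relative to the window, not the overlap.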

     

    Last edit: Azhar Abdulaziz 2015-09-02
