
sphinxtrain, the feature file .mfc does not exist, or is empty

Forum: Help
Creator: Orest
Created: 2015-03-18
Updated: 2020-08-04
  • Orest

    Orest - 2015-03-18

    I am trying to train my own model with sphinxtrain, following the tutorial (http://cmusphinx.sourceforge.net/wiki/tutorialam). The intent is to start from an example that works, then apply modifications step by step to become familiar with sphinxtrain. I downloaded the an4 database from http://www.speech.cs.cmu.edu/databases/an4/ (Raw audio (.raw) format, little endian byte order (64 M)).

    I unpacked it, moved to the folder where the database is located (/home/osota/an4), and then ran:

    /opt/sphinxtrain/bin/./sphinxtrain -t an4 setup
    

    it gives me this result:

    Sphinxtrain path: /opt/sphinxtrain/lib/sphinxtrain
    Sphinxtrain binaries path: /opt/sphinxtrain/libexec/sphinxtrain
    Setting up the database an4
    

    which, from my understanding, shows that there are no issues arising from installation/configuration.

    I see that sphinx_train.cfg is created in the etc/ directory.

    I opened etc/sphinx_train.cfg and changed the following lines:

    $CFG_WAVFILE_EXTENSION = 'wav';
    $CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw
    

    into this:

    $CFG_WAVFILE_EXTENSION = 'raw';
    $CFG_WAVFILE_TYPE = 'raw'; # one of nist, mswav, raw
    

    I did that because I noticed that the audio files have the extension .raw.
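A quick way to confirm that the extension in the config matches the audio actually on disk is sketched here. It assumes Python 3, that the config lines look like the ones quoted in this post, and that the audio lives under a wav/ folder inside the database directory. The function names are made up for illustration and are not part of sphinxtrain.

```python
import re
from pathlib import Path

# Matches lines such as:  $CFG_WAVFILE_TYPE = 'raw'; # one of nist, mswav, raw
CFG_LINE = re.compile(r"^\s*\$(CFG_WAVFILE_(?:EXTENSION|TYPE))\s*=\s*'([^']*)'\s*;")


def read_wavfile_settings(cfg_text):
    """Return the CFG_WAVFILE_EXTENSION / CFG_WAVFILE_TYPE values found in cfg_text."""
    settings = {}
    for line in cfg_text.splitlines():
        match = CFG_LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def count_audio_files(wav_dir, extension):
    """Count files below wav_dir (recursively) whose name ends in .<extension>."""
    return sum(1 for p in Path(wav_dir).rglob("*." + extension) if p.is_file())
```

Calling `read_wavfile_settings(Path("etc/sphinx_train.cfg").read_text())` and then `count_audio_files("wav", settings["CFG_WAVFILE_EXTENSION"])` should give a non-zero count; a count of zero would mean the configured extension does not match any audio file.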

    Then I moved into /home/osota/an4 (the training database) and typed:

    $ /opt/sphinxtrain/bin/./sphinxtrain run
    
    Sphinxtrain path: /opt/sphinxtrain/lib/sphinxtrain
    Sphinxtrain binaries path: /opt/sphinxtrain/libexec/sphinxtrain
    Running the training
    MODULE: 000 Computing feature from audio files
    Extracting features from  segments starting at  (part 1 of 1) 
    Extracting features from  segments starting at  (part 1 of 1) 
    Feature extraction is done
    MODULE: 00 verify training files
        Phase 1: Checking to see if the dict and filler dict agrees with the phonelist file.
            Found 133 words using 34 phones
        Phase 2: Checking to make sure there are not duplicate entries in the dictionary
        Phase 3: Check general format for the fileids file; utterance length (must be positive); files exist
    WARNING: Error in '/home/osota/an4/etc/an4_train.fileids', the feature file '/home/osota/an4/feat/an4_clstk/fbbh/cen5-fbbh-b.mfc' does not exist, or is empty
    WARNING: Error in '/home/osota/an4/etc/an4_train.fileids', the feature file '/home/osota/an4/feat/an4_clstk/fbbh/cen6-fbbh-b.mfc' does not exist, or is empty
    
    WARNING: Error in '/home/osota/an4/etc/an4_train.fileids', the feature file '/home/osota/an4/feat/an4_clstk/mwhw/cen8-mwhw-b.mfc' does not exist, or is empty
        Phase 4: Checking number of lines in the transcript file should match lines in fileids file
        Phase 5: Determine amount of training data, see if n_tied_states seems reasonable.
            Estimated Total Hours Training: -0.000202564102564107
    WARNING: Not enough data for the training
        Phase 6: Checking that all the words in the transcript are in the dictionary
            Words in dictionary: 130
            Words in filler dictionary: 3
        Phase 7: Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
    

    The output stops at Phase 7.

    I put back $CFG_WAVFILE_EXTENSION = 'wav' and $CFG_WAVFILE_TYPE = 'mswav', ran it again, and encountered the same error.

    I searched this forum and the troubleshooting section for a solution, then moved to /home/osota/an4/logdir, where I see one entry: 000.comp_feat

    I open it and I see:

    " ============================================================================
    " Netrw Directory Listing                                        (netrw v145)
    "   /home/osota/an4/logdir/000.comp_feat
    "   Sorted by      name
    "   Sort sequence: [\/]$,\<core\%(\.\d\+\)\=\>,\.h$,\.c$,\.cpp$,\~\=\*$,*,\.o$,\.obj$,\.info$,\.swp$,\.bak$,\~$
    "   Quick Help: <F1>:help  -:go up dir  D:delete  R:rename  s:sort-by  x:exec
    " ============================================================================
    ../
    an4.test-1-1.log
    an4.train-1-1.log
    .swp
    ~       
    

    Then I found that the documentation describes a similar issue in the troubleshooting section:

    WARNING: CTL file, audio file name.mfc, does not exist, or is empty.
    
    The .mfc files are the feature files converted from the input audio files on stage 000.comp_feats. Did you skip this step? Did you add new audio files without converting them? The training process expects a feature file to be there, and it isn't.
    

    I'd like to understand what's going on, because I've never trained a model before.

    "Did you skip this step?"
    Which step? The creation of the mfc files? Isn't sphinxtrain supposed to do this step? Should I create those mfc files with sphinx_fe?

    "The training process expects a feature file to be there, and it isn't."

    This makes me guess that the creation of the mfc files failed at step "MODULE: 000".

    Another guess is that I might have some installation/configuration problem. My Python version is 2.7.3, and I'm on Debian Linux.

    What did I do wrong?
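To see how widespread the problem is, the Phase 3 check can be approximated outside of sphinxtrain. This is only a sketch. It assumes Python 3 and the layout visible in the warnings above: each line of etc/an4_train.fileids names an utterance, and its feature file is expected at feat/<utterance>.mfc. The function name is hypothetical.

```python
from pathlib import Path


def missing_features(db_dir, fileids_name, feat_ext="mfc"):
    """List utterance ids whose feature file is absent or has zero size.

    db_dir       -- database folder containing etc/ and feat/
    fileids_name -- name of the fileids file inside etc/
    """
    db = Path(db_dir)
    missing = []
    with open(db / "etc" / fileids_name) as fileids:
        for line in fileids:
            utt = line.strip()
            if not utt:
                continue  # ignore blank lines
            feat = db / "feat" / (utt + "." + feat_ext)
            if not feat.is_file() or feat.stat().st_size == 0:
                missing.append(utt)
    return missing
```

For example, `missing_features("/home/osota/an4", "an4_train.fileids")` returns the ids that would trigger the warning. If every id comes back, feature extraction produced nothing at all, which points at module 000 rather than at individual audio files.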

     

    Last edit: Nickolay V. Shmyrev 2015-03-18
    • Nickolay V. Shmyrev

      You need to download NIST's Sphere audio (.sph) format (64 M). Extension must be sph, file type nist.

       
      • Orest

        Orest - 2015-03-20

        Hi Nickolay, I downloaded "NIST's Sphere audio (.sph) format (64 M)", repeated the procedure (/opt/sphinxtrain/bin/./sphinxtrain -t an4 setup) in the new folder,

        and set

        $CFG_WAVFILE_EXTENSION = 'sph';
        $CFG_WAVFILE_TYPE = 'nist'; # one of nist, mswav, raw
        

        When I run it, I encounter exactly the same errors. Any idea what it might be?

        I noticed that

        MODULE: 000 Computing feature from audio files
        Extracting features from  segments starting at  (part 1 of 1) 
        Extracting features from  segments starting at  (part 1 of 1) 
        Feature extraction is done
        

        takes 0 seconds to complete. From my understanding, this means that no features are being extracted (and no mfc files are being created), but on the other hand I also see "Feature extraction is done", which suggests that there were no errors.

        If I go to the logdir I see one folder, "000.comp_feat". If I open it I see two files: "an4.test-1-1.log" and "an4.train-1-1.log".

        If I open an4.train-1-1.log I see

        Fri Mar 20 14:54:25 2015
        Fri Mar 20 14:54:25 2015
        

        (which is the time of the setup or run)

        As an additional note, if I open the newly created folder feat/ I can see that the directory structure is created (the same directory structure as in wav), but the folders under feat/ contain no files.

        EDIT 999: I tried again with the latest sphinxtrain from GitHub, same errors.

        Any suggestions?
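The symptom described in this post, a log under logdir/000.comp_feat that holds nothing but timestamps, can be detected mechanically. This is a minimal sketch, assuming Python 3 and that the timestamps have the format quoted above (for example "Fri Mar 20 14:54:25 2015", with English day and month names). The function name is made up for illustration.

```python
import time

# Format of the timestamp lines quoted in the post above.
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def log_has_only_timestamps(log_text):
    """Return True if every non-blank line of log_text is a bare timestamp.

    An empty log also returns True, since it carries no tool output either.
    """
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            time.strptime(line, TIMESTAMP_FORMAT)
        except ValueError:
            return False  # some other output is present
    return True
```

A result of True for an4.train-1-1.log would mean the feature extraction tool wrote no output of its own, which is consistent with it never having started.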

         

        Last edit: Orest 2015-03-20
        • Nickolay V. Shmyrev

          This log means that the sphinx_fe binary is either not installed in your path or fails to run. You need to make sure you installed sphinxbase properly, in particular the LD_LIBRARY_PATH part.
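The two failure modes named in this reply (binary not on the path, or binary present but unable to run) can be told apart with a short check. This is a sketch under assumptions: Python 3, and a dynamic loader that reports a missing shared library with the usual "error while loading shared libraries" message on stderr, as glibc does on Linux. What sphinx_fe prints when started without arguments is not assumed here; only the loader message is inspected. The function name is hypothetical.

```python
import shutil
import subprocess


def check_tool(name="sphinx_fe"):
    """Return (ok, message) describing whether the tool can be found and started."""
    path = shutil.which(name)
    if path is None:
        return (False, name + " was not found on PATH")
    try:
        result = subprocess.run(
            [path],
            stdin=subprocess.DEVNULL,   # never wait for input
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return (False, "%s could not be run: %s" % (path, exc))
    stderr = result.stderr.decode("utf-8", errors="replace")
    if "error while loading shared libraries" in stderr:
        # Typical when the library directory is missing from LD_LIBRARY_PATH.
        return (False, stderr.strip())
    return (True, path + " starts")
```

If `check_tool()` reports a loader error, adding the directory that holds the sphinxbase libraries to LD_LIBRARY_PATH and re-running the check is the next thing to try.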

           
          • Conrad

            Conrad - 2016-03-04

            OH WOW. You are a savior good sir. Had the same problem and was confused for the longest time.

             
          • ijazulhassan

            ijazulhassan - 2020-08-04

            Kindly contact me at ijazulhassan13@gmail.com, I need your help.

             
  • Sri Ningsih

    Sri Ningsih - 2015-09-18

    What about this warning?

    Phase 5: Determine amount of training data, see if n_tied_states seems reasonable.
    Estimated Total Hours Training: -0.000202564102564107
    WARNING: Not enough data for the training

    What should I do about this warning, Mr. Nickolay, given that my training data can't reach 1 hour?
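One possible reading of the negative number in Phase 5 is offered here as a sketch, not as a statement of what the verification script does. Assume each .mfc file is a 4-byte header followed by 4-byte values, 13 per frame, at 100 frames per second, and that the estimate is derived from file sizes alone. Under that assumption a missing or empty file (size 0) contributes slightly less than zero frames, so the total becomes negative when no features were extracted. The constants and the function name are illustrative assumptions.

```python
# Assumed layout of a feature file; these constants are not taken from the thread.
HEADER_BYTES = 4          # leading length field
BYTES_PER_VALUE = 4       # one 32-bit value per coefficient
COEFFS_PER_FRAME = 13     # cepstral coefficients per frame
FRAMES_PER_SECOND = 100


def estimated_hours(feature_file_sizes):
    """Estimate hours of audio from feature file sizes in bytes (0 for a missing file)."""
    frames = sum(
        (size - HEADER_BYTES) / (BYTES_PER_VALUE * COEFFS_PER_FRAME)
        for size in feature_file_sizes
    )
    return frames / FRAMES_PER_SECOND / 3600


if __name__ == "__main__":
    print(estimated_hours([0] * 21))
    print(estimated_hours([0] * 948))
```

With 21 zero-size files this gives about -4.487e-06 hours, and with 948 it gives about -0.0002026 hours. Those are the two estimates quoted in this thread, which suggests the "Not enough data" warning is a symptom of the missing .mfc files rather than of a small corpus.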

     
  • ijazulhassan

    ijazulhassan - 2020-07-18

    I get the following errors. Kindly point me in the right direction; everything else is going fine except for these errors.

    Phase 3: Check general format for the fileids file; utterance length (must be positive); files exist
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Female/ac_bnd.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Female/ac_chalo.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Female/batti_bujao.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Female/batti_jalao.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Female/jaxy.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Female/pankha_bnd.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Female/pankha_chalao.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male1/ac_bnd.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male1/ac_chalo.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male1/batti_bujao.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male1/batti_jalao.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male1/jaxy.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male1/pankha_bnd.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male1/pankha_chalao.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male2/ac_bnd.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male2/ac_chalo.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male2/batti_bujao.mfc' does not exist, or is empty
    
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male2/batti_jalao.mfc' does not exist, or is empty
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male2/jaxy.mfc' does not exist, or is empty
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male2/pankha_bnd.mfc' does not exist, or is empty
    WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male2/pankha_chalao.mfc' does not exist, or is empty
    FAILED
    
    
    Phase 4: Checking number of lines in the transcript file should match lines in fileids file
    passed
    
    Phase 5: Determine amount of training data, see if n_tied_states seems reasonable.
    Estimated Total Hours Training: -4.48717948717949e-006
    This is a small amount of data, no comment at this time
    WARNING
    
     

    Last edit: ijazulhassan 2020-07-18
  • ijazulhassan

    ijazulhassan - 2020-07-19

    my speech db is below

     

    Last edit: ijazulhassan 2020-07-20
