CMU Sphinx / Forums / Help: sphinxtrain, the feature file .mfc does not exist, or is empty

Orest - 2015-03-18

I am trying to train my own model with sphinxtrain using the tutorial (http://cmusphinx.sourceforge.net/wiki/tutorialam). The idea is to start with something that works (an example) and then make appropriate modifications, to get familiar with sphinxtrain. I downloaded the an4 database from http://www.speech.cs.cmu.edu/databases/an4/ (Raw audio (.raw) format, little endian byte order (64 M)).

I unpacked it, moved into the folder containing the database (/home/osota/an4), and then ran:

/opt/sphinxtrain/bin/./sphinxtrain -t an4 setup

It gives me this result:

Sphinxtrain path: /opt/sphinxtrain/lib/sphinxtrain
Sphinxtrain binaries path: /opt/sphinxtrain/libexec/sphinxtrain
Setting up the database an4

which, as far as I understand, shows that there are no installation or configuration problems.

I see that sphinx_train.cfg has been created in etc/.

I opened etc/sphinx_train.cfg and changed the following lines:

$CFG_WAVFILE_EXTENSION = 'wav';
$CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw

into this:

$CFG_WAVFILE_EXTENSION = 'raw';
$CFG_WAVFILE_TYPE = 'raw'; # one of nist, mswav, raw

I did that because I noticed that the audio files have the extension .raw.

Then I moved into /home/osota/an4 (the training database) and typed:

$ /opt/sphinxtrain/bin/./sphinxtrain run

Sphinxtrain path: /opt/sphinxtrain/lib/sphinxtrain
Sphinxtrain binaries path: /opt/sphinxtrain/libexec/sphinxtrain
Running the training
MODULE: 000 Computing feature from audio files
Extracting features from segments starting at (part 1 of 1)
Extracting features from segments starting at (part 1 of 1)
Feature extraction is done
MODULE: 00 verify training files
Phase 1: Checking to see if the dict and filler dict agrees with the phonelist file.
Found 133 words using 34 phones
Phase 2: Checking to make sure there are not duplicate entries in the dictionary
Phase 3: Check general format for the fileids file; utterance length (must be positive); files exist
WARNING: Error in '/home/osota/an4/etc/an4_train.fileids', the feature file '/home/osota/an4/feat/an4_clstk/fbbh/cen5-fbbh-b.mfc' does not exist, or is empty
WARNING: Error in '/home/osota/an4/etc/an4_train.fileids', the feature file '/home/osota/an4/feat/an4_clstk/fbbh/cen6-fbbh-b.mfc' does not exist, or is empty
WARNING: Error in '/home/osota/an4/etc/an4_train.fileids', the feature file '/home/osota/an4/feat/an4_clstk/mwhw/cen8-mwhw-b.mfc' does not exist, or is empty
Phase 4: Checking number of lines in the transcript file should match lines in fileids file
Phase 5: Determine amount of training data, see if n_tied_states seems reasonable.
Estimated Total Hours Training: -0.000202564102564107
WARNING: Not enough data for the training
Phase 6: Checking that all the words in the transcript are in the dictionary
Words in dictionary: 130
Words in filler dictionary: 3
Phase 7: Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once

The output stops at Phase 7.
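Each Phase 3 warning says that, for an utterance listed in etc/an4_train.fileids, the matching .mfc file under feat/ is missing or has size zero. A minimal Python 3 sketch of that check, assuming the layout seen in the log (one utterance id per line of etc/<name>.fileids, features expected at feat/<utterance id>.mfc); `missing_features` is a hypothetical helper, not sphinxtrain's own verification code:

```python
from pathlib import Path


def missing_features(base_dir, fileids_name, feat_ext="mfc"):
    """Return feature paths listed in etc/<fileids_name> that are absent or empty.

    Assumes each non-blank line of the fileids file is a bare utterance id such as
    'an4_clstk/fbbh/cen5-fbbh-b', and that features live in feat/<id>.<feat_ext>.
    """
    base = Path(base_dir)
    missing = []
    for line in (base / "etc" / fileids_name).read_text().splitlines():
        utt_id = line.strip()
        if not utt_id:
            continue  # skip blank lines
        feat_file = base / "feat" / f"{utt_id}.{feat_ext}"
        # A file that does not exist, or exists with zero bytes, both count as missing.
        if not feat_file.is_file() or feat_file.stat().st_size == 0:
            missing.append(feat_file)
    return missing
```

If `missing_features("/home/osota/an4", "an4_train.fileids")` returns every utterance rather than a handful, the problem is most likely that feature extraction (module 000) produced nothing at all, not that a few recordings are bad.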

I put back $CFG_WAVFILE_EXTENSION = 'wav' and $CFG_WAVFILE_TYPE = 'mswav', ran again, and got the same error.

I searched this forum and the troubleshooting section for a solution, then went to /home/osota/an4/logdir, where there is a single entry: 000.comp_feat.

When I open it, I see (Vim netrw listing of /home/osota/an4/logdir/000.comp_feat):

../
an4.test-1-1.log
an4.train-1-1.log

Then I found an issue similar to mine described in the troubleshooting section of the documentation:

WARNING: CTL file, audio file name.mfc, does not exist, or is empty. The .mfc files are the feature files converted from the input audio files on stage 000.comp_feats. Did you skip this step? Did you add new audio files without converting them? The training process expects a feature file to be there, and it isn't.

I'd like to get familiar with and understand what's going on, because I've never trained a model before.

"Did you skip this step?"
What step? The creation of the .mfc files? Isn't sphinxtrain supposed to do this step? Should I create those .mfc files with sphinx_fe?

"The training process expects a feature file to be there, and it isn't."

This makes me guess that the creation of the .mfc files failed at step "MODULE: 000".

Another guess is that I might have an installation/configuration problem. My Python version is 2.7.3, and I'm on Debian Linux.

What did I do wrong?

Last edit: Nickolay V. Shmyrev 2015-03-18
- Nickolay V. Shmyrev - 2015-03-18
  
  You need to download the "NIST's Sphere audio (.sph) format (64 M)" version. The extension must be sph and the file type nist.
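The extension and type settings have to match what the files actually contain. One rough check is to look at the first bytes: NIST SPHERE files begin with the ASCII tag NIST_1A, Microsoft WAV files begin with RIFF and carry WAVE at byte offset 8, and headerless raw audio has no signature at all. The Python sketch below (hypothetical helpers, not part of sphinxtrain) guesses the $CFG_WAVFILE_TYPE value from that header; "raw" is only a fallback, since any unrecognized file ends up there:

```python
def guess_wavfile_type(header: bytes) -> str:
    """Guess the sphinx_train.cfg $CFG_WAVFILE_TYPE value from a file's first bytes."""
    if header.startswith(b"NIST_1A"):
        return "nist"  # NIST SPHERE, as in the .sph download (extension 'sph')
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "mswav"  # Microsoft RIFF/WAVE (extension 'wav')
    return "raw"  # no recognized header: raw PCM, or something else entirely


def guess_file_type(path) -> str:
    # The first 12 bytes are enough for both signatures checked above.
    with open(path, "rb") as f:
        return guess_wavfile_type(f.read(12))
```

For a file from the .sph download, `guess_file_type` should return "nist", which pairs with $CFG_WAVFILE_EXTENSION = 'sph'.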
  
  - Orest - 2015-03-20
    
    Hi Nickolay, I downloaded the "NIST's Sphere audio (.sph) format (64 M)" version and repeated the procedure (/opt/sphinxtrain/bin/./sphinxtrain -t an4 setup) in a new folder
    
    and set
    
    $CFG_WAVFILE_EXTENSION = 'sph';
    $CFG_WAVFILE_TYPE = 'nist'; # one of nist, mswav, raw
    
    When I run the training, I get exactly the same errors. Any idea what it might be?
    
    I noticed that
    
    MODULE: 000 Computing feature from audio files
    Extracting features from segments starting at (part 1 of 1)
    Extracting features from segments starting at (part 1 of 1)
    Feature extraction is done
    
    completes in 0 seconds. As I understand it, this means no features are being extracted (and no .mfc files are created), yet the message "Feature extraction is done" suggests there were no errors.
    
    In logdir I see one folder, "000.comp_feat", which contains two files: "an4.test-1-1.log" and "an4.train-1-1.log".
    
    When I open an4.train-1-1.log, I see:
    
    Fri Mar 20 14:54:25 2015
    Fri Mar 20 14:54:25 2015
    
    (which is the time at which I ran setup or run)
    
    As an additional note, the newly created feat/ folder has the same directory structure as wav/, but the folders in feat/ contain no files.
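Since the log holds only timestamps and feat/ stays empty, one way to surface the underlying error is to run the feature extractor by hand on the same file list. The Python 3 sketch below builds a sphinx_fe command line along the lines of the ones in the CMU Sphinx tutorials, using the -samprate, -c, -di, -do, -ei, -eo and -nist/-mswav/-raw options. It is an assumption-laden approximation, not the exact argument list sphinxtrain passes (sphinxtrain also supplies feature parameters from its configuration), and the 16000 Hz default is an assumption about the AN4 audio; `sphinx_fe_command` and `run_sphinx_fe` are hypothetical names:

```python
import subprocess

VALID_TYPES = ("nist", "mswav", "raw")


def sphinx_fe_command(fileids, wav_dir="wav", feat_dir="feat",
                      wav_ext="sph", wav_type="nist", samprate=16000):
    """Build a sphinx_fe invocation covering every utterance listed in `fileids`."""
    if wav_type not in VALID_TYPES:
        raise ValueError(f"wav_type must be one of {VALID_TYPES}")
    return [
        "sphinx_fe",
        "-samprate", str(samprate),      # assumed sample rate of the corpus
        "-c", fileids,                   # control file: one utterance id per line
        "-di", wav_dir, "-ei", wav_ext,  # input directory and extension
        "-do", feat_dir, "-eo", "mfc",   # output directory and extension
        f"-{wav_type}", "yes",           # input format flag matching $CFG_WAVFILE_TYPE
    ]


def run_sphinx_fe(cmd, cwd):
    # Capture stderr so that a loader or format error becomes visible.
    return subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
```

Calling `run_sphinx_fe(sphinx_fe_command("etc/an4_train.fileids"), "/home/osota/an4")` raises FileNotFoundError if sphinx_fe is not on the PATH at all; otherwise the returned `stderr` should contain whatever message the logs were hiding.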
    
    EDIT 999: I tried again with the latest sphinxtrain from GitHub and got the same errors.
    
    Any suggestions?
    
    Last edit: Orest 2015-03-20
    
    - Nickolay V. Shmyrev - 2015-03-20
      
      This log means that the sphinx_fe binary is either not installed in your PATH or fails to run. You need to make sure you installed sphinxbase properly, in particular the LD_LIBRARY_PATH part.
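A quick way to check both conditions from Python 3: whether sphinx_fe resolves on the PATH, and which directories LD_LIBRARY_PATH adds to the library search. `diagnose_sphinx_fe` is a hypothetical helper; an empty LD_LIBRARY_PATH is reported only as a note, because it is harmless when the sphinxbase libraries sit in a directory the dynamic linker already searches (for example after running ldconfig):

```python
import os
import shutil


def diagnose_sphinx_fe(env=None):
    """Report where sphinx_fe resolves on PATH and what LD_LIBRARY_PATH contains."""
    env = os.environ if env is None else env
    # shutil.which returns None when the command is not found (or PATH is empty).
    exe = shutil.which("sphinx_fe", path=env.get("PATH", ""))
    lib_dirs = [d for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    notes = []
    if exe is None:
        notes.append("sphinx_fe not found on PATH")
    if not lib_dirs:
        notes.append("LD_LIBRARY_PATH is empty")
    return exe, lib_dirs, notes
```

If sphinx_fe is found but still fails, running it by hand in a terminal typically prints a dynamic linker error naming the sphinxbase shared library it cannot load; for a default source install under /usr/local, exporting LD_LIBRARY_PATH=/usr/local/lib is the usual remedy.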
      
      - Conrad - 2016-03-04
        
        OH WOW. You are a savior, good sir. I had the same problem and was confused for the longest time.
        
      - ijazulhassan - 2020-08-04
        
        Kindly please contact me at ijazulhassan13@gmail.com; I need your help.
        

Sri Ningsih - 2015-09-18

What about this warning?

Phase 5: Determine amount of training data, see if n_tied_states seems reasonable.
Estimated Total Hours Training: -0.000202564102564107
WARNING: Not enough data for the training

What should I do about this warning, Mr. Nickolay? My training data doesn't reach even 1 hour.
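The negative estimate looks like a symptom of missing .mfc files rather than of a short corpus. Assume the verification step estimates duration from feature-file sizes: a 4-byte header followed by 4-byte floats, 13 cepstral coefficients per frame, and 100 frames per second. Under that assumption, a missing or empty file contributes (0 - 4) / (4 × 13) = -1/13 of a frame. This reproduces both figures in this thread exactly: 948 missing files (the size of the AN4 training list) give -0.000202564 hours, and the 21 missing files in the Windows log further down give -4.487e-6 hours. A Python sketch of the arithmetic (`estimated_hours` is a hypothetical name):

```python
def estimated_hours(feature_sizes, ceplen=13, frames_per_sec=100, header_bytes=4):
    """Estimate training hours from feature-file sizes in bytes (0 for a missing file).

    Assumed .mfc layout: a `header_bytes` header, then float32 values, `ceplen` per frame.
    """
    bytes_per_frame = 4 * ceplen
    frames = sum((size - header_bytes) / bytes_per_frame for size in feature_sizes)
    return frames / frames_per_sec / 3600
```

Once sphinx_fe actually writes the features, the same formula gives a positive value. One hour corresponds to 360,000 frames, or about 18.7 MB of 13-coefficient features.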


I have the following errors. Kindly point me in the right direction: everything else works, except for these errors!

Phase 3: Check general format for the fileids file; utterance length (must be positive); files exist
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Female/ac_bnd.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Female/ac_chalo.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Female/batti_bujao.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Female/batti_jalao.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Female/jaxy.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Female/pankha_bnd.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Female/pankha_chalao.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male1/ac_bnd.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male1/ac_chalo.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male1/batti_bujao.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male1/batti_jalao.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male1/jaxy.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male1/pankha_bnd.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male1/pankha_chalao.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male2/ac_bnd.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male2/ac_chalo.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male2/batti_bujao.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male2/batti_jalao.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male2/jaxy.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male2/pankha_bnd.mfc' does not exist, or is empty
WARNING: Error in 'C:/Sphinx/other/etc/My_train.fileids', the feature file 'C:/Sphinx/other/feat/Male2/pankha_chalao.mfc' does not exist, or is empty
FAILED


Phase 4: Checking number of lines in the transcript file should match lines in fileids file
passed

Phase 5: Determine amount of training data, see if n_tied_states seems reasonable.
Estimated Total Hours Training: -4.48717948717949e-006
This is a small amount of data, no comment at this time
WARNING

Last edit: ijazulhassan 2020-07-18

error.JPG

ijazulhassan - 2020-07-19

My speech database is attached below:

Last edit: ijazulhassan 2020-07-20

My.html

My_speech_db.zip
