CMU Sphinx / Forums / Help: ERROR: "sphinx_fe.c", line 933: Failed to open control file

Imran Rajjad - 2017-01-26

I am using Cygwin on Windows 10 and trying to create an acoustic model. However, the sphinxtrain script is unable to generate mfc files for the provided wav files. The log files for both test and train show a similar error:

ERROR: "sphinx_fe.c", line 933: Failed to open control file /cygdrive/d/nlp/accoustic_model/russ/etc/russ_test.fileids: No such file or directory

ERROR: "sphinx_fe.c", line 933: Failed to open control file /cygdrive/d/nlp/accoustic_model/russ/etc/russ_train.fileids: No such file or directory

It is unable to find the file containing the list of wav files, even though the file name and path in the error are correct. Could Cygwin be causing this?

Earlier I tried installing the Python scripts on Windows 10 directly, but there seem to be some compatibility issues, given that I have installed ActivePerl and ActivePython as instructed by the tutorial page.

regards,
Imran
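
One possible explanation, offered as an assumption and not something confirmed in this thread: if the `sphinx_fe` binary is a native Windows build and not a Cygwin build, it will not understand `/cygdrive/...` paths, even though the same path works in the Cygwin shell. The sketch below maps a `/cygdrive/<letter>/...` path to its drive-letter form and reports whether each spelling is visible to the interpreter running it. The helper names `cygdrive_to_windows` and `describe_control_file` are made up for illustration; inside Cygwin, the `cygpath -w` utility performs this conversion properly.

```python
import os
import re


def cygdrive_to_windows(path: str) -> str:
    """Map /cygdrive/<letter>/rest to <LETTER>:/rest; return other paths unchanged."""
    match = re.match(r"^/cygdrive/([a-zA-Z])(/.*)?$", path)
    if not match:
        return path
    drive = match.group(1).upper()
    rest = match.group(2) or "/"
    return f"{drive}:{rest}"


def describe_control_file(path: str) -> str:
    """Report whether the control file exists under either spelling of the path."""
    win_path = cygdrive_to_windows(path)
    return (
        f"posix-style:   {path} exists={os.path.exists(path)}\n"
        f"windows-style: {win_path} exists={os.path.exists(win_path)}"
    )


if __name__ == "__main__":
    # Path copied from the error message above.
    print(describe_control_file(
        "/cygdrive/d/nlp/accoustic_model/russ/etc/russ_train.fileids"
    ))
```

Running it once with Cygwin's Python and once with a native Windows Python would show which spelling each environment can actually open.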

- Arseniy Gorin - 2017-01-26
  
  If the path is accessible from the Cygwin environment, as you say, the script should have no problem. I must say, though, that the most stable way to work from Windows is to install a virtual machine with Linux.
  
  Anyway, if you want I can try to reproduce your run. But you should provide the full training directory.
  
  - Imran Rajjad - 2017-01-27
    
    Sphinxtrain path: /cygdrive/d/nlp/sphinxtrain
    Sphinxtrain binaries path: /cygdrive/d/nlp/sphinxtrain/bin/Release/Win32
    Running the training
    MODULE: 000 Computing feature from audio files
    Extracting features from segments starting at (part 1 of 1)
    ERROR: This step had 1 ERROR messages and 0 WARNING messages. Please check the log file for details.
    Extracting features from segments starting at (part 1 of 1)
    ERROR: This step had 1 ERROR messages and 0 WARNING messages. Please check the log file for details.
    < I handled these errors by manually generating mfc files using some other acoustic model>
    Feature extraction is done
    MODULE: 00 verify training files
    Phase 1: Checking to see if the dict and filler dict agrees with the phonelist file.
    WARNING: The phonelist (/cygdrive/d/nlp/accoustic_model/zuazu/etc/zuazu.phone) has duplicated phones
    Found 5661 words using 61 phones
    Phase 2: Checking to make sure there are not duplicate entries in the dictionary
    Phase 3: Check general format for the fileids file; utterance length (must be positive); files exist
    Phase 4: Checking number of lines in the transcript file should match lines in fileids file
    Phase 5: Determine amount of training data, see if n_tied_states seems reasonable.
    Estimated Total Hours Training: 1.16042777777778
    This is a small amount of data, no comment at this time
    Phase 6: Checking that all the words in the transcript are in the dictionary
    Words in dictionary: 5656
    Words in filler dictionary: 5
    Phase 7: Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
    
    the sample database has been uploaded at
    https://drive.google.com/open?id=0ByCBQceVv2USOXhqdTZMM1NUcDg
    

- Arseniy Gorin - 2017-01-27
  
  Not sure what this means:
  
  < I handled these errors by manually generating mfc files using some other acoustic model>
  
  You do not need a model for feature extraction.
  
  Anyway, you have a duplicate SIL in the zuazu.phone file. Just remove it, and then follow the training tutorial (you should also specify the LM name without DMP at the end, otherwise the decoder does not find it)
  
  Aligning results to find error rate
  SENTENCE ERROR: 37.1% (263/708)
  WORD ERROR RATE: 4.0% (579/14662)
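
The duplicate-SIL problem mentioned above can be spotted with a few lines of code. This is a minimal sketch, assuming the phone file lists one phone per line; the function name `duplicated_phones` and the sample list are hypothetical and are not the contents of the real zuazu.phone.

```python
from collections import Counter


def duplicated_phones(lines):
    """Return phones that occur more than once, in first-seen order."""
    # Ignore blank lines and surrounding whitespace.
    phones = [line.strip() for line in lines if line.strip()]
    counts = Counter(phones)
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return [phone for phone in dict.fromkeys(phones) if counts[phone] > 1]


if __name__ == "__main__":
    # Hypothetical sample, standing in for the lines of a .phone file.
    sample = ["AA", "SIL", "B", "SIL", ""]
    print(duplicated_phones(sample))
```

For a real file, the lines could be passed in with `open(path, encoding="utf-8")`, since a file object iterates over its lines.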
  
  - Imran Rajjad - 2017-01-27
    
    Thanks a lot for the help. Yes, that was a dumb mistake; I did not see that SIL.
    
    About the DMP: I believe the lm file name is zuazu.lm, and I did not end the file name with DMP. Is there something else I need to know as a rookie?
    
    - Arseniy Gorin - 2017-01-27
      
      I mean that the default config file will probably have DMP at the end of the LM name. Yes, change it so that it ends in just lm.
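
The config change described above amounts to dropping a trailing `.DMP` from the quoted language-model path. The sketch below does that for a single config line. The sample line is an assumption about what the default training config may contain and was not quoted in this thread; `drop_dmp_suffix` is a hypothetical helper name.

```python
import re


def drop_dmp_suffix(config_line: str) -> str:
    """Remove a .DMP suffix that sits directly before a closing double quote."""
    return re.sub(r'\.DMP(?=")', "", config_line)


if __name__ == "__main__":
    # Assumed example of a config line; check the actual config file before editing.
    line = '$DEC_CFG_LANGUAGEMODEL = "$CFG_BASE_DIR/etc/${CFG_DB_NAME}.lm.DMP";'
    print(drop_dmp_suffix(line))
```

Lines without a `.DMP"` ending pass through unchanged, so the function is safe to apply to every line of the file.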
      

Arseniy Gorin - 2017-01-27

Here is the model and the example config file:

zuazu.7z
