CMU Sphinx / Forums / Help: SphinxTrain run not creating mfc files.

Eduardo Naslausky - 2018-10-23

Hello,
I've searched this forum and the internet for a similar problem, without success.
I'm running this on an Ubuntu machine.

I've followed the tutorial steps for creating my own acoustic model.
I've reached the part where I run:
sudo sphinxtrain run
The command runs to completion, but it prints many warnings like this:

WARNING: Error in '/home/naslausky/constituicao/etc/constituicao_train.fileids', the feature file '/home/naslausky/constituicao/feat/constituicao16k/dt095.mfc' does not exist, or is empty

First, I go to /home/naslausky/constituicao/logdir/000.comp_feat and check all the logs.
There are 2 logs: "constituicao.test-1-1.log" and "constituicao.train-1-1.log"
Both logs are just a table of values, followed by these lines at the end:

INFO: sphinx_fe.c(967): Processing all remaining utterances at position 0
INFO: sphinx_fe.c(787): Converting /home/naslausky/constituicao/constituicao16k/art001a.wav to /home/naslausky/constituicao/feat/constituicao16k/art001a.mfc

Second, I go to /home/naslausky/constituicao/feat/constituicao16k, but there are NO files inside of it.

I can run the sphinx_fe command and it is accepted, which means sphinxbase has been installed successfully.
What could be the reason the .mfc files aren't being created?
I have attached my sphinx_train.cfg in this post.

My regards, and sorry if this is a stupid question, but I just can't find the answer anywhere.
Naslausky

Last edit: Eduardo Naslausky 2018-10-23

sphinx_train.cfg
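
The warning quoted above can be reproduced outside SphinxTrain by checking, for every entry in the fileids file, whether the matching feature file exists and is non-empty. The following is a minimal sketch, not part of SphinxTrain: the function name `missing_features` is hypothetical, and it assumes each fileids entry is a path relative to the feat folder without an extension (for example `constituicao16k/dt095`), which is inferred from the paths in the warning.

```python
from pathlib import Path


def missing_features(fileids_path, feat_dir, ext=".mfc"):
    """Return the file IDs whose feature file is absent or zero bytes long.

    Assumption: each non-blank line of the fileids file is a path relative
    to feat_dir, without extension (e.g. "constituicao16k/dt095").
    """
    missing = []
    for line in Path(fileids_path).read_text().splitlines():
        file_id = line.strip()
        if not file_id:
            continue  # skip blank lines
        feat = Path(feat_dir) / (file_id + ext)
        if not feat.is_file() or feat.stat().st_size == 0:
            missing.append(file_id)
    return missing
```

Calling `missing_features("etc/constituicao_train.fileids", "feat")` from the training folder should list the same utterances that the warnings complain about. If every entry is listed, feature extraction produced nothing at all, which points at the input audio rather than at individual files.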

- Nickolay V. Shmyrev - 2018-10-23
  
  Share the whole model training folder.
  

Eduardo Naslausky - 2018-10-23

I can't share the wav folders because the files are too big, but I can share a terminal session showing all the files, plus the zipped etc folder.
Here you go:
terminalSession.txt is a copy of a terminal session in which I list the file names.
etc.zip is the etc folder without languagemodel.lm.bin, which was also too big.

etc.zip

terminalSession.txt

- Nickolay V. Shmyrev - 2018-10-23
  
  You have to provide the logs from the logdir folder.
  

Eduardo Naslausky - 2018-10-23

Ok! Here you go:

logdir.zip

- Nickolay V. Shmyrev - 2018-10-23
  
  Based on your terminal output, your files have not yet been converted to 16 kHz. The file size of art001a.wav should be 8823714 bytes; yours is 1216056 bytes, just like the original 22.1 kHz file.
  
  - John Peter - 2018-11-20
    
    I have the same problem, but there is nothing in logdir, just the date.
    Files: WAV, 16 kHz, mono.
    kiny.html contains the results I get after running training. Please help me.
    
    Last edit: John Peter 2018-11-20
    
    etc.zip
    
    kiny.html
    
    logdir.zip
    
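
The sample-rate diagnosis earlier in this thread (files not yet converted to 16 kHz) can also be checked directly from the WAV headers instead of from file sizes. This is a minimal sketch using Python's standard `wave` module. The helper names `wav_info` and `wrong_format` are hypothetical, and the expected format of 16 kHz, mono, 16-bit samples is an assumption: the thread only mentions 16 kHz and mono.

```python
import wave
from pathlib import Path


def wav_info(path):
    """Return (sample_rate_hz, channels, bytes_per_sample) from a WAV header."""
    with wave.open(str(path), "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth()


def wrong_format(wav_dir, rate=16000, channels=1, sampwidth=2):
    """List WAV files under wav_dir whose header differs from the expected format.

    Assumption: training expects 16 kHz, mono, 16-bit (2 bytes per sample) audio.
    """
    bad = []
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        info = wav_info(path)
        if info != (rate, channels, sampwidth):
            bad.append((str(path), info))
    return bad
```

Running `wrong_format("constituicao16k")` on the audio folder should return an empty list once all files have really been resampled. Note that the `wave` module only reads uncompressed PCM files and raises `wave.Error` for anything else.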

Eduardo Naslausky - 2018-10-24

Ok! Thanks for all of your quick help.
I've made it through the training successfully.
Now I'm having a problem with the decoder, and the other solutions I found on this forum for this issue do not match my problem.
The problem is this:

MODULE: DECODE Decoding using models previously trained
Aligning results to find error rate
word_align.pl failed with error code 65280 at /usr/local/lib/sphinxtrain/scripts/decode/slave.pl line 173.

I trained a context-dependent model, because I had a lot of data.
The configuration:
$DEC_CFG_MODEL_NAME = "$CFG_EXPTNAME.cd_${CFG_DIRLABEL}_${CFG_N_TIED_STATES}";
The configuration is correct: the folder /home/naslausky/constituicao/model_parameters/constituicao.cd_cont_4000 exists and all the files are in there.
The command "which pocketsphinx_batch" returns:
/usr/local/bin/pocketsphinx_batch
This shows the files are set up correctly.
I have attached the logdir/decode folder.

Do you have any suggestions as to what may be causing the problem in the decoder?
Thanks and sorry for causing you so much trouble.

Last edit: Eduardo Naslausky 2018-10-24

decode.zip
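
A note on the number in the error message above: 65280 looks like a raw wait status rather than a plain exit code. The sketch below decodes it, assuming slave.pl reports the status as returned by Perl's system(), where the exit code sits in the high byte and the terminating signal in the low 7 bits. That assumption is not confirmed by anything shown in this thread, and the function name `decode_wait_status` is hypothetical.

```python
def decode_wait_status(status):
    """Split a 16-bit wait status into (exit_code, signal_number)."""
    exit_code = (status >> 8) & 0xFF   # high byte: the child's exit code
    signal_number = status & 0x7F      # low 7 bits: terminating signal, 0 if none
    return exit_code, signal_number


print(decode_wait_status(65280))  # (255, 0)
```

Under that assumption, word_align.pl exited on its own with code 255 and was not killed by a signal, which suggests the script aborted on its input rather than crashing for an external reason.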

- Nickolay V. Shmyrev - 2018-10-25
  
  Your constituicao_test.fileids file has 156 lines, while the constituicao_test.transcription file has many more lines.
  

Eduardo Naslausky - 2018-10-26

Thanks! That did the trick. For some reason the files were changed. Thanks for all the help.
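
The mismatch diagnosed above (156 lines in the fileids file against many more in the transcription file) can be caught before decoding with a small consistency check. This is a minimal sketch with hypothetical function names. It assumes the usual layout in which each transcription line ends with the utterance ID in parentheses, such as `<s> some words </s> (art001a)`, and that this ID equals the last path component of the fileids entry on the same line number.

```python
import re
from pathlib import Path

# Matches a trailing "(utterance_id)" at the end of a transcription line.
_UTT_ID = re.compile(r"\(([^()\s]+)\)\s*$")


def read_fileids(path):
    """Return the base name of each non-blank entry in a .fileids file."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return [ln.strip().rsplit("/", 1)[-1] for ln in text.splitlines() if ln.strip()]


def read_transcription_ids(path):
    """Return the trailing utterance ID of each non-blank line (None if absent)."""
    ids = []
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    for ln in text.splitlines():
        if not ln.strip():
            continue
        m = _UTT_ID.search(ln)
        ids.append(m.group(1) if m else None)
    return ids


def first_mismatch(fileids_path, transcription_path):
    """Return None if both files line up, else a short description of the problem."""
    fids = read_fileids(fileids_path)
    tids = read_transcription_ids(transcription_path)
    if len(fids) != len(tids):
        return f"line counts differ: {len(fids)} fileids vs {len(tids)} transcriptions"
    for n, (f, t) in enumerate(zip(fids, tids), start=1):
        if f != t:
            return f"entry {n}: fileid {f!r} vs transcription id {t!r}"
    return None
```

For example, `first_mismatch("etc/constituicao_test.fileids", "etc/constituicao_test.transcription")` returns `None` when the two files agree line by line, and otherwise a message naming the first problem found. Running it on both the train and test pairs before `sphinxtrain run` is cheap compared with a full training pass.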
