CMU Sphinx / Forums / Help: Arabic Sphinx Train Error at phase 3: issue with creating feature files

us - 2017-04-25

Hello.

I am building a small Arabic corpus: only 8 utterances with 29 words. I have 1 hour and 19 minutes of audio from 50 speakers. I have allotted 40 of the 50 files (1 hr 4 min) to training and the remaining 10 files (15 min) to testing. Each of the 50 speakers has 8 audio files, one for each utterance, and all files are formatted as MS WAV, 16-bit mono at 16 kHz.
I have followed the tutorial carefully (http://cmusphinx.sourceforge.net/wiki/tutorialam).
The latest versions of all packages (SphinxTrain, SphinxBase and PocketSphinx) have been downloaded. I used SRILM to generate my language model in ARPA format. Installation and configuration all seem fine.
Most of my earlier errors were due to building the model with Arabic text, including the phonemes, dictionary and transcript. The text was all formatted as UTF-8 without BOM, which seemed to work fine. After moving a few bits around, most of the errors during training in phases 1, 2 and 4 were cleared.

This post is about 2 errors that remain during training after the "sphinxtrain run" command is called.
ERRORS:
1) In phase 1 there are 2 phonemes that occur in the dictionary but not in the phonelist, and 3 phonemes that occur in the phonelist but not in the dictionary.
2) WARNING: Error in '/Users/...DB_train.fileids', the feature file '/Users...DB/feat/speaker_2/2-001-001.mfc' does not exist, or is empty

The error log doesn't help much, as it shows no errors. It just stops after running "sphinx_fe.c(787): Converting to mfc" for the first speaker; once it gets to the first file in the speaker_2 directory, the log just stops.
The funny thing is that the second error only begins at speaker 2. Speaker 1 has all its mfc files generated in the feat directory when the command is run. I checked the file names and paths in the .fileids file as well as in the database. A previous forum thread about the same error mentioned that some of the .mfc files had been generated and that one of the file names was wrong. However, this is not the case with mine, and I have no idea what it is. My audio is in WAV format, so it can't be $CFG_WAVFILE_EXTENSION or $CFG_WAVFILE_TYPE.

As for the first error with the phonemes, I have no idea what to do, because I have checked and verified that they are all present in the phonelist and the dictionary. I even tried copying and pasting them from one to the other to make sure the Arabic letters don't have any extra spaces after their diacritic marks. Their format in the dictionary is:
WORD [tab] phone [single space] phone [single space]
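
One way to narrow down a dictionary/phonelist mismatch like this is to compare the raw tokens and print their Unicode code points, since invisible characters (a BOM, a right-to-left mark, a differently composed diacritic) look identical on screen. The following is only a minimal sketch: it assumes the dictionary has one "WORD phone phone ..." entry per line and the phonelist has one phone per line, and the function names are made up for illustration.

```python
import sys


def codepoints(token):
    """Render a token as code points so invisible characters become visible."""
    return " ".join(f"U+{ord(ch):04X}" for ch in token)


def dict_phones(lines):
    """Collect every phone used in the dictionary (all fields after the word)."""
    phones = set()
    for line in lines:
        fields = line.split()          # splits on tabs and spaces alike
        phones.update(fields[1:])      # first field is the word itself
    return phones


def phonelist_phones(lines):
    """Collect the phones listed one per line, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def compare_phones(dict_lines, phonelist_lines):
    """Return (phones only in the dictionary, phones only in the phonelist)."""
    in_dict = dict_phones(dict_lines)
    in_list = phonelist_phones(phonelist_lines)
    return sorted(in_dict - in_list), sorted(in_list - in_dict)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (paths are whatever your own setup uses): script.py DICT PHONELIST
    with open(sys.argv[1], encoding="utf-8") as f:
        dict_lines = list(f)
    with open(sys.argv[2], encoding="utf-8") as f:
        phonelist_lines = list(f)
    only_dict, only_list = compare_phones(dict_lines, phonelist_lines)
    for phone in only_dict:
        print("only in dictionary:", repr(phone), codepoints(phone))
    for phone in only_list:
        print("only in phonelist: ", repr(phone), codepoints(phone))
```

If two phones that look the same print different code point sequences, the difference is in the encoding of the text rather than in the phone inventory itself.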


us - 2017-04-25

Here are the logs, features and training files. The wav directory is excluded due to its size. Here is a screenshot of the wav directory layout for reference...


Because the log file shows the process stopping at the second speaker during .mfc feature extraction in sphinx_fe.c, I tried running that step separately using the command below...

sphinx_fe -c etc/quranDB_train.fileids -di wav -ei wav -do feat -eo .mfc

This returned the following...

INFO: sphinx_fe.c(967): Processing all remaining utterances at position 0
INFO: sphinx_fe.c(787): Converting wav/speaker_1/1-001-001.wav to feat/speaker_1/1-001-001.mfc
INFO: sphinx_fe.c(787): Converting wav/speaker_1/1-001-002.wav to feat/speaker_1/1-001-002.mfc
INFO: sphinx_fe.c(787): Converting wav/speaker_1/1-001-003.wav to feat/speaker_1/1-001-003.mfc
INFO: sphinx_fe.c(787): Converting wav/speaker_1/1-001-004.wav to feat/speaker_1/1-001-004.mfc
INFO: sphinx_fe.c(787): Converting wav/speaker_1/1-001-005.wav to feat/speaker_1/1-001-005.mfc
INFO: sphinx_fe.c(787): Converting wav/speaker_1/1-001-006.wav to feat/speaker_1/1-001-006.mfc
INFO: sphinx_fe.c(787): Converting wav/speaker_1/1-001-007.wav to feat/speaker_1/1-001-007.mfc
INFO: sphinx_fe.c(787): Converting wav/speaker_1/1-001-008.wav to feat/speaker_1/1-001-008.mfc
INFO: sphinx_fe.c(787): Converting wav/speaker_2/2-001-001.wav to feat/speaker_2/2-001-001.mfc
Segmentation fault: 11

What is Segmentation fault 11?

us - 2017-04-26

Also, I forgot to mention that when I was installing the tools I forced it with "sudo make install". This applies to sphinxtrain, sphinxbase and pocketsphinx. Initially I was getting errors about the sphinx_fe.c file not being found. After adding the following to my ~/.bash_profile and ~/.bashrc, this issue was resolved...

export PATH=/usr/local/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/lib
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

This got the sphinx_fe running.

us - 2017-04-29

Please can someone help? It is urgent!!!


us - 2017-05-11

!!!!! RESOLVED !!!!!

The interrupted training and the lack of an error message during the feature extraction phase were caused by an incorrect audio file format. Some of the files were set to 44100 Hz instead of 16000 Hz.
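
For anyone hitting the same thing, a quick way to find the offending files before training is to read each WAV header and flag anything that is not 16 kHz, mono, 16-bit. This is a minimal sketch using only Python's standard `wave` module; the function names and the default "wav" directory are assumptions for illustration, and it only handles plain PCM WAV files.

```python
import wave
from pathlib import Path


def wav_format(path):
    """Return (sample rate in Hz, channel count, bits per sample) from a WAV header."""
    with wave.open(str(path), "rb") as w:
        return w.getframerate(), w.getnchannels(), 8 * w.getsampwidth()


def find_bad_wavs(wav_dir, rate=16000, channels=1, bits=16):
    """List (path, description) for every .wav under wav_dir not in the expected format."""
    bad = []
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        try:
            fmt = wav_format(path)
        except (wave.Error, EOFError) as exc:
            # Not a readable PCM WAV file at all (wrong container, truncated, etc.)
            bad.append((path, f"unreadable: {exc}"))
            continue
        if fmt != (rate, channels, bits):
            bad.append((path, "%d Hz, %d channel(s), %d-bit" % fmt))
    return bad


if __name__ == "__main__":
    # "wav" is assumed to be the audio directory relative to where this is run.
    for path, problem in find_bad_wavs("wav"):
        print(f"{path}: {problem}")
```

Any file it lists would need to be resampled or re-exported before feature extraction is run again.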

- Nickolay V. Shmyrev - 2017-05-11
  
  Congratulations. Follow the tutorial strictly next time.
  

Kedar Ganesh Nadkarni - 2017-07-19

Can you please help me? I have been getting the same error!

" sphinx_fe -argfile en-us/feat.params \
-samprate 16000 -c arctic20.fileids \
-di . -do . -ei wav -eo mfc -mswav yes "

I have used the above command, and I have even used the same .wav files provided in the CMU tutorial for converting the audio files to MFC.

Can someone please help? I have tried many things, but I keep getting the error shown below:

" INFO: sphinx_fe.c(791): Converting ./arctic_0009.wav to ./arctic_0009.mfc
ERROR: "sphinx_fe.c", line 119: Failed to open ./arctic_0009.wav: No such file or directory "
for all the .wav files

- Nickolay V. Shmyrev - 2017-07-23
  
  Run this command from the folder containing wav files and learn a bit more about file paths.
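
To see why the files are not found, it can help to rebuild the paths the way the log above shows them: the input directory (-di), then the fileids entry, then the input extension (-ei), all resolved relative to the directory the command is run from. A minimal sketch of that check follows; the function name is hypothetical, and the path rule is inferred from the "Converting ./arctic_0009.wav" log line rather than from the sphinx_fe source.

```python
from pathlib import Path


def missing_inputs(fileids_lines, input_dir=".", input_ext="wav"):
    """Return the input paths that do not exist for the given fileids entries."""
    missing = []
    for line in fileids_lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        # Relative paths are resolved against the current working directory.
        path = Path(input_dir) / f"{entry}.{input_ext}"
        if not path.is_file():
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    # Run from the same directory as the sphinx_fe command.
    fileids = Path("arctic20.fileids")
    if fileids.is_file():
        for path in missing_inputs(fileids.read_text().splitlines()):
            print("not found:", path)
```

If every entry is reported as not found, the working directory (or the -di value) is the likely culprit rather than the individual file names.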
  
