CMU Sphinx / Forums / Help: Acoustic Model with Forced Alignment Failing

Izza - 2016-06-11

Hi,

I'm trying to train an acoustic model with forced alignment. However, the training fails with error [1]. I think forced alignment is failing for some reason. Any help on what is going wrong here is very much appreciated!

I have attached the log directory as well.

[1].
MODULE: 000 Computing feature from audio files
Extracting features from segments starting at (part 1 of 1)
Extracting features from segments starting at (part 1 of 1)
Feature extraction is done
MODULE: 00 verify training files
Phase 1: Checking to see if the dict and filler dict agrees with the phonelist file.
Found 2652 words using 202 phones
Phase 2: Checking to make sure there are not duplicate entries in the dictionary
Phase 3: Check general format for the fileids file; utterance length (must be positive); files exist
Phase 4: Checking number of lines in the transcript file should match lines in fileids file
Phase 5: Determine amount of training data, see if n_tied_states seems reasonable.
Estimated Total Hours Training: 0.980125
This is a small amount of data, no comment at this time
Phase 6: Checking that all the words in the transcript are in the dictionary
Words in dictionary: 2649
Words in filler dictionary: 3
Phase 7: Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
MODULE: 0000 train grapheme-to-phoneme model
Skipped (set $CFG_G2P_MODEL = 'yes' to enable)
MODULE: 01 Train LDA transformation
Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
MODULE: 02 Train MLLT transformation
Skipped (set $CFG_LDA_MLLT = 'yes' to enable)
MODULE: 05 Vector Quantization
Skipped for continuous models
MODULE: 10 Training Context Independent models for forced alignment and VTLN
Phase 1: Cleaning up directories:
accumulator...logs...qmanager...models...
Phase 2: Flat initialize
Phase 3: Forward-Backward
Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 1
Current Overall Likelihood Per Frame = -172.660686703793
Baum welch starting for 1 Gaussian(s), iteration: 2 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 2
Current Overall Likelihood Per Frame = -171.773356572999
Convergence Ratio = 0.887330130793515
Baum welch starting for 1 Gaussian(s), iteration: 3 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 3
Current Overall Likelihood Per Frame = -168.973713670308
Convergence Ratio = 2.79964290269052
Baum welch starting for 1 Gaussian(s), iteration: 4 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 4
Current Overall Likelihood Per Frame = -167.020164661537
Convergence Ratio = 1.95354900877106
Baum welch starting for 1 Gaussian(s), iteration: 5 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
ERROR: This step had 2 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 5
Current Overall Likelihood Per Frame = -166.388843837051
Convergence Ratio = 0.631320824486181
Baum welch starting for 1 Gaussian(s), iteration: 6 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
ERROR: This step had 4 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 6
Current Overall Likelihood Per Frame = -166.173955024339
Convergence Ratio = 0.214888812712047
Baum welch starting for 1 Gaussian(s), iteration: 7 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
ERROR: This step had 4 ERROR messages and 0 WARNING messages. Please check the log file for details.
Normalization for iteration: 7
Current Overall Likelihood Per Frame = -166.080741367096
Training completed after 7 iterations
MODULE: 11 Force-aligning transcripts
Phase 1: Cleaning up directories:
logs...output...qmanager...
Phase 3: Creating dictionary for alignment...
Phase 4: Creating transcript for alignment...
Phase 5: Running force alignment in 1 parts
Force alignment starting: (1 of 1)
0%
ERROR: This step had 25 ERROR messages and 0 WARNING messages. Please check the log file for details.
Failed in part 1
MODULE: 12 Force-aligning data for VTLN
Skipped: $ST::CFG_VTLN set to 'no' in sphinx_train.cfg
MODULE: 20 Training Context Independent models
Phase 1: Cleaning up directories:
accumulator...logs...qmanager...models...
Phase 2: Copy initialize from falign model
Phase 3: Forward-Backward
Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
0%
ERROR: This step had 1 ERROR messages and 0 WARNING messages. Please check the log file for details.
ERROR: Training failed in iteration 1
Sphinxtrain path: /usr/local/lib/sphinxtrain
Sphinxtrain binaries path: /usr/local/libexec/sphinxtrain
Running the training

Last edit: Izza 2016-06-11

logdir_force_alignment.zip

- Nickolay V. Shmyrev - 2016-06-11
  
  Forced alignment failed on utterance 207; you need to review that utterance and probably remove it from the training set.
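
For anyone following along: dropping an utterance means removing it from both the fileids file and the transcription file, since the verification step in the log above checks that the two have the same number of lines. A minimal Python sketch, assuming the two files are line-aligned and that "207" is the last path component of the fileids entry (the function name is hypothetical and not part of SphinxTrain):

```python
def drop_utterances(fileids_lines, transcript_lines, bad_ids):
    """Return (fileids, transcripts) without the utterances listed in bad_ids.

    Assumption: line i of the fileids file and line i of the transcription
    file describe the same utterance, and the utterance ID is the last
    path component of the fileids entry (e.g. "speaker_1/207" -> "207").
    """
    if len(fileids_lines) != len(transcript_lines):
        raise ValueError("fileids and transcription files differ in length")

    bad = set(bad_ids)
    kept_ids, kept_trans = [], []
    for fid, trans in zip(fileids_lines, transcript_lines):
        utt_id = fid.strip().rsplit("/", 1)[-1]
        if utt_id in bad:
            continue  # skip the utterance in both files
        kept_ids.append(fid)
        kept_trans.append(trans)
    return kept_ids, kept_trans
```

The filtered lists can then be written to copies of the two files, keeping the originals as a backup.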
  
  - Izza - 2016-06-12
    
    Hi Nickolay,
    
    Thank you for the input.
    
    After removing utterance 207, training progressed a bit further but failed again with errors [1, 2]. Grepping for ERROR in logdir/10.falign_ci_hmm revealed errors [3], and doing the same in logdir/11.force_align revealed errors [4]. Is this due to a problem with the training WAV files? How can I find the actual issue? Could the last error, "ERROR: FATAL: "main.c", line 167: Unable to open /home/isuru/mine/work/sinhala_speech_to_text/training/trees/xxxx.unpruned/WAA-0.dtree for reading: No such file or directory", be a result of the earlier errors?
    
    I have attached the zipped logdir.
    
    Many thanks!
    
    [1].
    MODULE: 10 Training Context Independent models for forced alignment and VTLN
    Phase 1: Cleaning up directories:
    accumulator...logs...qmanager...models...
    Phase 2: Flat initialize
    Phase 3: Forward-Backward
    Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
    0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
    Normalization for iteration: 1
    Current Overall Likelihood Per Frame = -172.663171705615
    Baum welch starting for 1 Gaussian(s), iteration: 2 (1 of 1)
    0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
    Normalization for iteration: 2
    Current Overall Likelihood Per Frame = -171.776481872024
    Convergence Ratio = 0.886689833590935
    Baum welch starting for 1 Gaussian(s), iteration: 3 (1 of 1)
    0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
    Normalization for iteration: 3
    Current Overall Likelihood Per Frame = -168.97396709386
    Convergence Ratio = 2.80251477816381
    Baum welch starting for 1 Gaussian(s), iteration: 4 (1 of 1)
    0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
    Normalization for iteration: 4
    Current Overall Likelihood Per Frame = -167.006571551359
    Convergence Ratio = 1.96739554250144
    Baum welch starting for 1 Gaussian(s), iteration: 5 (1 of 1)
    0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
    Normalization for iteration: 5
    Current Overall Likelihood Per Frame = -166.373607921153
    Convergence Ratio = 0.632963630206234
    Baum welch starting for 1 Gaussian(s), iteration: 6 (1 of 1)
    0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
    ERROR: This step had 4 ERROR messages and 0 WARNING messages. Please check the log file for details.
    Normalization for iteration: 6
    Current Overall Likelihood Per Frame = -166.166409428127
    Convergence Ratio = 0.207198493025828
    Baum welch starting for 1 Gaussian(s), iteration: 7 (1 of 1)
    0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
    ERROR: This step had 6 ERROR messages and 0 WARNING messages. Please check the log file for details.
    Normalization for iteration: 7
    Current Overall Likelihood Per Frame = -166.088183787271
    Training completed after 7 iterations
    MODULE: 11 Force-aligning transcripts
    Phase 1: Cleaning up directories:
    logs...output...qmanager...
    Phase 3: Creating dictionary for alignment...
    Phase 4: Creating transcript for alignment...
    Phase 5: Running force alignment in 1 parts
    Force alignment starting: (1 of 1)
    0%
    ERROR: This step had 30 ERROR messages and 0 WARNING messages. Please check the log file for details.
    
    [2].
    MODULE: 45 Prune Trees
    Phase 1: Tree Pruning
    ERROR: FATAL: "main.c", line 167: Unable to open /home/isuru/mine/work/xxx/training/trees/xxx.unpruned/WAA-0.dtree for reading: No such file or directory
    MODULE: 50 Training Context dependent models
    Phase 1: Cleaning up directories:
    accumulator...logs...qmanager...
    Phase 2: Copy CI to CD initialize
    ERROR: This step had 1 ERROR messages and 0 WARNING messages. Please check the log file for details.
    Phase 3: Forward-Backward
    Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
    0% ERROR: FATAL: "main.c", line 1839: initialization failed
    
    ERROR: This step had 1 ERROR messages and 0 WARNING messages. Please check the log file for details.
    ERROR: Failed to start bw
    ERROR: Only 0 parts of 1 of Baum Welch were successfully completed
    ERROR: Parts 1 failed to run!
    Training failed in iteration 1
    Sphinxtrain path: /usr/local/lib/sphinxtrain
    Sphinxtrain binaries path: /usr/local/libexec/sphinxtrain
    Running the training
    
    [3].
    ./xxx.1.6-1.bw.log:217:ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
    ./xxx.1.6-1.bw.log:218:ERROR: "baum_welch.c", line 324: speaker_1/053 ignored
    ./xxx.1.6-1.bw.log:487:ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
    ./xxx.1.6-1.bw.log:488:ERROR: "baum_welch.c", line 324: speaker_1/192 ignored
    ./sinhala_buddhism.1.7-1.bw.log:217:ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
    ./xxx.1.7-1.bw.log:218:ERROR: "baum_welch.c", line 324: speaker_1/053 ignored
    ./xxx.1.7-1.bw.log:487:ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
    ./xxx.1.7-1.bw.log:488:ERROR: "baum_welch.c", line 324: speaker_1/192 ignored
    ./xxx.1.7-1.bw.log:519:ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
    ./xxx.1.7-1.bw.log:520:ERROR: "baum_welch.c", line 324: speaker_1/208 ignored
    
    [4].
    ./xxx.1.falign.log:55:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 003
    ./xxx.1.falign.log:159:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 020
    ./xxx.1.falign.log:315:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 035
    ./xxx.1.falign.log:395:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 048
    ./xxx.1.falign.log:415:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 051
    ./xxx.1.falign.log:429:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 053
    ./xxx.1.falign.log:449:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 056
    ./xxx.1.falign.log:653:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 081
    ./xxx.1.falign.log:673:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 084
    ./xxx.1.falign.log:699:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 088
    ./xxx.1.falign.log:725:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 093
    ./xxx.1.falign.log:861:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 105
    ./xxx.1.falign.log:899:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 111
    ./xxx.1.falign.log:913:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 113
    ./xxx.1.falign.log:945:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 118
    ./xxx.1.falign.log:953:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 119
    ./xxx.1.falign.log:1009:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 130
    ./xxx.1.falign.log:1159:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 144
    ./xxx.1.falign.log:1185:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 148
    ./xxx.1.falign.log:1253:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 159
    ./xxx.1.falign.log:1261:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 160
    ./xxx.1.falign.log:1367:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 167
    ./xxx.1.falign.log:1375:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 168
    ./xxx.1.falign.log:1383:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 169
    ./xxx.1.falign.log:1391:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 170
    ./xxx.1.falign.log:1525:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 192
    ./xxx.1.falign.log:1687:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 209
    ./xxx.1.falign.log:1877:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 230
    ./xxx.1.falign.log:1891:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 232
    ./xxx.1.falign.log:1905:ERROR: "main_align.c", line 762: Final state not reached; no alignment for 234
    
    Last edit: Izza 2016-06-12
    
    logdir_force_alignment_1.zip
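
The affected utterance IDs can be pulled out of grep output like [3] and [4] above with a short script rather than by hand. A rough Python sketch, assuming only the two message formats shown in those logs ("no alignment for ID" and "ID ignored"); the function name is made up for illustration:

```python
import re

# Patterns taken from the log lines quoted in [3] and [4].
FALIGN_RE = re.compile(r"no alignment for (\S+)")
BW_RE = re.compile(r"baum_welch\.c.*?: (\S+) ignored")


def failed_utterances(grep_lines):
    """Return the sorted, de-duplicated utterance IDs found in grep output."""
    ids = set()
    for line in grep_lines:
        match = FALIGN_RE.search(line) or BW_RE.search(line)
        if match:
            # Keep only the last path component, so "speaker_1/053" -> "053".
            ids.add(match.group(1).rsplit("/", 1)[-1])
    return sorted(ids)
```

Lines that match neither pattern, such as the "backward.c" messages, are skipped, because the utterance they refer to is named on the "ignored" line that follows.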
    
    - Nickolay V. Shmyrev - 2016-06-12
      
      The phone waa is too rare in your training set; you probably need to remove it from your phoneset. It seems you are trying to build a syllable-based acoustic model, which is not advised. CMUSphinx is designed to work with phoneme-based acoustic models; if you want to experiment with syllables, you need another toolkit.
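
One way to see how rare a phone is: count how often each phone occurs in the training transcripts by expanding every word through the dictionary. A rough Python sketch, assuming a plain "WORD PHONE PHONE ..." dictionary and "<s> words </s> (id)" transcript lines; the function names and the threshold are hypothetical and not a SphinxTrain rule:

```python
from collections import Counter


def phone_counts(dict_lines, transcript_lines):
    """Count phone occurrences in the transcripts via dictionary lookup."""
    pronunciations = {}
    for line in dict_lines:
        parts = line.split()
        if len(parts) >= 2:
            # The first entry for a word wins; alternates such as "WORD(2)"
            # get their own key and so never match a transcript token.
            pronunciations.setdefault(parts[0], parts[1:])

    counts = Counter()
    for line in transcript_lines:
        for token in line.split():
            # Tokens not in the dictionary (<s>, </s>, "(id)") add nothing.
            counts.update(pronunciations.get(token, ()))
    return counts


def rare_phones(counts, phones, min_count):
    """Return the phones seen fewer than min_count times, in input order."""
    return [p for p in phones if counts[p] < min_count]
```

Lookup is case-sensitive, so the words in the transcripts must be spelled exactly as in the dictionary. What counts as "too rare" is a judgment call, which is why min_count has no default here.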
      

Izza - 2016-06-13

Hi Nickolay,

Thank you for the reply.

I was actually following the sequence described in [1] to build a speech-to-text converter for a native language: create a mapping between the Unicode words and their transliteration, create the language model, and create the acoustic model. I'm not sure what is meant by a syllable-based acoustic model. Can you please explain what a syllable-based acoustic model is, what I'm doing wrong here, and how to correct it? I can share the dictionary file, language model, and any other information required.

Thank you.

[1]. http://stackoverflow.com/questions/31050003/build-new-acoustic-model-dictionary-language-model-for-uncommon-language-spee

- Nickolay V. Shmyrev - 2016-06-13
  
  The phoneset should contain phones, not syllables. You can check an example here:
  
  https://codeliteral.wordpress.com/2016/01/02/sinhala-speech-recognition-with-cmusphinx/
  

Izza - 2016-06-13

Thank you.

So, extracting out the phones from the training data set is a manual procedure?
