CMU Sphinx / Forums / Help: SphinxTrain Error for RM

Mike - 2008-05-27

I am training an RM acoustic model using the SphinxTrain nightly build and am getting a lot of errors (according to the log files, all of them are "final state not reached"). After that, I tried to decode RM with my trained model and got nearly 100% WER (using Sphinx3-0.7 and Sphinxbase-0.3). I guess the model is bad.

I was following Keith Vertanen's WSJ training recipe (http://www.inference.phy.cam.ac.uk/kv227/htk/) for the whole procedure. The purpose of training RM is to get a model for force-aligning the WSJ training data, but I got stuck at the RM step.

Can anybody give me some advice? Your help is highly appreciated...

The RM training log is as follows:

MODULE: 00 verify training files (2008-03-16 19:04)
O.S. is case sensitive ("A" != "a").

Phones will be treated as case sensitive.

Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file.

Found 1141 words using 45 phones

passed
Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary

passed
Phase 3: CTL - Check general format; utterance length (must be positive); files exist

passed
Phase 4: CTL - Checking number of lines in the transcript should match lines in control file

passed
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.

Total Hours Training: 1.52594188034191

This is a small amount of data, no comment at this time

WARNING
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary

Words in dictionary: 1138

Words in filler dictionary: 3

passed
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once

passed

MODULE: 01 Vector Quantization (2008-03-16 19:04)
Skipped for continuous models

MODULE: 02 Training Context Independent models for forced alignment (2008-03-16 19:04)
Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg

MODULE: 03 Force-aligning transcripts (2008-03-16 19:04)
Skipped: $ST::CFG_FORCEDALIGN set to 'no' in sphinx_train.cfg

MODULE: 05 Train LDA transformation (2008-03-16 19:04)
Skipped (set $CFG_LDA_MLLT = 'yes' to enable)

MODULE: 06 Train MLLT transformation (2008-03-16 19:04)
Skipped (set $CFG_LDA_MLLT = 'yes' to enable)

MODULE: 20 Training Context Independent models (2008-03-16 19:04)
Phase 1: Cleaning up directories:

accumulator... logs... qmanager... models... completed
Phase 2: Flat initialize
maketopology.pl Log File
completed
mk_mdef_gen Log File
completed
mk_flat Log File
completed
init_gau Log File
completed
norm Log File
completed
init_gau Log File
completed
norm Log File
completed
cp_parm Log File
completed
cp_parm Log File
completed

Phase 3: Forward-Backward

Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
bw Log File
completed
Normalization for iteration: 1
norm Log File
completed
Current Overall Likelihood Per Frame = -12.8224960671211
Baum welch starting for 1 Gaussian(s), iteration: 2 (1 of 1)
bw Log File
completed
Normalization for iteration: 2
norm Log File
completed
Current Overall Likelihood Per Frame = -11.0960623871118
Convergence Ratio = 0.134640998988967
Baum welch starting for 1 Gaussian(s), iteration: 3 (1 of 1)
bw Log File
completed
Normalization for iteration: 3
norm Log File
completed
Current Overall Likelihood Per Frame = -7.49851970809299
Convergence Ratio = 0.324217957101376
Baum welch starting for 1 Gaussian(s), iteration: 4 (1 of 1)
bw Log File
This step had 2 ERROR messages and 0 WARNING messages. Please check the log file for details.
completed
Normalization for iteration: 4
norm Log File
completed
Current Overall Likelihood Per Frame = -5.69164132824925
Convergence Ratio = 0.240964677054007
Baum welch starting for 1 Gaussian(s), iteration: 5 (1 of 1)
bw Log File
This step had 4 ERROR messages and 0 WARNING messages. Please check the log file for details.
completed
Normalization for iteration: 5
norm Log File
completed
Current Overall Likelihood Per Frame = -5.39003655390562
Convergence Ratio = 0.0529908258355429
Baum welch starting for 1 Gaussian(s), iteration: 6 (1 of 1)
bw Log File
This step had 4 ERROR messages and 0 WARNING messages. Please check the log file for details.
completed
Normalization for iteration: 6
norm Log File
completed
Current Overall Likelihood Per Frame = -5.31337727094557
Training completed after 6 iterations

MODULE: 30 Training Context Dependent models (2008-03-16 19:05)
Phase 1: Cleaning up directories:

accumulator... logs... qmanager... completed
Phase 2: Initialization
mk_mdef_gen Log File
WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check the log file for details.
completed
init_mixw Log File
completed
Phase 3: Forward-Backward
Baum welch starting for iteration: 1 (1 of 1)
bw Log File
This step had 6 ERROR messages and 1 WARNING messages. Please check the log file for details.
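
The "Convergence Ratio" lines in the log above can be reproduced from the "Current Overall Likelihood Per Frame" values. The following is a minimal sketch; the formula is an assumption inferred from the numbers in this log, not taken from the SphinxTrain source, and the function name `convergence_ratio` is made up for illustration.

```python
# Per-frame log-likelihoods copied from the CI training log above
# (Baum-Welch iterations 1 to 6 with 1 Gaussian).
likelihoods = [
    -12.8224960671211,
    -11.0960623871118,
    -7.49851970809299,
    -5.69164132824925,
    -5.39003655390562,
    -5.31337727094557,
]


def convergence_ratio(prev, cur):
    """Relative improvement of the per-frame likelihood between iterations.

    Assumption: inferred from the logged values, not from SphinxTrain itself.
    """
    return (cur - prev) / abs(prev)


# One ratio per pair of consecutive iterations, starting at iteration 2.
ratios = [convergence_ratio(p, c) for p, c in zip(likelihoods, likelihoods[1:])]
for iteration, ratio in enumerate(ratios, start=2):
    print(f"iteration {iteration}: convergence ratio = {ratio:.6f}")
```

The values for iterations 2 to 5 agree with the ratios printed in the log. The log prints no ratio for iteration 6, where training stops.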

- Mike - 2008-05-27
  
  Detailed training log:
  
  INFO: main.c(196): Compiled on May 21 2008 at 18:51:41
  
  /home/tma3/Sphinx/sphinx_recipe/rm1/bin/bw \
  -moddeffn /home/tma3/Sphinx/sphinx_recipe/rm1/model_architecture/rm1.ci.mdef \
  -ts2cbfn .cont. \
  -mixwfn /home/tma3/Sphinx/sphinx_recipe/rm1/model_parameters/rm1.ci_cont/mixture_weights \
  -mwfloor 1e-08 \
  -tmatfn /home/tma3/Sphinx/sphinx_recipe/rm1/model_parameters/rm1.ci_cont/transition_matrices \
  -meanfn /home/tma3/Sphinx/sphinx_recipe/rm1/model_parameters/rm1.ci_cont/means \
  -varfn /home/tma3/Sphinx/sphinx_recipe/rm1/model_parameters/rm1.ci_cont/variances \
  -ltsoov no \
  -dictfn /home/tma3/Sphinx/sphinx_recipe/rm1/etc/rm1.dic \
  -fdictfn /home/tma3/Sphinx/sphinx_recipe/rm1/etc/rm1.filler \
  -ctlfn /home/tma3/Sphinx/sphinx_recipe/rm1/etc/rm1_train.fileids \
  -part 1 \
  -npart 1 \
  -cepdir /home/tma3/Sphinx/sphinx_recipe/rm1/feat \
  -cepext mfc \
  -lsnfn /home/tma3/Sphinx/sphinx_recipe/rm1/etc/rm1_train.transcription \
  -accumdir /home/tma3/Sphinx/sphinx_recipe/rm1/bwaccumdir/rm1_buff_1 \
  -varfloor 0.0001 \
  -topn 1 \
  -abeam 1e-90 \
  -bbeam 1e-10 \
  -agc none \
  -cmn current \
  -varnorm no \
  -meanreest yes \
  -varreest yes -2passvar yes \
  -tmatreest yes \
  -fullvar no \
  -diagfull no \
  -feat 1s_c_d_dd \
  -ceplen 13 \
  -timing no
  
  [Switch] [Default] [Value]
  -help no no
  -example no no
  -hmmdir
  -moddeffn /home/tma3/Sphinx/sphinx_recipe/rm1/model_architecture/rm1.ci.mdef
  -tmatfn /home/tma3/Sphinx/sphinx_recipe/rm1/model_parameters/rm1.ci_cont/transition_matrices
  -mixwfn /home/tma3/Sphinx/sphinx_recipe/rm1/model_parameters/rm1.ci_cont/mixture_weights
  -meanfn /home/tma3/Sphinx/sphinx_recipe/rm1/model_parameters/rm1.ci_cont/means
  -varfn /home/tma3/Sphinx/sphinx_recipe/rm1/model_parameters/rm1.ci_cont/variances
  -fullvar no no
  -diagfull no no
  -mwfloor 0.00001 1.000000e-08
  -tpfloor 0.0001 1.000000e-04
  -varfloor 0.00001 1.000000e-04
  -topn 4 1
  -dictfn /home/tma3/Sphinx/sphinx_recipe/rm1/etc/rm1.dic
  -fdictfn /home/tma3/Sphinx/sphinx_recipe/rm1/etc/rm1.filler
  -ltsoov no no
  -ctlfn /home/tma3/Sphinx/sphinx_recipe/rm1/etc/rm1_train.fileids
  -nskip
  -runlen -1 -1
  -part 1
  -npart 1
  -cepext mfc mfc
  -cepdir /home/tma3/Sphinx/sphinx_recipe/rm1/feat
  -phsegext phseg phseg
  -phsegdir
  -outphsegdir
  -sentdir
  -sentext sent sent
  -lsnfn /home/tma3/Sphinx/sphinx_recipe/rm1/etc/rm1_train.transcription
  -accumdir /home/tma3/Sphinx/sphinx_recipe/rm1/bwaccumdir/rm1_buff_1
  -ceplen 13 13
  -cepwin 0 0
  -agc max none
  -cmn current current
  -varnorm no no
  -silcomp none none
  -sildel no no
  -siltag SIL SIL
  -abeam 1e-100 1.000000e-90
  -bbeam 1e-100 1.000000e-10
  -varreest yes yes
  -meanreest yes yes
  -mixwreest yes yes
  -tmatreest yes yes
  -mllrmat
  -cb2mllrfn .1cls. .1cls.
  -ts2cbfn .cont.
  -feat 1s_c_d_dd 1s_c_d_dd
  -ldafn
  -ldadim 29 29
  -ldaaccum no no
  -timing yes no
  -viterbi no no
  -2passvar no yes
  -sildelfn
  -spthresh 0.0 0.000000e+00
  -maxuttlen 0 0
  -ckptintv
  -outputfullpath no no
  -fullsuffixmatch no no
  -pdumpdir
  INFO: main.c(253): Reading /home/tma3/Sphinx/sphinx_recipe/rm1/model_architecture/rm1.ci.mdef
  INFO: model_def_io.c(587): Model definition info:
  INFO: model_def_io.c(588): 45 total models defined (45 base, 0 tri)
  INFO: model_def_io.c(589): 180 total states
  INFO: model_def_io.c(590): 135 total tied states
  INFO: model_def_io.c(591): 135 total tied CI states
  INFO: model_def_io.c(592): 45 total tied transition matrices
  INFO: model_def_io.c(593): 4 max state/model
  INFO: model_def_io.c(594): 4 min state/model
  INFO: s3mixw_io.c(116): Read /home/tma3/Sphinx/sphinx_recipe/rm1/model_parameters/rm1.ci_cont/mixture_weights [135x1x1 array]
  INFO: s3tmat_io.c(115): Read /home/tma3/Sphinx/sphinx_recipe/rm1/model_parameters/rm1.ci_cont/transition_matrices [45x3x4 array]
  INFO: mod_inv.c(297): inserting tprob floor 1.000000e-04 and renormalizing
  INFO: s3gau_io.c(166): Read /home/tma3/Sphinx/sphinx_recipe/rm1/model_parameters/rm1.ci_cont/means [135x1x1 array]
  INFO: s3gau_io.c(166): Read /home/tma3/Sphinx/sphinx_recipe/rm1/model_parameters/rm1.ci_cont/variances [135x1x1 array]
  INFO: gauden.c(181): 135 total mgau
  INFO: gauden.c(155): 1 feature streams (|0|=39 )
  INFO: gauden.c(192): 1 total densities
  INFO: gauden.c(98): min_var=1.000000e-04
  INFO: gauden.c(170): compute 1 densities/frame
  INFO: main.c(361): Will reestimate mixing weights.
  INFO: main.c(363): Will reestimate means.
  INFO: main.c(365): Will reestimate variances.
  INFO: main.c(367): WIll NOT optionally delete silence in Baum Welch or Viterbi.
  INFO: main.c(375): Will reestimate transition matrices
  INFO: main.c(388): Reading main lexicon: /home/tma3/Sphinx/sphinx_recipe/rm1/etc/rm1.dic
  INFO: lexicon.c(233): 1138 entries added from /home/tma3/Sphinx/sphinx_recipe/rm1/etc/rm1.dic
  INFO: main.c(400): Reading filler lexicon: /home/tma3/Sphinx/sphinx_recipe/rm1/etc/rm1.filler
  INFO: lexicon.c(233): 3 entries added from /home/tma3/Sphinx/sphinx_recipe/rm1/etc/rm1.filler
  INFO: main.c(421): Silence Tag SIL
  INFO: corpus.c(1343): Will process all remaining utts starting at 0
  INFO: main.c(620): Reestimation: Baum-Welch
  column defns
  <seq>
  <id>
  <n_frame_in>
  <n_frame_del>
  <n_state_shmm>
  <avg_states_alpha>
  <avg_states_beta>
  <avg_states_reest>
  <avg_posterior_prune>
  <frame_log_lik>
  <utt_log_lik>
  ... timing info ...
  INFO: cvt2triphone.c(199): no multiphones defined, no conversion done
  utt> 0 st0942 649 0 200 23 2 4 3.339835e-12 -7.770043e+00 -5.042758e+03
  utt> 1 st1940 144 0 60 17 2 4 5.266271e-12 -5.199626e+00 -7.487461e+02
  utt> 2 sr391 415 0 184 21 2 5 5.456210e-12 -1.829904e+00 -7.594101e+02
  
  --------- all ERRORs like this -------
  utt> 1589 st1455 233 0 128 25 ERROR: "backward.c", line 401: final state not reached
  ERROR: "baum_welch.c", line 331: mcc0_5/st1455 ignored
  
  INFO: s3mixw_io.c(232): Wrote /home/tma3/Sphinx/sphinx_recipe/rm1/bwaccumdir/rm1_buff_1/mixw_counts [135x1x1 array]
  INFO: s3tmat_io.c(174): Wrote /home/tma3/Sphinx/sphinx_recipe/rm1/bwaccumdir/rm1_buff_1/tmat_counts [45x3x4 array]
  INFO: s3gau_io.c(478): Wrote /home/tma3/Sphinx/sphinx_recipe/rm1/bwaccumdir/rm1_buff_1/gauden_counts with means with vars (2pass) [135x1x1 vector arrays]
  INFO: main.c(1033): Counts saved to /home/tma3/Sphinx/sphinx_recipe/rm1/bwaccumdir/rm1_buff_1
  Mon May 26 15:54:22 2008
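
To see which utterances bw skipped, the log can be scanned for the `ignored` lines that follow each "final state not reached" error. This is a rough sketch that assumes the two-line error format shown in the log above; the function name `ignored_utterances` is hypothetical, and a real log would be read from a file rather than from the embedded sample.

```python
import re

# Matches the second ERROR line of each failure, which names the utterance.
IGNORED = re.compile(r'ERROR: "baum_welch\.c", line \d+: (\S+) ignored')


def ignored_utterances(log_text):
    """Return the utterance IDs that bw skipped, in log order."""
    return IGNORED.findall(log_text)


# The error pair quoted in the log above.
sample = (
    'utt> 1589 st1455 233 0 128 25 ERROR: "backward.c", line 401: '
    "final state not reached\n"
    'ERROR: "baum_welch.c", line 331: mcc0_5/st1455 ignored\n'
)

print(ignored_utterances(sample))
```

Comparing the number of skipped utterances with the number of entries in the control file gives a quick sense of how much training data each iteration is losing.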
  
- Mike - 2008-05-27
  
  Decoding output using trained Model:
  
  A TO (sr569.mfc)
  A A AND TO TO (sr089.mfc)
  A A A A NEW A A TO (st0902.mfc)
  A A A A THREE A A TO (st1253.mfc)
  A AN TO (sr449.mfc)
  A A A A AS TO (st0019.mfc)
  A A A A TO (st1181.mfc)
  MANY A NEW A THIRD A A TO (st0995.mfc)
  A A A TO (st1089.mfc)
  A A AN A TO (st0270.mfc)
  A A TO (st1996.mfc)
  A A A A A A A A TO (st0626.mfc)
  A A A NEW A A A AND (sr249.mfc)
  A A TO TO (sr369.mfc)
  A A A A A AN A A TO (sr409.mfc)
  A A A NEW A A A TO (sr329.mfc)
  A A A TO (sr289.mfc)
  AN TO (st1350.mfc)
  A A A A A A A A A A (st1904.mfc)
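
For reference, WER on output like this can be computed as a word-level edit distance between each hypothesis and its reference transcript. The sketch below parses one hypothesis line in the format shown above. The reference sentence is made up for illustration, since the thread does not include the reference transcripts, and the helper names are hypothetical.

```python
def parse_hyp(line):
    """Split 'WORD WORD (uttid.mfc)' into (uttid, [words])."""
    text, _, tail = line.rpartition("(")
    utt_id = tail.rstrip(")").strip()
    return utt_id, text.split()


def word_errors(ref, hyp):
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                         # reference word deleted
                cur[j - 1] + 1,                      # hypothesis word inserted
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1]


utt_id, hyp = parse_hyp("A A AND TO TO (sr089.mfc)")
ref = "SHOW ME THE SHIPS IN PORT".split()  # made-up reference, not from RM
errors = word_errors(ref, hyp)
print(utt_id, errors, errors / len(ref))
```

A hypothesis made up almost entirely of short function words shares few or no words with its reference, so nearly every reference word counts as an error.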
  
- Mike - 2008-05-27
  
  Sorry, the link to Keith Vertanen's WSJ training recipe should be:
  http://www.inference.phy.cam.ac.uk/kv227/sphinx/
  
- Mike - 2008-05-28
  
  I figured out the issue: it was a feature parameter problem.
  
  I still get some "final state not reached" errors, but decoding now works fine. I am working on the WSJ training now. Thanks anyway!
  
  -Tao
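
The post does not say which feature parameter was wrong. One way to catch this kind of problem is to compare the front-end settings used for training with those used for decoding. The sketch below takes the training values from the bw command line earlier in the thread; the decoder-side values are invented to show a mismatch, and the `mismatches` helper is hypothetical.

```python
# Front-end settings copied from the bw command line in the training log.
train = {
    "feat": "1s_c_d_dd",
    "ceplen": "13",
    "agc": "none",
    "cmn": "current",
    "varnorm": "no",
}

# Hypothetical decoder-side settings, invented here to show a mismatch.
decode = {
    "feat": "1s_c_d_dd",
    "ceplen": "13",
    "agc": "max",
    "cmn": "current",
    "varnorm": "no",
}


def mismatches(a, b):
    """Return {name: (a_value, b_value)} for every setting that differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


print(mismatches(train, decode))
```

An empty result means the two sides agree on every setting listed; anything else is worth checking before retraining.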
  