Hi,
I'm trying to adapt the Spanish acoustic model (voxforge_es_sphinx.cd_ptm_4000), and I get an error when I collect statistics with bw.
The files in my workspace directory are available here:
https://dl.dropboxusercontent.com/u/11313561/tmp/adaptacion.7z
The audio files are 16 kHz mono, with less than 0.2 seconds of silence at the beginning and at the end. I have checked that all the words in the sentences are in the dictionary file.
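Claims like these are easy to verify mechanically, which rules them out quickly as the cause of alignment failures. The Python sketch below uses only the standard library to report what each WAV header actually says (rate, channels, sample width) and to list transcript words missing from the dictionary. It assumes the usual SphinxTrain layouts: one utterance id per line in frases.fileids, lines such as `<s> palabra palabra </s> (frase_0003)` in frases.transcription, entries such as `palabra  p a l a b r a` (alternates written `palabra(2)`) in es.dict, UTF-8 text, and `<id>.wav` next to the fileids. The function names are hypothetical helpers, not SphinxTrain tools.

```python
import re
import wave


def check_wav(path, rate=16000, channels=1, sampwidth=2):
    """Return a list of problems with the WAV header (empty if it matches)."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != rate:
            problems.append(f"{path}: {w.getframerate()} Hz, expected {rate}")
        if w.getnchannels() != channels:
            problems.append(f"{path}: {w.getnchannels()} channels, expected {channels}")
        if w.getsampwidth() != sampwidth:
            problems.append(
                f"{path}: {8 * w.getsampwidth()}-bit, expected {8 * sampwidth}-bit"
            )
    return problems


def load_dict_words(path, encoding="utf-8"):
    """Collect headwords from a CMU-style dictionary, folding 'word(2)' into 'word'."""
    words = set()
    with open(path, encoding=encoding) as f:
        for line in f:
            parts = line.split()
            if parts:
                words.add(re.sub(r"\(\d+\)$", "", parts[0]))
    return words


def missing_words(transcription_path, dict_words,
                  fillers=("<s>", "</s>", "<sil>"), encoding="utf-8"):
    """Map each utterance id to the transcript words absent from the dictionary."""
    missing = {}
    with open(transcription_path, encoding=encoding) as f:
        for line in f:
            # Assumed line shape: "<s> w1 w2 ... </s> (utt_id)"
            m = re.match(r"^(.*)\((\S+)\)\s*$", line.strip())
            if not m:
                continue
            text, utt_id = m.groups()
            absent = [w for w in text.split()
                      if w not in fillers and w not in dict_words]
            if absent:
                missing[utt_id] = absent
    return missing


def run_checks(fileids_path, transcription_path, dict_path):
    """Return every header problem and missing word for the listed utterances."""
    with open(fileids_path, encoding="utf-8") as f:
        ids = [line.strip() for line in f if line.strip()]
    problems = []
    for utt_id in ids:
        problems.extend(check_wav(f"{utt_id}.wav"))
    absent = missing_words(transcription_path, load_dict_words(dict_path))
    for utt_id, words in absent.items():
        problems.append(f"{utt_id}: not in dictionary: {' '.join(words)}")
    return problems
```

Calling `run_checks("frases.fileids", "frases.transcription", "es.dict")` returns one string per problem, so an empty list means the headers and the dictionary coverage match the assumptions; an 8-bit recording would show up as a line like "frase_0003.wav: 8-bit, expected 16-bit".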
This is the output of bw with only 5 sentences (the problem is the same with all 130, but for simplicity the files I uploaded cover only these 5):
./bw -hmmdir voxforge_es_sphinx.cd_ptm_4000 -moddeffn voxforge_es_sphinx.cd_ptm_4000/mdef -ts2cbfn .ptm. -feat 1s_c_d_dd -svspec 0-12/13-25/26-38 -cmn current -agc none -dictfn es.dict -ctlfn frases.fileids -lsnfn frases.transcription -accumdir .
INFO: main.c(229): Compiled on Jan 18 2017 at 23:54:13
Current configuration:
[NAME] [DEFLT] [VALUE]
-2passvar no no
-abeam 1e-100 1.000000e-100
-accumdir .
-agc none none
-agcthresh 2.0 2.000000e+00
-bbeam 1e-100 1.000000e-100
-cb2mllrfn .1cls. .1cls.
-cepdir
-cepext mfc mfc
-ceplen 13 13
-ckptintv 0
-cmn live current
-cmninit 40,3,-1 40,3,-1
-ctlfn frases.fileids
-diagfull no no
-dictfn es.dict
-example no no
-fdictfn
-feat 1s_c_d_dd 1s_c_d_dd
-fullvar no no
-help no no
-hmmdir voxforge_es_sphinx.cd_ptm_4000
-latdir
-latext
-lda
-ldadim 0 0
-lsnfn frases.transcription
-lw 11.5 1.150000e+01
-maxuttlen 0 0
-meanfn
-meanreest yes yes
-mixwfn
-mixwreest yes yes
-mllrmat
-mmie no no
-mmie_type rand rand
-moddeffn voxforge_es_sphinx.cd_ptm_4000/mdef
-mwfloor 0.00001 1.000000e-05
-npart 0
-nskip 0
-outphsegdir
-outputfullpath no no
-part 0
-pdumpdir
-phsegdir
-phsegext phseg phseg
-runlen -1 -1
-sentdir
-sentext sent sent
-spthresh 0.0 0.000000e+00
-svspec 0-12/13-25/26-38
-timing yes yes
-tmatfn
-tmatreest yes yes
-topn 4 4
-tpfloor 0.0001 1.000000e-04
-ts2cbfn .ptm.
-varfloor 0.00001 1.000000e-05
-varfn
-varnorm no no
-varreest yes yes
-viterbi no no
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: main.c(255): Using subvector specification 0-12/13-25/26-38
INFO: main.c(318): Reading voxforge_es_sphinx.cd_ptm_4000/mdef
INFO: model_def_io.c(573): Model definition info:
INFO: model_def_io.c(574): 26506 total models defined (26 base, 26480 tri)
INFO: model_def_io.c(575): 106024 total states
INFO: model_def_io.c(576): 4078 total tied states
INFO: model_def_io.c(577): 78 total tied CI states
INFO: model_def_io.c(578): 26 total tied transition matrices
INFO: model_def_io.c(579): 4 max state/model
INFO: model_def_io.c(580): 4 min state/model
INFO: s3mixw_io.c(117): Read voxforge_es_sphinx.cd_ptm_4000/mixture_weights [4078x3x128 array]
INFO: s3tmat_io.c(118): Read voxforge_es_sphinx.cd_ptm_4000/transition_matrices [26x3x4 array]
INFO: mod_inv.c(301): inserting tprob floor 1.000000e-04 and renormalizing
INFO: s3gau_io.c(169): Read voxforge_es_sphinx.cd_ptm_4000/means [26x3x128 array]
INFO: s3gau_io.c(169): Read voxforge_es_sphinx.cd_ptm_4000/variances [26x3x128 array]
INFO: gauden.c(176): 26 total mgau
INFO: gauden.c(150): 3 feature streams (|0|=13 |1|=13 |2|=13 )
INFO: gauden.c(187): 128 total densities
INFO: gauden.c(90): min_var=1.000000e-05
INFO: gauden.c(165): compute 4 densities/frame
INFO: main.c(431): Will reestimate mixing weights.
INFO: main.c(433): Will reestimate means.
INFO: main.c(435): Will reestimate variances.
INFO: main.c(443): Will reestimate transition matrices
INFO: main.c(456): Reading main dictionary: es.dict
INFO: lexicon.c(221): 23498 entries added from es.dict
INFO: main.c(466): Reading filler dictionary: voxforge_es_sphinx.cd_ptm_4000/noisedict
INFO: lexicon.c(221): 3 entries added from voxforge_es_sphinx.cd_ptm_4000/noisedict
INFO: corpus.c(1062): Will process all remaining utts starting at 0
INFO: main.c(665): Reestimation: Baum-Welch
INFO: main.c(669): Generating profiling information consumes significant CPU resources.
INFO: main.c(670): If you are not interested in profiling, use -timing no
column defns
<seq>
<id>
<n_frame_in>
<n_frame_del>
<n_state_shmm>
<avg_states_alpha>
<avg_states_beta>
<avg_states_reest>
<avg_posterior_prune>
<frame_log_lik>
<utt_log_lik>
... timing info ...
INFO: cmn.c(133): CMN: 90.23 -5.76 -2.74 -2.49 -1.62 0.68 3.97 2.71 0.05 2.42 1.37 1.07 0.77
ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: frase_0003 ignored
utt> 0 frase_0003 94 0 96 34 utt 0.009x 1.005e upd 0.009x 0.968e fwd 0.009x 0.952e bwd 0.000x 0.000e gau 0.009x 0.808e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e
INFO: cmn.c(133): CMN: 90.32 -3.99 -3.79 -3.77 -4.96 -2.44 1.35 3.25 3.86 4.16 1.11 -0.09 0.61
ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: frase_0005 ignored
utt> 1 frase_0005 124 0 140 33 utt 0.010x 1.101e upd 0.010x 1.070e fwd 0.010x 1.058e bwd 0.000x 0.000e gau 0.010x 0.922e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e
INFO: cmn.c(133): CMN: 92.31 -6.31 -3.46 -4.67 -3.46 0.70 6.33 3.46 0.69 0.51 0.58 1.58 1.28
ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: frase_0010 ignored
utt> 2 frase_0010 144 0 164 44 utt 0.011x 1.150e upd 0.011x 1.125e fwd 0.011x 1.114e bwd 0.000x 0.000e gau 0.008x 1.266e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e
INFO: cmn.c(133): CMN: 92.89 -7.46 -2.51 -3.84 -3.80 -2.24 3.13 3.43 2.24 1.53 0.75 1.23 1.23
ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: frase_0011 ignored
utt> 3 frase_0011 106 0 132 38 utt 0.011x 0.901e upd 0.011x 0.863e fwd 0.011x 0.852e bwd 0.000x 0.000e gau 0.011x 0.755e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e
INFO: cmn.c(133): CMN: 88.79 -4.30 -2.94 -2.60 -4.40 -2.73 3.61 3.67 0.46 0.15 -0.74 0.09 1.01
ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached
ERROR: "baum_welch.c", line 324: frase_0015 ignored
utt> 4 frase_0015 116 0 116 30 utt 0.007x 1.184e upd 0.007x 1.133e fwd 0.007x 1.114e bwd 0.000x 0.000e gau 0.003x 1.955e rsts 0.000x 0.000e rstf 0.000x 0.000e rstu 0.000x 0.000e
overall> stats 0 (-0) 0.000000e+00 0.000000e+00 0.000x 1.070e
WARN: "accum.c", line 628: Over 500 senones never occur in the input data. This is normal for context-dependent untied senone training or for adaptation, but could indicate a serious problem otherwise.
INFO: s3mixw_io.c(233): Wrote ./mixw_counts [4078x3x128 array]
INFO: s3tmat_io.c(176): Wrote ./tmat_counts [26x3x4 array]
INFO: s3gau_io.c(485): Wrote ./gauden_counts with means with vars [26x3x128 vector arrays]
INFO: main.c(999): Counts saved to .
Any help, please?
Thanks a lot in advance.
Your input data has an 8-bit sample width:
frase_0003.wav: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 8 bit, mono 16000 Hz
It should be 16-bit.
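If re-recording at 16 bits is not an option, the existing 8-bit files can be rewritten as 16-bit PCM. The sketch below uses Python's standard wave and array modules; the function name to_16bit is a hypothetical helper. It relies on the WAV convention that 8-bit samples are unsigned (silence at 128) while 16-bit samples are signed, so each byte b maps to (b - 128) * 256. This only changes the sample format; it cannot restore the resolution lost in the original 8-bit recording.

```python
import array
import wave


def to_16bit(src, dst):
    """Rewrite an 8-bit PCM WAV file as 16-bit PCM, keeping rate and channels."""
    with wave.open(src, "rb") as r:
        params = r.getparams()
        frames = r.readframes(r.getnframes())
    if params.sampwidth == 2:
        # Already 16-bit: pass the samples through unchanged.
        data = frames
    elif params.sampwidth == 1:
        # Unsigned 8-bit (0..255, silence = 128) -> signed 16-bit.
        samples = array.array("h", ((b - 128) << 8 for b in frames))
        data = samples.tobytes()
    else:
        raise ValueError(f"{src}: unsupported sample width {params.sampwidth}")
    with wave.open(dst, "wb") as w:
        w.setnchannels(params.nchannels)
        w.setsampwidth(2)
        w.setframerate(params.framerate)
        w.writeframes(data)
```

For example, `to_16bit("frase_0003.wav", "frase_0003_16.wav")`; `sox in.wav -b 16 out.wav` is a command-line alternative. Either way, the feature files that bw reads have to be regenerated from the converted audio (for example with sphinx_fe) before collecting statistics again.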
That was the problem. Adaptation works much better now. Next, I will try adding new words.
Thanks a lot!!