
Adapting French acoustic model errors

2011-11-10
2012-09-22
  • Boris Mansencal

    Boris Mansencal - 2011-11-10

    I am following the tutorial here: http://cmusphinx.sourceforge.net/wiki/tutorialadapt
    to adapt the French acoustic model (provided here:
    http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/)
    for a particular speaker.

    I am using the latest sphinxbase/sphinxtrain/pocketsphinx from SVN (rev 11259).
    I have three files:
    - phrases_s_u.txt : the transcription file, with the utterances and their utt_ids
    - phrases_files.txt : the list of wav files, named as the utt_ids without the .wav extension. The wav files are 16-bit 16 kHz mono.
    - phrases.dic : a dictionary with only the words in the utterances

    I run:
    ./bw -hmmdir lium_french_f0 -moddeffn lium_french_f0/mdef.txt -ts2cbfn .semi.
    -feat 1s_c_d_dd -cmn current -agc max -dictfn phrases.dic -ctlfn
    phrases_files.txt -lsnfn phrases_s_u.txt -accumdir .

    I get the following output:

    INFO: main.c(194): Compiled on Nov 9 2011 at 14:42:35
    INFO: cmd_ln.c(691): Parsing command line:
    ./bw \
    -hmmdir lium_french_f0 \
    -moddeffn lium_french_f0/mdef.txt \
    -ts2cbfn .semi. \
    -feat 1s_c_d_dd \
    -cmn current \
    -agc max \
    -dictfn phrases.dic \
    -ctlfn phrases_files.txt \
    -lsnfn phrases_s_u.txt \
    -accumdir .

    Current configuration:

    -2passvar no no
    -abeam 1e-100 1.000000e-100
    -accumdir .
    -agc none max
    -agcthresh 2.0 2.000000e+00
    -bbeam 1e-100 1.000000e-100
    -cb2mllrfn .1cls. .1cls.
    -cepdir
    -cepext mfc mfc
    -ceplen 13 13
    -ckptintv 0
    -cmn current current
    -cmninit 8.0 8.0
    -ctlfn phrases_files.txt
    -diagfull no no
    -dictfn phrases.dic
    -example no no
    -fdictfn
    -feat 1s_c_d_dd 1s_c_d_dd
    -fullsuffixmatch no no
    -fullvar no no
    -help no no
    -hmmdir lium_french_f0
    -latdir
    -latext
    -lda
    -ldaaccum no no
    -ldadim 0 0
    -lsnfn phrases_s_u.txt
    -ltsoov no no
    -lw 11.5 1.150000e+01
    -maxuttlen 0 0
    -meanfn
    -meanreest yes yes
    -mixwfn
    -mixwreest yes yes
    -mllrmat
    -mmie no no
    -mmie_type rand rand
    -moddeffn lium_french_f0/mdef.txt
    -mwfloor 0.00001 1.000000e-05
    -npart 0
    -nskip 0
    -outphsegdir
    -outputfullpath no no
    -part 0
    -pdumpdir
    -phsegdir
    -phsegext phseg phseg
    -runlen -1 -1
    -sentdir
    -sentext sent sent
    -spthresh 0.0 0.000000e+00
    -svspec
    -timing yes yes
    -tmatfn
    -tmatreest yes yes
    -topn 4 4
    -tpfloor 0.0001 1.000000e-04
    -ts2cbfn .semi.
    -varfloor 0.00001 1.000000e-05
    -varfn
    -varnorm no no
    -varreest yes yes
    -viterbi no no

    INFO: feat.c(684): Initializing feature stream to type: '1s_c_d_dd',
    ceplen=13, CMN='current', VARNORM='no', AGC='max'
    INFO: cmn.c(142): mean= 12.00, mean= 0.0
    INFO: agc.c(132): AGCEMax: max= 5.00
    INFO: main.c(283): Reading lium_french_f0/mdef.txt
    INFO: model_def_io.c(573): Model definition info:
    INFO: model_def_io.c(574): 82134 total models defined (45 base, 82089 tri)
    INFO: model_def_io.c(575): 492804 total states
    INFO: model_def_io.c(576): 5725 total tied states
    INFO: model_def_io.c(577): 225 total tied CI states
    INFO: model_def_io.c(578): 45 total tied transition matrices
    INFO: model_def_io.c(579): 6 max state/model
    INFO: model_def_io.c(580): 6 min state/model
    INFO: s3mixw_io.c(116): Read lium_french_f0/mixture_weights
    INFO: s3tmat_io.c(115): Read lium_french_f0/transition_matrices
    INFO: mod_inv.c(301): inserting tprob floor 1.000000e-04 and renormalizing
    INFO: s3gau_io.c(166): Read lium_french_f0/means
    INFO: s3gau_io.c(166): Read lium_french_f0/variances
    INFO: gauden.c(183): 5725 total mgau
    INFO: gauden.c(157): 1 feature streams (|0|=39 )
    INFO: gauden.c(194): 22 total densities
    INFO: gauden.c(97): min_var=1.000000e-05
    INFO: gauden.c(172): compute 4 densities/frame
    INFO: main.c(395): Will reestimate mixing weights.
    INFO: main.c(397): Will reestimate means.
    INFO: main.c(399): Will reestimate variances.
    INFO: main.c(407): Will reestimate transition matrices
    INFO: main.c(420): Reading main lexicon: phrases.dic
    INFO: lexicon.c(218): 631 entries added from phrases.dic
    INFO: main.c(432): Reading filler lexicon: lium_french_f0/noisedict
    INFO: lexicon.c(218): 8 entries added from lium_french_f0/noisedict
    INFO: corpus.c(1078): Will process all remaining utts starting at 0
    INFO: main.c(639): Reestimation: Baum-Welch
    INFO: main.c(644): Generating profiling information consumes significant CPU
    resources.
    INFO: main.c(645): If you are not interested in profiling, use -timing no
    INFO: cmn.c(175): CMN: 5.28 -0.19 0.28 0.24 -0.04 -0.15 0.01 -0.11 -0.10 -0.05
    -0.09 -0.09 -0.13
    INFO: agc.c(123): AGCMax: obs=max= 9.20
    WARNING: "gauden.c", line 1342: Scaling factor too small: -1535020.927524
    ERROR: "backward.c", line 1019: alpha(5.548316e-02) <> sum of alphas * betas
    (0.000000e+00) in frame 278
    ERROR: "baum_welch.c", line 333: actor1_001 ignored

    # of codebooks in mean/var files, 5725, inconsistent with ts2cb mapping 1

    column defns
    <seq>
    <id>
    <n_frame_in>
    <n_frame_del>
    <n_state_shmm>
    <avg_states_alpha>
    <avg_states_beta>
    <avg_states_reest>
    <avg_posterior_prune>
    <frame_log_lik>
    <utt_log_lik>
    ... timing info ...
    utt> 0 actor1_001 280 0 54 51 utt 0.001x 1.194e upd 0.001x 1.109e fwd 0.001x
    1.047e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x
    0.000e rstu 0.000x 0.000e
    INFO: cmn.c(175): CMN: 5.12 -0.24 0.15 0.19 -0.03 -0.14 -0.10 -0.06 -0.13
    -0.01 -0.07 -0.09 -0.13
    INFO: agc.c(123): AGCMax: obs=max= 9.36
    WARNING: "gauden.c", line 1342: Scaling factor too small: -1813250.056768
    ERROR: "backward.c", line 1019: alpha(1.046842e-01) <> sum of alphas * betas
    (0.000000e+00) in frame 332
    ERROR: "baum_welch.c", line 333: actor1_002 ignored
    utt> 1 actor1_002 334 0 90 83 utt 0.001x 1.993e upd 0.001x 1.285e fwd 0.001x
    1.239e bwd 0.000x 0.000e gau 0.000x 0.000e rsts 0.000x 0.000e rstf 0.000x
    0.000e rstu 0.000x 0.000e
    INFO: cmn.c(175): CMN: 4.88 0.12 0.25 0.13 -0.02 -0.22 0.02 -0.13 -0.12 -0.01
    -0.12 -0.11 -0.12
    INFO: agc.c(123): AGCMax: obs=max= 8.96
    WARNING: "gauden.c", line 1342: Scaling factor too small: -1441763.240420
    ERROR: "backward.c", line 1019: alpha(6.752134e-02) <> sum of alphas * betas
    (0.000000e+00) in frame 258
    ERROR: "baum_welch.c", line 333: actor1_003 ignored

    utt> 32 actor1_033 1487 0 510 464 utt 0.007x 1.005e upd 0.007x 0.997e fwd
    0.007x 1.007e bwd 0.000x 0.146e gau 0.000x 0.728e rsts 0.000x 0.000e rstf
    0.000x 0.000e rstu 0.000x 0.000e
    INFO: cmn.c(175): CMN: 9.18 -0.14 0.37 0.24 -0.31 -0.23 -0.04 -0.26 -0.17
    -0.12 -0.14 -0.15 -0.20
    INFO: agc.c(123): AGCMax: obs=max= 6.47
    WARNING: "gauden.c", line 1342: Scaling factor too small: -2722683.598133
    ERROR: "backward.c", line 1019: alpha(1.191249e-01) <> sum of alphas * betas
    (0.000000e+00) in frame 855
    ERROR: "baum_welch.c", line 333: actor1_034 ignored
    utt> 33 actor1_034 857 0 372 330 utt 0.005x 1.183e upd 0.005x 0.991e fwd
    0.005x 0.980e bwd 0.000x 0.951e gau 0.000x 0.660e rsts 0.000x 0.000e rstf
    0.000x 0.000e rstu 0.000x 0.000e
    overall> sparrow 0 (-0) 0.000000e+00 0.000000e+00 0.000x 1.051e
    WARNING: "accum.c", line 618: Over 500 senones never occur in the input data.
    This is normal for context-dependent untied senone training or for adaptation,
    but could indicate a serious problem otherwise.
    INFO: s3mixw_io.c(232): Wrote ./mixw_counts
    INFO: s3tmat_io.c(174): Wrote ./tmat_counts
    INFO: s3gau_io.c(478): Wrote ./gauden_counts with means with vars
    INFO: main.c(1040): Counts saved to .

    I have an error for every file.
    Do you have an idea what causes these errors?

    After the first error, I get the message:
    "# of codebooks in mean/var files, 5725, inconsistent with ts2cb mapping 1"
    What does it mean?

    Is my command line correct?
    In particular, I am not sure about:
    -ts2cbfn .semi. : I have no .semi. file in the current directory or in the lium_french_f0 directory
    -svspec 0-12/13-25/26-38 : it is specified in the example, but not in lium_french_f0/feat.params, so I dropped it.

    Thanks,

    Boris.

     
  • Nickolay V. Shmyrev

    Is my command line correct ?

    Most likely not. Once it is correct, everything will work.

    In particular, I am not sure of :
    -ts2cbfn .semi. : I have no .semi. file in the current directory or in the lium_french_f0 directory

    The French model is continuous, so it should be .cont.
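    With that change, the bw invocation from the first post becomes the
    following sketch (same files and options as above, only -ts2cbfn differs):

```shell
./bw -hmmdir lium_french_f0 -moddeffn lium_french_f0/mdef.txt -ts2cbfn .cont. \
     -feat 1s_c_d_dd -cmn current -agc max -dictfn phrases.dic \
     -ctlfn phrases_files.txt -lsnfn phrases_s_u.txt -accumdir .
```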

    -svspec 0-12/13-25/26-38 : it is specified in the example, but not in
    lium_french_f0/feat.params so I dropped it.

    Correct, there is no svspec there.

     
  • Boris Mansencal

    Boris Mansencal - 2011-11-14

    French model is continuous, it should be .cont.

    Where can I see that?

    It is OK now, I have no more errors. Thanks.

    I then created both MLLR and MAP adaptations, with the following
    commands:

    ./mllr_solve -meanfn lium_french_f0/means -varfn lium_french_f0/variances
    -outmllrfn mllr_matrix -accumdir .

    ./map_adapt -meanfn lium_french_f0/means -varfn lium_french_f0/variances
    -mixwfn lium_french_f0/mixture_weights -tmatfn
    lium_french_f0/transition_matrices -accumdir . -mapmeanfn
    lium_french_f0_adapt/means -mapvarfn lium_french_f0_adapt/variances -mapmixwfn
    lium_french_f0_adapt/mixture_weights -maptmatfn
    lium_french_f0_adapt/transition_matrices

    When I use -mllr mllr_matrix, my recognition results improve (on a test set
    containing the adaptation set).
    But when I use my MAP-adapted acoustic model, the results are worse than with
    the original model.
    If I combine -mllr with my adapted acoustic model, the results are better than
    with the original model, but still worse than with -mllr alone.
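    For reference, applying the transform at decode time looks roughly like
    this (french.lm.dmp is just a placeholder for whatever language model is
    in use, and the -mllr option is assumed to be available in this
    pocketsphinx build):

```shell
pocketsphinx_continuous -hmm lium_french_f0 -lm french.lm.dmp \
    -dict phrases.dic -mllr mllr_matrix
```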

    Do you have any idea why this is happening?

    Boris.

     
  • Nickolay V. Shmyrev

    Where can I see that ?

    It's written in the model README.

    Do you have any idea why it is happening ?

    MAP has a smoothing parameter tau which needs to be optimized. If you want to
    understand the process, you might want to read more about MLLR and MAP and
    about model adaptation:

    http://nsh.nexiwave.com/2009/09/adaptation-methods.html

     
  • Boris Mansencal

    Boris Mansencal - 2011-11-15

    I have read your blog post and the linked PDF.
    In the PDF, the method to combine MAP and MLLR is as follows:
    " 1) Compute an MLLR transformation with bw and mllr_solve
    2) Apply it to the baseline means with mllr_transform
    3) Re-run bw with the transformed means
    4) Run map_adapt to produce a MAP re-estimation "

    1) I think this is exactly what is described in the tutorial.
    I have done:
    ./bw -hmmdir lium_french_f0 -moddeffn lium_french_f0/mdef.txt -ts2cbfn .cont.
    -feat 1s_c_d_dd -cmn current -agc max -dictfn phrases.dic -ctlfn
    phrases_files.txt -lsnfn phrases_s_u.txt -accumdir .

    ./mllr_solve -meanfn lium_french_f0/means -varfn lium_french_f0/variances
    -outmllrfn mllr_matrix -accumdir .

    2) I suppose this is something like:
    cp -a lium_french_f0 lium_french_f0_adaptB
    ./mllr_transform -ingaucntfn ??? -inmeanfn lium_french_f0/means -mllrmat
    mllr_matrix -moddeffn lium_french_f0/mdef.txt -outgaucntfn ??? -outmeanfn
    lium_french_f0_adaptB/means

    What is the "Input Gaussian accumulation count file name"?
    Is it one of the files produced by bw: mixw_counts, tmat_counts, or
    gauden_counts?

    3) Is it enough to do:
    ./bw -hmmdir lium_french_f0_adaptB -moddeffn lium_french_f0_adaptB/mdef.txt
    -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc max -dictfn phrases.dic
    -ctlfn phrases_files.txt -lsnfn phrases_s_u.txt -accumdir .
    that is, just replacing the model directory with the adapted one?

    I see that there is a -mllrmat option to bw. Is it useful somewhere ?

    4) Is map_adapt still broken (as reported in your blog post of 09/2009)?
    Should I run:
    ./map_adapt -meanfn lium_french_f0/means -varfn lium_french_f0/variances
    -mixwfn lium_french_f0/mixture_weights -tmatfn
    lium_french_f0/transition_matrices -accumdir . -mapmeanfn
    lium_french_f0_adaptB/means -mapvarfn lium_french_f0_adaptB/variances
    -mapmixwfn lium_french_f0_adaptB/mixture_weights -maptmatfn
    lium_french_f0_adaptB/transition_matrices

    or the same with -fixedtau 100 (or a greater value)?

    Is there a way to have a clue about which value of tau to use ?

    Thanks again,

    Boris.

     
  • Boris Mansencal

    Boris Mansencal - 2011-11-17

    Any help on this?
    In particular, on the -ingaucntfn option of mllr_transform.

     
  • Nickolay V. Shmyrev

    What is the "Input Gaussian accumulation count file name"?
    Is it one of the files produced by bw: mixw_counts, tmat_counts, or
    gauden_counts?

    It is gauden_counts, but it's not needed here. mllr_transform can transform
    either a model or gauden_counts. You only need to transform the means, with
    -inmeanfn and -outmeanfn.
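    So step 2 reduces to something like this (copying the model directory
    first, as in your sketch):

```shell
cp -a lium_french_f0 lium_french_f0_adaptB
./mllr_transform -inmeanfn lium_french_f0/means \
    -outmeanfn lium_french_f0_adaptB/means \
    -mllrmat mllr_matrix
```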

    3) Is it enough to do :
    ./bw -hmmdir lium_french_f0_adaptB -moddeffn lium_french_f0_adaptB/mdef.txt
    -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc max -dictfn phrases.dic
    -ctlfn phrases_files.txt -lsnfn phrases_s_u.txt -accumdir .
    that is just replacing the directory of the model by the adapted one ?

    Not sure what you mean by "enough" but this step is needed.

    I see that there is a -mllrmat option to bw. Is it useful somewhere ?

    It's for different purposes.

    Is map_adapt still broken (as reported in your blog post in 09/2009) ?

    It's not broken; it requires a manually selected tau.

    Is there a way to have a clue about which value of tau to use ?

    You can try different values and find the best one.
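    For example, a sweep over a few tau values, each writing a MAP model into
    its own directory (the tau values and directory naming here are only an
    illustration; copy the remaining model files such as mdef.txt, feat.params
    and noisedict into each directory before decoding with it):

```shell
for tau in 10 50 100 200 500; do
    outdir=lium_french_f0_map_tau$tau
    mkdir -p $outdir
    ./map_adapt -meanfn lium_french_f0/means -varfn lium_french_f0/variances \
        -mixwfn lium_french_f0/mixture_weights \
        -tmatfn lium_french_f0/transition_matrices \
        -accumdir . -fixedtau yes -tau $tau \
        -mapmeanfn $outdir/means -mapvarfn $outdir/variances \
        -mapmixwfn $outdir/mixture_weights \
        -maptmatfn $outdir/transition_matrices
done
```

    Then decode a held-out test set with each model and keep the tau that
    gives the lowest error rate.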

     
  • Boris Mansencal

    Boris Mansencal - 2011-11-18

    To manually select tau for map_adapt, should I use -fixedtau or -tau?

     
  • Nickolay V. Shmyrev

    Hello Boris

    To manually select tau for map_adapt, should I use -fixedtau or -tau?

    Both, in fact:

    -fixedtau yes -tau 100
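    Putting the four steps of the recipe together, the whole sequence is
    roughly the following (lium_french_f0_map is an illustrative output name,
    and tau 100 is only a starting point to tune from):

```shell
# 1) Accumulate statistics and solve for an MLLR transform
./bw -hmmdir lium_french_f0 -moddeffn lium_french_f0/mdef.txt -ts2cbfn .cont. \
    -feat 1s_c_d_dd -cmn current -agc max -dictfn phrases.dic \
    -ctlfn phrases_files.txt -lsnfn phrases_s_u.txt -accumdir .
./mllr_solve -meanfn lium_french_f0/means -varfn lium_french_f0/variances \
    -outmllrfn mllr_matrix -accumdir .

# 2) Apply the transform to the baseline means
cp -a lium_french_f0 lium_french_f0_adaptB
./mllr_transform -inmeanfn lium_french_f0/means \
    -outmeanfn lium_french_f0_adaptB/means -mllrmat mllr_matrix

# 3) Re-run bw against the transformed model
#    (clear the counts accumulated in step 1 first)
rm -f mixw_counts tmat_counts gauden_counts
./bw -hmmdir lium_french_f0_adaptB -moddeffn lium_french_f0_adaptB/mdef.txt \
    -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc max -dictfn phrases.dic \
    -ctlfn phrases_files.txt -lsnfn phrases_s_u.txt -accumdir .

# 4) MAP re-estimation with a manually selected tau
mkdir -p lium_french_f0_map
./map_adapt -meanfn lium_french_f0_adaptB/means \
    -varfn lium_french_f0_adaptB/variances \
    -mixwfn lium_french_f0_adaptB/mixture_weights \
    -tmatfn lium_french_f0_adaptB/transition_matrices \
    -accumdir . -fixedtau yes -tau 100 \
    -mapmeanfn lium_french_f0_map/means -mapvarfn lium_french_f0_map/variances \
    -mapmixwfn lium_french_f0_map/mixture_weights \
    -maptmatfn lium_french_f0_map/transition_matrices
```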
    
     
  • Osmoz

    Osmoz - 2011-12-07

    Hello,

    Do you have a French model working with PocketSphinx?

    I tried both lium_french_f0 and lium_french_f2, with no result.

    When I speak, no word is recognized...

    Did you try it with FreeSWITCH?

     
