FYI, when I launch the program I see:
-agc none none
-agcthresh 2.0 2.000000e+00
-agc none max
-agcthresh 2.0 2.000000e+00
Could we improve our results in online mode by changing these parameters?
I have compiled with -DDUMP_FEATURES to get the feature values.
For the "all in memory" mode, I get:
INFO: feat.c(149): After CMN
-2.261833 -0.383301 -0.151923 -0.218711 0.113252 0.050123 0.221852 0.013434 -0.020369 0.095906 0.072756 0.089094 0.178019
-2.151785 -0.207930 0.077055 -0.094817 0.158462 0.066685 -0.017394 -0.014809 0.082939 0.027884 0.159237 0.030688 0.116601
So yes, the values seem quite different after AGC between the two versions. Is
that the AGC bug you are talking about?
Is there a way to alleviate this problem?
Besides, I have also read in the Sphinx3 FAQ (http://www.speech.cs.cmu.edu/sphinxman/FAQ.html)
that a mismatch in AGC settings between training and decoding could explain bad
results.
I don't know if this still applies to pocketsphinx.
How can I find out which AGC setting the available French model was trained with?
Thanks again.
Boris.
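For what it's worth, the decoder itself seems to answer this: it reads feat.params from the acoustic model directory and echoes the values it parses in a second "Parsing command line" block in the log. For lium_french_f0 that block contains:

```
-feat 1s_c_d_dd
-agc max
-cmn current
-varnorm no
-samprate 16000
-lowerf 133.33334
-upperf 6855.49756
-nfilt 40
-nfft 512
```

so the French model was presumably trained with AGC set to max.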
I am having a closer look at the sphinxbase code, and in particular at AGC (in
svn rev 11257).
First, in src/libsphinxbase/feat/agc.c, in the agc_emax() function:
why does the index in the for loop start from 1? Is it a bug?
It seems I get better results (on some French examples) with the index starting
from 0.
Secondly, in src/libsphinxbase/feat/feat.c, in the feat_init() function, I see
that agc->max is initialized with a hardwired value:
/* HACK: hardwired initial estimates based on use of CMN (from Sphinx2) */
agc_emax_set(fcb->agc_struct, (cmn != CMN_NONE) ? 5.0 : 10.0);
If I understand the code correctly, this hardwired value will be used only for
the first utterance (but for all buffers/calls to ps_process_raw() for this
first utterance).
Would it be a good idea to update agc->max each time agc_emax() is called on
the first utterance?
For example, this could be done by adding the following code in agc_emax()
before the for loop:
if (agc->obs_utt == 0) {
    mfcc_t new_max = mfc[0][0];
    for (i = 1; i < n_frame; ++i)
        if (mfc[i][0] > new_max)
            new_max = mfc[i][0];
    if (new_max > agc->max) {
        agc->max = new_max;
        E_INFO("AGCEMax: max= %.2f\n", agc->max);
    }
}
Boris.
Yes, something like that makes sense. Maybe the code could be made less
specific, but if it works for you and you have tested it, please publish a
patch and we will include it in the source tree and go from there!
     if (n_frame <= 0)
         return;
-    for (i = 1; i < n_frame; ++i) {
+    for (i = 0; i < n_frame; ++i) {
         if (mfc[i][0] > agc->obs_max) {
             agc->obs_max = mfc[i][0];
             agc->obs_frame = 1;
Boris.
That should be enough, thanks a lot. I would still appreciate it if you could
test it too, because it's quite complex for me to set up the French model, set
up a test, verify it, etc. :)
You should be able to test it with any other model trained with AGC set to
max.
It seems that the default English models were trained with AGC set to none;
that's why you cannot easily test it, isn't it?
If there is no sound reason for this loop to start from 1, it must be a bug.
On the French examples I have tested, this change always improves recognition
results.
Hello,
I am a total newbie regarding speech recognition and pocketsphinx.
I have just tried the example described here:
http://cmusphinx.sourceforge.net/wiki/tuturialpocketsphinx
but using the French model/dictionary provided here:
http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
I replaced the cmd_ln_init() call with something like this:
config = cmd_ln_init(NULL, ps_args(), TRUE,
"-hmm", ".../lium_french_f0",
"-lm", ".../french3g62K.lm.dmp",
"-dict", ".../frenchWords62K.dic",
NULL);
and tested it on an example with a French speaker (see ).
My problem is that the first result, given by ps_decode_raw() on the whole
file, is far better in terms of accuracy than the second result, given by
ps_process_raw() on blocks of the file.
Actually, on my example, the first result is perfectly exact, but the second
result does not contain a single correct word.
I have also tested the pocketsphinx_continuous program, and I also get very bad
results on my example file.
I have tried sphinxbase/pocketsphinx, both version 0.7 and svn rev 11224, on
Linux (Fedora 14 x86_64) on an Intel Core 2 Quad Q9505.
What could explain such a huge difference?
I got the example file with the following commands:
wget http://www.repository.voxforge1.org/downloads/fr/Trunk/Audio/Original/48kHz_16bit/Batman-20100121-ljg.tgz
tar xzf Batman-20100121-ljg.tgz
sox Batman-20100121-ljg/wav/fr-sb-693.wav -r 16000 -b 16 Batman-20100121-ljg/wav/fr-sb-693_16_16.raw rate 16k
The transcript of this example is:
"On vient leur apprendre que leurs papiers ne sont pas suffisants. "
Here is the output of the hello_ps program:
./hello_ps ../../Batman-20100121-ljg/wav/fr-sb-693_16_16.raw
INFO: cmd_ln.c(691): Parsing command line:
\
-hmm /home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0 \
-lm /home/mansencal/MSM/Speech/Sphinx/Models/French/french3g62K.lm.dmp \
-dict /home/mansencal/MSM/Speech/Sphinx/Models/French/frenchWords62K.dic
Current configuration:
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /home/mansencal/MSM/Speech/Sphinx/Models/French/frenchWords62K.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm /home/mansencal/MSM/Speech/Sphinx/Models/French/french3g62K.lm.dmp
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: cmd_ln.c(691): Parsing command line:
\
-feat 1s_c_d_dd \
-agc max \
-cmn current \
-varnorm no \
-samprate 16000 \
-lowerf 133.33334 \
-upperf 6855.49756 \
-nfilt 40 \
-nfft 512
Current configuration:
-agc none max
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.333333e+02
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.562500e-02
INFO: acmod.c(246): Parsed model-specific feature parameters from
/home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/feat.params
INFO: feat.c(684): Initializing feature stream to type: '1s_c_d_dd',
ceplen=13, CMN='current', VARNORM='no', AGC='max'
INFO: cmn.c(142): mean= 12.00, mean= 0.0
INFO: agc.c(132): AGCEMax: max= 5.00
INFO: mdef.c(520): Reading model definition:
/home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/mdef
INFO: bin_mdef.c(179): Allocating 85844 * 8 bytes (670 KiB) for CD tree
INFO: tmat.c(205): Reading HMM transition probability matrices: /home/mansenca
l/MSM/Speech/Sphinx/Models/French/lium_french_f0/transition_matrices
INFO: acmod.c(121): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/means
INFO: ms_gauden.c(292): 5725 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 22x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/variances
INFO: ms_gauden.c(292): 5725 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 22x39
INFO: ms_gauden.c(354): 3502 variance values floored
INFO: acmod.c(123): Attempting to use PTHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/means
INFO: ms_gauden.c(292): 5725 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 22x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/variances
INFO: ms_gauden.c(292): 5725 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 22x39
INFO: ms_gauden.c(354): 3502 variance values floored
INFO: ptm_mgau.c(800): Number of codebooks exceeds 256: 5725
INFO: acmod.c(125): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/means
INFO: ms_gauden.c(292): 5725 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 22x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/variances
INFO: ms_gauden.c(292): 5725 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 22x39
INFO: ms_gauden.c(354): 3502 variance values floored
INFO: ms_senone.c(160): Reading senone mixture weights:
/home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/mixture_weights
INFO: ms_senone.c(211): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(218): Not transposing mixture weights in memory
INFO: ms_senone.c(277): Read mixture weights for 5725 senones: 1 features x 22
codewords
INFO: ms_senone.c(331): Mapping senones to individual codebooks
INFO: ms_mgau.c(141): The value of topn: 4
INFO: dict.c(308): Allocating 109107 * 32 bytes (3409 KiB) for word entries
INFO: dict.c(323): Reading main dictionary:
/home/mansencal/MSM/Speech/Sphinx/Models/French/frenchWords62K.dic
INFO: dict.c(212): Allocated 1018 KiB for strings, 1375 KiB for phones
INFO: dict.c(326): 105003 words read
INFO: dict.c(332): Reading filler dictionary:
/home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/noisedict
INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(335): 8 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 45^3 * 2 bytes (177 KiB) for word-initial
triphones
INFO: dict2pid.c(131): Allocated 48960 bytes (47 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 48960 bytes (47 KiB) for single-phone word
triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(196): ngrams 1=62304, 2=18541132, 3=23627127
INFO: ngram_model_dmp.c(242): 62304 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(291): 18541132 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(317): 23627127 = LM.trigrams read
INFO: ngram_model_dmp.c(342): 37843 = LM.prob2 entries read
INFO: ngram_model_dmp.c(362): 5753 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(382): 35967 = LM.prob3 entries read
INFO: ngram_model_dmp.c(410): 36214 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(466): 62304 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 742 unique initial diphones
WARNING: "ngram_search_fwdtree.c", line 111: Filler word 105010 = has more
than one phone, ignoring it.
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 136 single-
phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 136
single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 128180
INFO: ngram_search_fwdtree.c(338): after: 742 root, 128052 non-root channels,
134 single-phone words
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: cmn.c(175): CMN: 7.66 -0.29 -0.05 0.06 -0.23 -0.11 -0.16 -0.09 -0.14
-0.12 -0.08 -0.07 -0.12
INFO: agc.c(123): AGCMax: obs=max= 6.00
INFO: ngram_search.c(466): Resized backpointer table to 10000 entries
INFO: ngram_search_fwdtree.c(1549): 5703 words recognized (12/fr)
INFO: ngram_search_fwdtree.c(1551): 1655774 senones evaluated (3400/fr)
INFO: ngram_search_fwdtree.c(1553): 2108248 channels searched (4329/fr),
336257 1st, 149665 last
INFO: ngram_search_fwdtree.c(1557): 34335 words for which last channels
evaluated (70/fr)
INFO: ngram_search_fwdtree.c(1560): 257076 candidate words for entering last
phone (527/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 4.24 CPU 0.870 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 4.24 wall 0.871 xRT
INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 273 words
INFO: ngram_search_fwdflat.c(940): 3500 words recognized (7/fr)
INFO: ngram_search_fwdflat.c(942): 183201 senones evaluated (376/fr)
INFO: ngram_search_fwdflat.c(944): 139841 channels searched (287/fr)
INFO: ngram_search_fwdflat.c(946): 20354 words searched (41/fr)
INFO: ngram_search_fwdflat.c(948): 15642 word transitions (32/fr)
INFO: ngram_search_fwdflat.c(951): fwdflat 0.45 CPU 0.093 xRT
INFO: ngram_search_fwdflat.c(954): fwdflat 0.45 wall 0.093 xRT
INFO: ngram_search.c(1206): not found in last frame, using .485 instead
INFO: ngram_search.c(1258): lattice start node .0 end node .338
INFO: ngram_search.c(1286): Eliminated 248 nodes before end node
INFO: ngram_search.c(1391): Lattice has 1020 nodes, 2056 links
INFO: ps_lattice.c(1365): Normalizer P(O) = alpha(**:338:485) = -1075041
INFO: ps_lattice.c(1403): Joint P(O,S) = -1163653 P(S|O) = -88612
INFO: ngram_search.c(880): bestpath 0.01 CPU 0.002 xRT
INFO: ngram_search.c(883): bestpath 0.01 wall 0.002 xRT
Recognized: on vient leur apprendre que leurs papiers ne sont pas suffisants
INFO: ngram_search.c(474): Resized score stack to 200000 entries
INFO: ngram_search.c(466): Resized backpointer table to 20000 entries
INFO: cmn_prior.c(121): cmn_prior_update: from < 7.66 -0.29 -0.05 0.06 -0.23
-0.11 -0.16 -0.09 -0.14 -0.12 -0.08 -0.07 -0.12 >
INFO: cmn_prior.c(139): cmn_prior_update: to < 7.66 -0.29 -0.05 0.06 -0.23
-0.11 -0.16 -0.09 -0.14 -0.12 -0.08 -0.07 -0.12 >
INFO: agc.c(172): AGCEMax: obs= 6.00, new= 6.00
INFO: ngram_search_fwdtree.c(1549): 13366 words recognized (27/fr)
INFO: ngram_search_fwdtree.c(1551): 2218221 senones evaluated (4555/fr)
INFO: ngram_search_fwdtree.c(1553): 4512364 channels searched (9265/fr),
353201 1st, 217991 last
INFO: ngram_search_fwdtree.c(1557): 44684 words for which last channels
evaluated (91/fr)
INFO: ngram_search_fwdtree.c(1560): 615087 candidate words for entering last
phone (1263/fr)
INFO: ngram_search_fwdtree.c(1562): fwdtree 6.56 CPU 1.346 xRT
INFO: ngram_search_fwdtree.c(1565): fwdtree 6.56 wall 1.348 xRT
INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 449 words
INFO: ngram_search_fwdflat.c(940): 8381 words recognized (17/fr)
INFO: ngram_search_fwdflat.c(942): 418430 senones evaluated (859/fr)
INFO: ngram_search_fwdflat.c(944): 364487 channels searched (748/fr)
INFO: ngram_search_fwdflat.c(946): 41997 words searched (86/fr)
INFO: ngram_search_fwdflat.c(948): 29711 word transitions (61/fr)
INFO: ngram_search_fwdflat.c(951): fwdflat 1.11 CPU 0.227 xRT
INFO: ngram_search_fwdflat.c(954): fwdflat 1.11 wall 0.227 xRT
INFO: ngram_search.c(1206): not found in last frame, using .485 instead
INFO: ngram_search.c(1258): lattice start node .0 end node .334
INFO: ngram_search.c(1286): Eliminated 182 nodes before end node
INFO: ngram_search.c(1391): Lattice has 1285 nodes, 13790 links
INFO: ps_lattice.c(1365): Normalizer P(O) = alpha(:334:485) = -1559473
INFO: ps_lattice.c(1403): Joint P(O,S) = -1595064 P(S|O) = -35591
INFO: ngram_search.c(880): bestpath 0.98 CPU 0.202 xRT
INFO: ngram_search.c(883): bestpath 0.98 wall 0.202 xRT
Recognized: elle entend que dans les_autres
INFO: ngram_search_fwdtree.c(430): TOTAL fwdtree 10.79 CPU 1.110 xRT
INFO: ngram_search_fwdtree.c(433): TOTAL fwdtree 10.81 wall 1.112 xRT
INFO: ngram_search_fwdflat.c(174): TOTAL fwdflat 1.56 CPU 0.160 xRT
INFO: ngram_search_fwdflat.c(177): TOTAL fwdflat 1.56 wall 0.160 xRT
INFO: ngram_search.c(317): TOTAL bestpath 0.99 CPU 0.102 xRT
INFO: ngram_search.c(320): TOTAL bestpath 0.99 wall 0.102 xRT
Thanks a lot for your help,
Boris.

Anyone?
Most likely feature normalization went wrong. Print feature values and
compare.
Could you please elaborate? How do I get and print feature values?
I have also found that increasing the buffer size (logically) improves accuracy...
Yes, most likely it's the AGC bug.
To print features, see the following condition in sphinxbase:
AGC is Automatic Gain Control, right?
Could you explain what the bug is? Or is there a bug tracking system with this
bug somewhere?
Is someone working on correcting this bug?
Thank you,
Boris.
Yes.
The AGC value is not calculated properly in online mode.
No.
No.
I have compiled with -DDUMP_FEATURES to have features values.
For the "all in memory" mode, I get :
INFO: feat.c(149): After CMN
-2.261833 -0.383301 -0.151923 -0.218711 0.113252 0.050123 0.221852 0.013434 -0.020369 0.095906 0.072756 0.089094 0.178019
-2.151785 -0.207930 0.077055 -0.094817 0.158462 0.066685 -0.017394 -0.014809 0.082939 0.027884 0.159237 0.030688 0.116601
INFO: agc.c(123): AGCMax: obs=max= 6.00
INFO: feat.c(149): After AGC
-8.262085 -0.383301 -0.151923 -0.218711 0.113252 0.050123 0.221852 0.013434 -0.020369 0.095906 0.072756 0.089094 0.178019
-8.152037 -0.207930 0.077055 -0.094817 0.158462 0.066685 -0.017394 -0.014809 0.082939 0.027884 0.159237 0.030688 0.116601
INFO: feat.c(149): Incoming features (after padding)
-8.262085 -0.383301 -0.151923 -0.218711 0.113252 0.050123 0.221852 0.013434 -0.020369 0.095906 0.072756 0.089094 0.178019
-8.262085 -0.383301 -0.151923 -0.218711 0.113252 0.050123 0.221852 0.013434 -0.020369 0.095906 0.072756 0.089094 0.178019
For the online version, I get :
INFO: feat.c(149): After CMN
-2.261833 -0.383301 -0.151923 -0.218711 0.113252 0.050123 0.221852 0.013434 -0.020369 0.095906 0.072756 0.089094 0.178019
INFO: feat.c(149): After AGC
-2.261833 -0.383301 -0.151923 -0.218711 0.113252 0.050123 0.221852 0.013434 -0.020369 0.095906 0.072756 0.089094 0.178019
INFO: feat.c(149): After CMN
-2.151785 -0.207930 0.077055 -0.094817 0.158462 0.066685 -0.017394 -0.014809 0.082939 0.027884 0.159237 0.030688 0.116601
-2.176410 -0.229570 -0.052823 -0.200628 0.218424 0.047539 0.131155 0.020091 0.122641 -0.064158 -0.099204 -0.046863 -0.007805
-2.401895 -0.371993 -0.016168 -0.061217 0.170805 -0.032357 0.108044 -0.130535 0.000032 0.103746 -0.063559 -0.015458 0.013658
INFO: feat.c(149): After AGC
-2.151785 -0.207930 0.077055 -0.094817 0.158462 0.066685 -0.017394 -0.014809 0.082939 0.027884 0.159237 0.030688 0.116601
-7.176410 -0.229570 -0.052823 -0.200628 0.218424 0.047539 0.131155 0.020091 0.122641 -0.064158 -0.099204 -0.046863 -0.007805
-7.401895 -0.371993 -0.016168 -0.061217 0.170805 -0.032357 0.108044 -0.130535 0.000032 0.103746 -0.063559 -0.015458 0.013658
So yes, the values are quite different after AGC between the two versions. Is
that the AGC bug you are talking about ?
Is there a way to alleviate this problem ?
Besides, I have also read in the Sphinx3 FAQ :
http://www.speech.cs.cmu.edu/sphinxman/FAQ.html
that a mismatch in AGC settings between training and decoding could explain bad
results.
I don't know if this still applies to pocketsphinx.
How can I find out with which AGC settings the available French model was trained ?
Thanks again.
Boris.
Yes
One needs to fix the sphinxbase AGC code to properly initialize AGC and
estimate it in online mode, for example by setting the initial value to -6.
It's noted in the README and in the feat.params file inside the model archive.
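For reference, feat.params is a plain list of front-end flags the model was trained with; an illustrative fragment (hypothetical values, check your own model archive) might read:

```
-feat 1s_c_d_dd
-agc none
-cmn current
-varnorm no
```

If the file contains an `-agc none` line, the model was trained without AGC, so decoding should use `-agc none` as well to avoid the train/decode mismatch mentioned in the FAQ.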
I am having a closer look at sphinxbase code, and in particular AGC (in svn
rev 11257).
First, in src/libsphinxbase/feat/agc.c : agc_emax() function :
Why is the index in the for loop starting from 1 ? Is it a bug ?
It seems I get better results (on some French examples) with the index starting
from 0.
Secondly, in src/libsphinxbase/feat/feat.c : in feat_init() function, I see
that agc->max is initialized by a hardwired value :
/* HACK: hardwired initial estimates based on use of CMN (from Sphinx2) */
agc_emax_set(fcb->agc_struct, (cmn != CMN_NONE) ? 5.0 : 10.0);
If I understand the code correctly, this hardwired value will be used only for
the first utterance (but for all buffers/calls to ps_process_raw for this
first utterance).
Would it be a good idea to update agc->max each time agc_emax() is called on
the first utterance ?
For example, it could be by adding the following code in agc_emax() before the
for loop :
if (agc->obs_utt == 0) {
    mfcc_t new_max = mfc[0][0];
    for (i = 1; i < n_frame; ++i)
        if (mfc[i][0] > new_max)
            new_max = mfc[i][0];
    if (new_max > agc->max) {
        agc->max = new_max;
        E_INFO("AGCEMax: max= %.2f\n", agc->max);
    }
}
Boris.
Any thoughts on the previous message ?
In particular, on the loop starting from 1 in agc_emax() ?
Thanks,
Boris.
Hi Boris
Yes, something like that does make sense. Maybe the code could be made less
specific, but if it works for you, you have tested it, and you have a patch,
please publish it and we will include it in the source tree and go further!
I have not yet tested the dynamic update of agc->max for the first
utterance enough to be absolutely sure it improves things.
However, my first remark about the loop still holds.
I don't know how to send you a patch.
Here is a svn diff from sphinxbase svn :
Index: src/libsphinxbase/feat/agc.c
--- src/libsphinxbase/feat/agc.c (revision 11258)
+++ src/libsphinxbase/feat/agc.c (working copy)
@@ -145,7 +145,7 @@
if (n_frame <= 0)
return;
-    for (i = 1; i < n_frame; ++i) {
+    for (i = 0; i < n_frame; ++i) {
         if (mfc[i][0] > agc->obs_max) {
             agc->obs_max = mfc[i][0];
             agc->obs_frame = 1;
Boris.
That should be enough, thanks a lot. I would still appreciate it if you could
test it too, because it's quite complex for me to set up the French model, set
up a test, verify it, etc. :)
You should be able to test it with any other model trained with an AGC set to
max.
It seems that the default English models were trained with AGC set to none; is
that why you cannot easily test it ?
If there is no sound reason why this loop starts from 1, it should be a bug.
On the French examples I have tested, this change always improves recognition
results.