ps_decode_raw vs ps_process_raw accuracy?

  • Boris Mansencal

    Boris Mansencal - 2011-09-28

    Hello,

    I am a total newbie regarding speech recognition and pocketsphinx.

    I have just tried the example described here: http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx

    but using the French model/dictionary provided here: http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/

    I replaced the cmd_ln_init() call with something like this:

        config = cmd_ln_init(NULL, ps_args(), TRUE,
                             "-hmm", ".../lium_french_f0",
                             "-lm", ".../french3g62K.lm.dmp",
                             "-dict", ".../frenchWords62K.dic",
                             NULL);

    and tested it on an example with a French speaker (see below).

    My problem is that the first result, given by ps_decode_raw() on the whole
    file, is far better in terms of accuracy than the second result, given by
    ps_process_raw() on blocks of the file.
    Actually, on my example, the first result is perfectly exact, while the
    second result does not contain a single correct word.
    I have also tested the pocketsphinx_continuous program, and I also get very
    bad results on my example file.
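
    For reference, here is the shape of the two call patterns I am comparing
    (a minimal sketch following the tutorial's hello_ps code, continuing from
    the cmd_ln_init() above; error handling omitted, and the 512-sample block
    size is the tutorial's):

        FILE *fh;
        char const *hyp, *uttid;
        int32 score;
        int16 buf[512];

        /* 1) Whole file at once: ps_decode_raw() reads and decodes everything. */
        fh = fopen("fr-sb-693_16_16.raw", "rb");
        ps_decode_raw(ps, fh, "utt1", -1);
        hyp = ps_get_hyp(ps, &score, &uttid);
        printf("Recognized: %s\n", hyp);

        /* 2) Same data, fed block by block through ps_process_raw(). */
        rewind(fh);
        ps_start_utt(ps, NULL);
        while (!feof(fh)) {
            size_t nsamp = fread(buf, 2, 512, fh);
            ps_process_raw(ps, buf, nsamp, FALSE, FALSE);
        }
        ps_end_utt(ps);
        hyp = ps_get_hyp(ps, &score, &uttid);
        printf("Recognized: %s\n", hyp);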

    I have tried sphinxbase/pocketsphinx, both version 0.7 and SVN rev 11224,
    on Linux (Fedora 14 x86_64) on an Intel Core 2 Quad Q9505.

    What could explain such a huge difference?

    I got the example file with the following commands:
    wget http://www.repository.voxforge1.org/downloads/fr/Trunk/Audio/Original/48kHz_16bit/Batman-20100121-ljg.tgz

    tar xzf Batman-20100121-ljg.tgz
    sox Batman-20100121-ljg/wav/fr-sb-693.wav -r 16000 -b 16 Batman-20100121-ljg/wav/fr-sb-693_16_16.raw rate 16k

    The transcript of this example is:
    "On vient leur apprendre que leurs papiers ne sont pas suffisants. "

    Here is the output of the hello_ps program:
    ./hello_ps ../../Batman-20100121-ljg/wav/fr-sb-693_16_16.raw
    INFO: cmd_ln.c(691): Parsing command line:
    \
    -hmm /home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0 \
    -lm /home/mansencal/MSM/Speech/Sphinx/Models/French/french3g62K.lm.dmp \
    -dict /home/mansencal/MSM/Speech/Sphinx/Models/French/frenchWords62K.dic

    Current configuration:

    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -bghist no no
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -debug 0
    -dict /home/mansencal/MSM/Speech/Sphinx/Models/French/frenchWords62K.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm /home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0
    -input_endian little little
    -jsgf
    -kdmaxbbi -1 -1
    -kdmaxdepth 0 0
    -kdtree
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lextreedump 0 0
    -lifter 0 0
    -lm /home/mansencal/MSM/Speech/Sphinx/Models/French/french3g62K.lm.dmp
    -lmctl
    -lmname default default
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf -1 -1
    -maxnewoov 20 20
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-5 1.000000e-05
    -pl_window 0 0
    -rawlogdir
    -remove_dc no no
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -usewdphones no no
    -uw 1.0 1.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    INFO: cmd_ln.c(691): Parsing command line:
    \
    -feat 1s_c_d_dd \
    -agc max \
    -cmn current \
    -varnorm no \
    -samprate 16000 \
    -lowerf 133.33334 \
    -upperf 6855.49756 \
    -nfilt 40 \
    -nfft 512

    Current configuration:

    -agc none max
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -dither no no
    -doublebw no no
    -feat 1s_c_d_dd 1s_c_d_dd
    -frate 100 100
    -input_endian little little
    -lda
    -ldadim 0 0
    -lifter 0 0
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -remove_dc no no
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -smoothspec no no
    -svspec
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wlen 0.025625 2.562500e-02

    INFO: acmod.c(246): Parsed model-specific feature parameters from
    /home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/feat.params
    INFO: feat.c(684): Initializing feature stream to type: '1s_c_d_dd',
    ceplen=13, CMN='current', VARNORM='no', AGC='max'
    INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: agc.c(132): AGCEMax: max= 5.00
    INFO: mdef.c(520): Reading model definition:
    /home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/mdef
    INFO: bin_mdef.c(179): Allocating 85844 * 8 bytes (670 KiB) for CD tree
    INFO: tmat.c(205): Reading HMM transition probability matrices: /home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/transition_matrices
    INFO: acmod.c(121): Attempting to use SCHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/means
    INFO: ms_gauden.c(292): 5725 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 22x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/variances
    INFO: ms_gauden.c(292): 5725 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 22x39
    INFO: ms_gauden.c(354): 3502 variance values floored
    INFO: acmod.c(123): Attempting to use PTHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/means
    INFO: ms_gauden.c(292): 5725 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 22x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/variances
    INFO: ms_gauden.c(292): 5725 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 22x39
    INFO: ms_gauden.c(354): 3502 variance values floored
    INFO: ptm_mgau.c(800): Number of codebooks exceeds 256: 5725
    INFO: acmod.c(125): Falling back to general multi-stream GMM computation
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/means
    INFO: ms_gauden.c(292): 5725 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 22x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/variances
    INFO: ms_gauden.c(292): 5725 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 22x39
    INFO: ms_gauden.c(354): 3502 variance values floored
    INFO: ms_senone.c(160): Reading senone mixture weights:
    /home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/mixture_weights
    INFO: ms_senone.c(211): Truncating senone logs3(pdf) values by 10 bits
    INFO: ms_senone.c(218): Not transposing mixture weights in memory
    INFO: ms_senone.c(277): Read mixture weights for 5725 senones: 1 features x 22
    codewords
    INFO: ms_senone.c(331): Mapping senones to individual codebooks
    INFO: ms_mgau.c(141): The value of topn: 4
    INFO: dict.c(308): Allocating 109107 * 32 bytes (3409 KiB) for word entries
    INFO: dict.c(323): Reading main dictionary:
    /home/mansencal/MSM/Speech/Sphinx/Models/French/frenchWords62K.dic
    INFO: dict.c(212): Allocated 1018 KiB for strings, 1375 KiB for phones
    INFO: dict.c(326): 105003 words read
    INFO: dict.c(332): Reading filler dictionary:
    /home/mansencal/MSM/Speech/Sphinx/Models/French/lium_french_f0/noisedict
    INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(335): 8 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(404): Allocating 45^3 * 2 bytes (177 KiB) for word-initial
    triphones
    INFO: dict2pid.c(131): Allocated 48960 bytes (47 KiB) for word-final triphones
    INFO: dict2pid.c(195): Allocated 48960 bytes (47 KiB) for single-phone word
    triphones
    INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
    INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
    INFO: ngram_model_dmp.c(196): ngrams 1=62304, 2=18541132, 3=23627127
    INFO: ngram_model_dmp.c(242): 62304 = LM.unigrams(+trailer) read
    INFO: ngram_model_dmp.c(291): 18541132 = LM.bigrams(+trailer) read
    INFO: ngram_model_dmp.c(317): 23627127 = LM.trigrams read
    INFO: ngram_model_dmp.c(342): 37843 = LM.prob2 entries read
    INFO: ngram_model_dmp.c(362): 5753 = LM.bo_wt2 entries read
    INFO: ngram_model_dmp.c(382): 35967 = LM.prob3 entries read
    INFO: ngram_model_dmp.c(410): 36214 = LM.tseg_base entries read
    INFO: ngram_model_dmp.c(466): 62304 = ascii word strings read
    INFO: ngram_search_fwdtree.c(99): 742 unique initial diphones
    WARNING: "ngram_search_fwdtree.c", line 111: Filler word 105010 = has more than one phone, ignoring it.
    INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 136 single-phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 136
    single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 128180
    INFO: ngram_search_fwdtree.c(338): after: 742 root, 128052 non-root channels,
    134 single-phone words
    INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: cmn.c(175): CMN: 7.66 -0.29 -0.05 0.06 -0.23 -0.11 -0.16 -0.09 -0.14
    -0.12 -0.08 -0.07 -0.12
    INFO: agc.c(123): AGCMax: obs=max= 6.00
    INFO: ngram_search.c(466): Resized backpointer table to 10000 entries
    INFO: ngram_search_fwdtree.c(1549): 5703 words recognized (12/fr)
    INFO: ngram_search_fwdtree.c(1551): 1655774 senones evaluated (3400/fr)
    INFO: ngram_search_fwdtree.c(1553): 2108248 channels searched (4329/fr),
    336257 1st, 149665 last
    INFO: ngram_search_fwdtree.c(1557): 34335 words for which last channels
    evaluated (70/fr)
    INFO: ngram_search_fwdtree.c(1560): 257076 candidate words for entering last
    phone (527/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 4.24 CPU 0.870 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 4.24 wall 0.871 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 273 words
    INFO: ngram_search_fwdflat.c(940): 3500 words recognized (7/fr)
    INFO: ngram_search_fwdflat.c(942): 183201 senones evaluated (376/fr)
    INFO: ngram_search_fwdflat.c(944): 139841 channels searched (287/fr)
    INFO: ngram_search_fwdflat.c(946): 20354 words searched (41/fr)
    INFO: ngram_search_fwdflat.c(948): 15642 word transitions (32/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.45 CPU 0.093 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.45 wall 0.093 xRT
    INFO: ngram_search.c(1206): </s> not found in last frame, using <sil>.485 instead
    INFO: ngram_search.c(1258): lattice start node <s>.0 end node </s>.338
    INFO: ngram_search.c(1286): Eliminated 248 nodes before end node
    INFO: ngram_search.c(1391): Lattice has 1020 nodes, 2056 links
    INFO: ps_lattice.c(1365): Normalizer P(O) = alpha(</s>:338:485) = -1075041
    INFO: ps_lattice.c(1403): Joint P(O,S) = -1163653 P(S|O) = -88612
    INFO: ngram_search.c(880): bestpath 0.01 CPU 0.002 xRT
    INFO: ngram_search.c(883): bestpath 0.01 wall 0.002 xRT
    Recognized: on vient leur apprendre que leurs papiers ne sont pas suffisants
    INFO: ngram_search.c(474): Resized score stack to 200000 entries
    INFO: ngram_search.c(466): Resized backpointer table to 20000 entries
    INFO: cmn_prior.c(121): cmn_prior_update: from < 7.66 -0.29 -0.05 0.06 -0.23
    -0.11 -0.16 -0.09 -0.14 -0.12 -0.08 -0.07 -0.12 >
    INFO: cmn_prior.c(139): cmn_prior_update: to < 7.66 -0.29 -0.05 0.06 -0.23
    -0.11 -0.16 -0.09 -0.14 -0.12 -0.08 -0.07 -0.12 >
    INFO: agc.c(172): AGCEMax: obs= 6.00, new= 6.00
    INFO: ngram_search_fwdtree.c(1549): 13366 words recognized (27/fr)
    INFO: ngram_search_fwdtree.c(1551): 2218221 senones evaluated (4555/fr)
    INFO: ngram_search_fwdtree.c(1553): 4512364 channels searched (9265/fr),
    353201 1st, 217991 last
    INFO: ngram_search_fwdtree.c(1557): 44684 words for which last channels
    evaluated (91/fr)
    INFO: ngram_search_fwdtree.c(1560): 615087 candidate words for entering last
    phone (1263/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 6.56 CPU 1.346 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 6.56 wall 1.348 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 449 words
    INFO: ngram_search_fwdflat.c(940): 8381 words recognized (17/fr)
    INFO: ngram_search_fwdflat.c(942): 418430 senones evaluated (859/fr)
    INFO: ngram_search_fwdflat.c(944): 364487 channels searched (748/fr)
    INFO: ngram_search_fwdflat.c(946): 41997 words searched (86/fr)
    INFO: ngram_search_fwdflat.c(948): 29711 word transitions (61/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 1.11 CPU 0.227 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 1.11 wall 0.227 xRT
    INFO: ngram_search.c(1206): </s> not found in last frame, using <sil>.485 instead
    INFO: ngram_search.c(1258): lattice start node <s>.0 end node </s>.334
    INFO: ngram_search.c(1286): Eliminated 182 nodes before end node
    INFO: ngram_search.c(1391): Lattice has 1285 nodes, 13790 links
    INFO: ps_lattice.c(1365): Normalizer P(O) = alpha(</s>:334:485) = -1559473
    INFO: ps_lattice.c(1403): Joint P(O,S) = -1595064 P(S|O) = -35591
    INFO: ngram_search.c(880): bestpath 0.98 CPU 0.202 xRT
    INFO: ngram_search.c(883): bestpath 0.98 wall 0.202 xRT
    Recognized: elle entend que dans les_autres
    INFO: ngram_search_fwdtree.c(430): TOTAL fwdtree 10.79 CPU 1.110 xRT
    INFO: ngram_search_fwdtree.c(433): TOTAL fwdtree 10.81 wall 1.112 xRT
    INFO: ngram_search_fwdflat.c(174): TOTAL fwdflat 1.56 CPU 0.160 xRT
    INFO: ngram_search_fwdflat.c(177): TOTAL fwdflat 1.56 wall 0.160 xRT
    INFO: ngram_search.c(317): TOTAL bestpath 0.99 CPU 0.102 xRT
    INFO: ngram_search.c(320): TOTAL bestpath 0.99 wall 0.102 xRT

    Thanks a lot for your help,

    Boris.

     
  • Boris Mansencal

    Boris Mansencal - 2011-09-30

    Anyone?

     
  • Nickolay V. Shmyrev

    Most likely feature normalization went wrong. Print feature values and
    compare.

     
  • Boris Mansencal

    Boris Mansencal - 2011-10-03

    Could you please elaborate? How do I get and print feature values?

    I have also found that increasing the buffer size (logically) improves accuracy...

     
  • Nickolay V. Shmyrev

    Yes, most likely it's an AGC bug.

    To print features, see the following conditional in sphinxbase:

    #ifdef DUMP_FEATURES
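
    For example, assuming the usual autotools build, rebuilding sphinxbase
    with something like:

        CFLAGS="-g -O2 -DDUMP_FEATURES" ./configure && make && make install

    should enable those dumps.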
    
     
  • Boris Mansencal

    Boris Mansencal - 2011-10-26

    AGC is Automatic Gain Control, right?
    Could you explain what the bug is? Or is there a bug tracking system with
    this bug somewhere?

    Is someone working on correcting this bug?

    Thank you,
    Boris.

     
  • Nickolay V. Shmyrev

    AGC is Automatic Gain Control, right?

    Yes

    Could you explain what the bug is?

    The AGC value is not calculated properly in online mode.
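
    Roughly speaking, this is why batch and online differ (an illustrative
    sketch, not the actual sphinxbase code):

        /* Batch mode: the whole utterance is in memory, so the true
         * maximum of C0 is known before anything is subtracted. */
        void agc_max_batch(float *c0, int n)
        {
            int i;
            float max = c0[0];
            for (i = 1; i < n; ++i)
                if (c0[i] > max)
                    max = c0[i];
            for (i = 0; i < n; ++i)
                c0[i] -= max;
        }

        /* Online mode: each block must be normalized as it arrives, so
         * only a running estimate (seeded with a hardwired guess) can be
         * subtracted; the observed max updates the estimate between
         * utterances. If the estimate is wrong, every frame of the
         * utterance is normalized wrongly. */
        void agc_max_online_block(float *c0, int n,
                                  float estimate, float *obs_max)
        {
            int i;
            for (i = 0; i < n; ++i) {
                if (c0[i] > *obs_max)
                    *obs_max = c0[i];
                c0[i] -= estimate;
            }
        }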

    Or is there a bug tracking system with this bug somewhere?

    No

    Is someone working on correcting this bug?

    No

     
  • Boris Mansencal

    Boris Mansencal - 2011-10-27

    FYI, when I launch the program I see:
    -agc none none
    -agcthresh 2.0 2.000000e+00

    -agc none max
    -agcthresh 2.0 2.000000e+00

    Could we improve our results in online mode by changing these parameters?

    I have compiled with -DDUMP_FEATURES to get feature values.
    For the "all in memory" mode, I get:
    INFO: feat.c(149): After CMN
    -2.261833 -0.383301 -0.151923 -0.218711 0.113252 0.050123 0.221852 0.013434 -0.020369 0.095906 0.072756 0.089094 0.178019
    -2.151785 -0.207930 0.077055 -0.094817 0.158462 0.066685 -0.017394 -0.014809 0.082939 0.027884 0.159237 0.030688 0.116601

    INFO: agc.c(123): AGCMax: obs=max= 6.00
    INFO: feat.c(149): After AGC
    -8.262085 -0.383301 -0.151923 -0.218711 0.113252 0.050123 0.221852 0.013434 -0.020369 0.095906 0.072756 0.089094 0.178019
    -8.152037 -0.207930 0.077055 -0.094817 0.158462 0.066685 -0.017394 -0.014809 0.082939 0.027884 0.159237 0.030688 0.116601

    INFO: feat.c(149): Incoming features (after padding)
    -8.262085 -0.383301 -0.151923 -0.218711 0.113252 0.050123 0.221852 0.013434 -0.020369 0.095906 0.072756 0.089094 0.178019
    -8.262085 -0.383301 -0.151923 -0.218711 0.113252 0.050123 0.221852 0.013434 -0.020369 0.095906 0.072756 0.089094 0.178019

    For the online version, I get:
    INFO: feat.c(149): After CMN
    -2.261833 -0.383301 -0.151923 -0.218711 0.113252 0.050123 0.221852 0.013434 -0.020369 0.095906 0.072756 0.089094 0.178019
    INFO: feat.c(149): After AGC
    -2.261833 -0.383301 -0.151923 -0.218711 0.113252 0.050123 0.221852 0.013434 -0.020369 0.095906 0.072756 0.089094 0.178019
    INFO: feat.c(149): After CMN
    -2.151785 -0.207930 0.077055 -0.094817 0.158462 0.066685 -0.017394 -0.014809 0.082939 0.027884 0.159237 0.030688 0.116601
    -2.176410 -0.229570 -0.052823 -0.200628 0.218424 0.047539 0.131155 0.020091 0.122641 -0.064158 -0.099204 -0.046863 -0.007805
    -2.401895 -0.371993 -0.016168 -0.061217 0.170805 -0.032357 0.108044 -0.130535 0.000032 0.103746 -0.063559 -0.015458 0.013658
    INFO: feat.c(149): After AGC
    -2.151785 -0.207930 0.077055 -0.094817 0.158462 0.066685 -0.017394 -0.014809 0.082939 0.027884 0.159237 0.030688 0.116601
    -7.176410 -0.229570 -0.052823 -0.200628 0.218424 0.047539 0.131155 0.020091 0.122641 -0.064158 -0.099204 -0.046863 -0.007805
    -7.401895 -0.371993 -0.016168 -0.061217 0.170805 -0.032357 0.108044 -0.130535 0.000032 0.103746 -0.063559 -0.015458 0.013658

    So yes, the values seem quite different after AGC between the two versions:
    the "all in memory" version subtracts the estimated utterance maximum
    (about 6.00) from C0, while the online version subtracts the hardwired 5.0
    and leaves the first frame of each block untouched. Is that the AGC bug you
    are talking about?

    Is there a way to alleviate this problem?

    Besides, I have also read in the Sphinx3 FAQ
    (http://www.speech.cs.cmu.edu/sphinxman/FAQ.html)
    that a mismatch in AGC settings between training and decoding could explain
    bad results.
    I don't know if that still applies to pocketsphinx.
    How do I know with which AGC settings the available French model was trained?

    Thanks again.
    Boris.

     
  • Nickolay V. Shmyrev

    So yes, the values seem quite different after AGC between the two versions.
    Is that the AGC bug you are talking about?

    Yes

    Is there a way to alleviate this problem?

    One needs to fix the sphinxbase AGC code to properly initialize AGC and
    estimate it in online mode. For example, set the initial value to -6.

    How do I know with which AGC settings the available French model was
    trained?

    It's noted in the README and in the feat.params file inside the model
    archive.
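
    You can also see it in the log above: the second "Parsing command line"
    block is pocketsphinx echoing the model's feat.params, which includes

        -feat 1s_c_d_dd
        -agc max
        -cmn current
        -varnorm no

    so this model was indeed trained with AGC set to max.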

     
  • Boris Mansencal

    Boris Mansencal - 2011-11-03

    I am having a closer look at the sphinxbase code, and in particular at AGC
    (in SVN rev 11257).

    First, in src/libsphinxbase/feat/agc.c, in the agc_emax() function:
    why does the index in the for loop start from 1? Is it a bug?
    It seems I get better results (on some French examples) with the index
    starting from 0.
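
    For reference, the loop in question, abridged as I read it (the same pass
    also subtracts the current estimate from C0, which is consistent with the
    per-block dumps above, where the first frame of each block comes through
    unnormalized):

        void agc_emax(agc_t *agc, mfcc_t **mfc, int32 n_frame)
        {
            int i;

            if (n_frame <= 0)
                return;
            for (i = 1; i < n_frame; ++i) {    /* why start at 1 and not 0? */
                if (mfc[i][0] > agc->obs_max) {
                    agc->obs_max = mfc[i][0];
                    agc->obs_frame = 1;
                }
                mfc[i][0] -= agc->max;
            }
        }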

    Secondly, in src/libsphinxbase/feat/feat.c, in the feat_init() function, I
    see that agc->max is initialized with a hardwired value:

        /* HACK: hardwired initial estimates based on use of CMN (from Sphinx2) */
        agc_emax_set(fcb->agc_struct, (cmn != CMN_NONE) ? 5.0 : 10.0);

    If I understand the code correctly, this hardwired value will be used only
    for the first utterance (but for all buffers/calls to ps_process_raw() for
    this first utterance).

    Would it be a good idea to update agc->max each time agc_emax() is called
    on the first utterance?
    For example, it could be done by adding the following code in agc_emax(),
    before the for loop:

        if (agc->obs_utt == 0) {
            mfcc_t new_max = mfc[0][0];
            for (i = 1; i < n_frame; ++i)
                if (mfc[i][0] > new_max)
                    new_max = mfc[i][0];
            if (new_max > agc->max) {
                agc->max = new_max;
                E_INFO("AGCEMax: max= %.2f\n", agc->max);
            }
        }
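
    The idea is that, during the very first utterance, the running estimate
    can only improve by folding in the maximum observed so far, instead of
    trusting the hardwired Sphinx2-era constant until the utterance ends.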

    Boris.

     
  • Boris Mansencal

    Boris Mansencal - 2011-11-08

    Any thoughts on my previous message?
    In particular, on the loop starting from 1 in agc_emax()?

    Thanks,

    Boris.

     
  • Nickolay V. Shmyrev

    Hi Boris

    Yes, something like that makes sense. Maybe it could be less specific code,
    but if it works for you, you have tested it, and you have a patch, please
    publish it and we will include it in the source tree and go further!

     
  • Boris Mansencal

    Boris Mansencal - 2011-11-08

    I have not yet tested the dynamic update of agc->max for the first
    utterance enough to be absolutely sure it improves things.

    However, my first remark about the loop still holds.
    I don't know how to send you a patch.
    Here is an svn diff from the sphinxbase svn:

    Index: src/libsphinxbase/feat/agc.c
    ===================================================================
    --- src/libsphinxbase/feat/agc.c    (revision 11258)
    +++ src/libsphinxbase/feat/agc.c    (working copy)
    @@ -145,7 +145,7 @@
     
         if (n_frame <= 0)
             return;
    -    for (i = 1; i < n_frame; ++i) {
    +    for (i = 0; i < n_frame; ++i) {
             if (mfc[i][0] > agc->obs_max) {
                 agc->obs_max = mfc[i][0];
                 agc->obs_frame = 1;
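
    (Since the Index: path is relative to the top of the source tree, the
    patch should apply from there with, e.g., "patch -p0".)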

    Boris.

     
  • Nickolay V. Shmyrev

    I don't know how to send you a patch.

    That should be enough, thanks a lot. I would still appreciate it if you
    could test it too, because it's quite complex for me to set up the French
    model, set up a test, verify it, etc. :)

     
  • Boris Mansencal

    Boris Mansencal - 2011-11-10

    You should be able to test it with any other model trained with AGC set to
    max.
    It seems that the default English models were trained with AGC set to none;
    that's why you cannot easily test it, isn't it?
    If there is no sound reason for this loop to start from 1, it must be a bug.

    On the French examples I have tested, this change always improves
    recognition results.

     
