
Recognizer outputs nothing although the training result is 100% accurate

Help
Fang LIU
2022-06-25
2022-06-26
  • Fang LIU

    Fang LIU - 2022-06-25

    I'm using a very small dataset (5 words) to train a Mandarin acoustic model on Windows 10. The .align file shows 100% accuracy (file attached). However, when I take a .wav file from the training set and feed it to the recognizer, the output is an empty string.

    I'd appreciate your help.

    The command I ran:
    pocketsphinx_continuous -infile .\wav\speaker_1\1_03.wav -hmm .\model_parameters\demo.ci_cont\ -lm .\etc\demo.lm -dict .\etc\demo.dic

    output:
    INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from .\model_parameters\demo.ci_cont\/feat.params
    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+000
    -allphone
    -allphone_ci no no
    -alpha 0.97 9.700000e-001
    -ascale 20.0 2.000000e+001
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-048
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+000
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -debug 0
    -dict .\etc\demo.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-008
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-064
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+000
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-029
    -fwdtree yes yes
    -hmm .\model_parameters\demo.ci_cont\
    -input_endian little little
    -jsgf
    -keyphrase
    -kws
    -kws_delay 10 10
    -kws_plp 1e-1 1.000000e-001
    -kws_threshold 1 1.000000e+000
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lifter 0 22
    -lm .\etc\demo.lm
    -lmctl
    -lmname
    -logbase 1.0001 1.000100e+000
    -logfn
    -logspec no no
    -lowerf 133.33334 1.300000e+002
    -lpbeam 1e-40 1.000000e-040
    -lponlybeam 7e-29 7.000000e-029
    -lw 6.5 6.500000e+000
    -maxhmmpf 30000 30000
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-007
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 25
    -nwpen 1.0 1.000000e+000
    -pbeam 1e-48 1.000000e-048
    -pip 1.0 1.000000e+000
    -pl_beam 1e-10 1.000000e-010
    -pl_pbeam 1e-10 1.000000e-010
    -pl_pip 1.0 1.000000e+000
    -pl_weight 3.0 3.000000e+000
    -pl_window 5 5
    -rawlogdir
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+004
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-003
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-004
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 6.800000e+003
    -uw 1.0 1.000000e+000
    -vad_postspeech 50 50
    -vad_prespeech 20 20
    -vad_startspeech 10 10
    -vad_threshold 2.0 2.000000e+000
    -var
    -varfloor 0.0001 1.000000e-004
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-029
    -wip 0.65 6.500000e-001
    -wlen 0.025625 2.562500e-002

    INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: mdef.c(518): Reading model definition: .\model_parameters\demo.ci_cont\/mdef
    INFO: bin_mdef.c(181): Allocating 68 * 8 bytes (0 KiB) for CD tree
    INFO: tmat.c(206): Reading HMM transition probability matrices: .\model_parameters\demo.ci_cont\/transition_matrices
    INFO: acmod.c(117): Attempting to use PTM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: .\model_parameters\demo.ci_cont\/means
    INFO: ms_gauden.c(292): 48 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 1x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: .\model_parameters\demo.ci_cont\/variances
    INFO: ms_gauden.c(292): 48 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 1x39
    INFO: ms_gauden.c(354): 117 variance values floored
    INFO: ptm_mgau.c(805): Number of codebooks doesn't match number of ciphones, doesn't look like PTM: 48 != 16
    INFO: acmod.c(119): Attempting to use semi-continuous computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: .\model_parameters\demo.ci_cont\/means
    INFO: ms_gauden.c(292): 48 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 1x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: .\model_parameters\demo.ci_cont\/variances
    INFO: ms_gauden.c(292): 48 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 1x39
    INFO: ms_gauden.c(354): 117 variance values floored
    INFO: acmod.c(121): Falling back to general multi-stream GMM computation
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: .\model_parameters\demo.ci_cont\/means
    INFO: ms_gauden.c(292): 48 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 1x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: .\model_parameters\demo.ci_cont\/variances
    INFO: ms_gauden.c(292): 48 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 1x39
    INFO: ms_gauden.c(354): 117 variance values floored
    INFO: ms_senone.c(149): Reading senone mixture weights: .\model_parameters\demo.ci_cont\/mixture_weights
    INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
    INFO: ms_senone.c(207): Not transposing mixture weights in memory
    INFO: ms_senone.c(268): Read mixture weights for 48 senones: 1 features x 1 codewords
    INFO: ms_senone.c(320): Mapping senones to individual codebooks
    INFO: ms_mgau.c(141): The value of topn: 4
    WARN: "ms_mgau.c", line 145: -topn argument (4) invalid or > #density codewords (1); set to latter
    INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 4104 * 20 bytes (80 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: .\etc\demo.dic
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(336): 5 words read
    INFO: dict.c(358): Reading filler dictionary: .\model_parameters\demo.ci_cont\/noisedict
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(361): 3 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 16^3 * 2 bytes (8 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 3136 bytes (3 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 3136 bytes (3 KiB) for single-phone word triphones
    INFO: ngram_model_trie.c(347): Trying to read LM in trie binary format
    INFO: ngram_model_trie.c(358): Header doesn't match
    INFO: ngram_model_trie.c(176): Trying to read LM in arpa format
    INFO: ngram_model_trie.c(192): LM of order 3
    INFO: ngram_model_trie.c(194): #1-grams: 8
    INFO: ngram_model_trie.c(194): #2-grams: 10
    INFO: ngram_model_trie.c(194): #3-grams: 13
    INFO: lm_trie.c(473): Training quantizer
    INFO: lm_trie.c(481): Building LM trie
    INFO: ngram_search_fwdtree.c(99): 5 unique initial diphones
    INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 4 single-phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 4 single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 138
    INFO: ngram_search_fwdtree.c(339): after: 5 root, 10 non-root channels, 3 single-phone words
    INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Jan 24 2016, AT: 07:35:37

    INFO: cmn_prior.c(131): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 12.78 10.78 -7.19 -7.00 -1.36 1.78 5.53 5.37 1.63 2.67 -2.00 -3.08 -0.49 >
    INFO: ngram_search_fwdtree.c(1553): 255 words recognized (2/fr)
    INFO: ngram_search_fwdtree.c(1555): 852 senones evaluated (6/fr)
    INFO: ngram_search_fwdtree.c(1559): 435 channels searched (3/fr), 153 1st, 278 last
    INFO: ngram_search_fwdtree.c(1562): 278 words for which last channels evaluated (2/fr)
    INFO: ngram_search_fwdtree.c(1564): 0 candidate words for entering last phone (0/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 0.00 CPU 0.000 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 0.01 wall 0.011 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 2 words
    INFO: ngram_search_fwdflat.c(948): 360 words recognized (3/fr)
    INFO: ngram_search_fwdflat.c(950): 393 senones evaluated (3/fr)
    INFO: ngram_search_fwdflat.c(952): 381 channels searched (2/fr)
    INFO: ngram_search_fwdflat.c(954): 381 words searched (2/fr)
    INFO: ngram_search_fwdflat.c(957): 76 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.00 CPU 0.000 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.00 wall 0.000 xRT
    INFO: ngram_search.c(1253): lattice start node .0 end node .24
    INFO: ngram_search.c(1279): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1384): Lattice has 7 nodes, 4 links
    INFO: ps_lattice.c(1380): Bestpath score: -1235
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:24:130) = -96950
    INFO: ps_lattice.c(1441): Joint P(O,S) = -96950 P(S|O) = 0
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.01 wall 0.005 xRT

    INFO: cmn_prior.c(131): cmn_prior_update: from < 12.78 10.78 -7.19 -7.00 -1.36 1.78 5.53 5.37 1.63 2.67 -2.00 -3.08 -0.49 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 29.57 8.61 3.01 10.81 0.21 0.81 5.55 6.60 4.90 7.37 -3.54 3.63 -0.64 >
    INFO: ngram_search_fwdtree.c(1553): 154 words recognized (1/fr)
    INFO: ngram_search_fwdtree.c(1555): 423 senones evaluated (3/fr)
    INFO: ngram_search_fwdtree.c(1559): 328 channels searched (2/fr), 0 1st, 328 last
    INFO: ngram_search_fwdtree.c(1562): 328 words for which last channels evaluated (2/fr)
    INFO: ngram_search_fwdtree.c(1564): 0 candidate words for entering last phone (0/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 0.00 CPU 0.000 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 0.01 wall 0.006 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 2 words
    INFO: ngram_search_fwdflat.c(948): 157 words recognized (1/fr)
    INFO: ngram_search_fwdflat.c(950): 423 senones evaluated (3/fr)
    INFO: ngram_search_fwdflat.c(952): 373 channels searched (2/fr)
    INFO: ngram_search_fwdflat.c(954): 373 words searched (2/fr)
    INFO: ngram_search_fwdflat.c(957): 76 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.00 CPU 0.000 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.00 wall 0.001 xRT
    INFO: ngram_search.c(1253): lattice start node .0 end node .60
    INFO: ngram_search.c(1279): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1384): Lattice has 6 nodes, 5 links
    INFO: ps_lattice.c(1380): Bestpath score: -649
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:60:140) = -58852
    INFO: ps_lattice.c(1441): Joint P(O,S) = -61367 P(S|O) = -2515
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.01 wall 0.004 xRT

    INFO: cmn_prior.c(131): cmn_prior_update: from < 29.57 8.61 3.01 10.81 0.21 0.81 5.55 6.60 4.90 7.37 -3.54 3.63 -0.64 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 29.57 8.61 3.01 10.81 0.21 0.81 5.55 6.60 4.90 7.37 -3.54 3.63 -0.64 >
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 0 words
    INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 0.02 CPU 0.006 xRT
    INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 0.04 wall 0.014 xRT
    INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.00 CPU 0.000 xRT
    INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.00 wall 0.000 xRT
    INFO: ngram_search.c(303): TOTAL bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(306): TOTAL bestpath 0.01 wall 0.004 xRT

     
  • Paul

    Paul - 2022-06-25

    Hi Fang,
    I'm not an expert and haven't used this for a while, but two possible causes sprang to mind on seeing your question:
    1) I'd try setting backtrace to yes [-backtrace yes]. I think (I could be wrong) that with it set to "no", as currently shown in the configuration settings, it won't output anything on the command line. You could then at least see whether you're getting any output at the command line.
    2) I remember having a lot of issues with case differences between my dictionary and the LM file: if they're not the same case, you won't get any output.
    Hopefully one of those is useful.
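
    A minimal Python sketch of the case check in point 2, assuming the .dic file has one word per line followed by its phones and the LM is a text ARPA file (the log reports reading demo.lm "in arpa format"). The function names are illustrative, not part of pocketsphinx or sphinxtrain. It reports LM words that only match a dictionary entry when case is ignored, plus words missing from either file. If the vocabulary is written in Chinese characters, case folding changes nothing and only the two "missing" lists are informative.

```python
import re

def dict_words(dic_text):
    """Words in a CMU-style .dic file: the first token on each non-empty line."""
    words = set()
    for line in dic_text.splitlines():
        parts = line.split()
        if parts:
            # Alternate pronunciations are written as word(2), word(3), ...
            words.add(re.sub(r"\(\d+\)$", "", parts[0]))
    return words

def arpa_unigrams(lm_text):
    """Words listed in the \\1-grams: section of a text ARPA language model."""
    words = set()
    in_unigrams = False
    for line in lm_text.splitlines():
        line = line.strip()
        if line.startswith("\\"):
            # Section headers such as \data\, \1-grams:, \2-grams:, \end\
            in_unigrams = line == "\\1-grams:"
            continue
        parts = line.split()
        # Unigram rows: log10 probability, word, optional backoff weight
        if in_unigrams and len(parts) >= 2:
            words.add(parts[1])
    return words

def compare_vocab(dic_text, lm_text):
    """Report case-only mismatches and missing words between .dic and LM."""
    dic = dict_words(dic_text)
    # <s> and </s> come from the filler dictionary (noisedict), not the .dic.
    lm = arpa_unigrams(lm_text) - {"<s>", "</s>"}
    dic_by_lower = {}
    for word in dic:
        dic_by_lower.setdefault(word.lower(), []).append(word)
    lm_lower = {word.lower() for word in lm}
    case_mismatch, missing_from_dict = [], []
    for word in sorted(lm):
        if word in dic:
            continue
        candidates = dic_by_lower.get(word.lower())
        if candidates:
            case_mismatch.append((word, sorted(candidates)))
        else:
            missing_from_dict.append(word)
    missing_from_lm = sorted(w for w in dic if w.lower() not in lm_lower)
    return {
        "case_mismatch": case_mismatch,
        "missing_from_dict": missing_from_dict,
        "missing_from_lm": missing_from_lm,
    }

# Example, with the file names from the question (UTF-8 is an assumption):
# with open(r".\etc\demo.dic", encoding="utf-8") as d, \
#      open(r".\etc\demo.lm", encoding="utf-8") as m:
#     print(compare_vocab(d.read(), m.read()))
```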

     
    • Fang LIU

      Fang LIU - 2022-06-25

      Thank you, Paul. Unfortunately, neither of them solves it.
      1) [-backtrace yes]: I ran pocketsphinx_continuous -infile .\wav\goforward.raw -hmm .\model_parameters\en-us\ -lm .\etc\en-us.lm.bin -dict .\etc\cmudict-en-us.dict and got the output "go forward ten meters". Backtrace is set to no in that configuration as well, so it is not the cause. But it reminds me to compare the two configurations; I'll do that later.
      2) I also opened the LM file and the dic file, and they look normal. Besides, the same two files are used for training and testing with sphinxtrain, and since training succeeded, an inconsistency like that seems unlikely.
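
      For the configuration comparison mentioned in point 1, a minimal Python sketch that parses the "Current configuration:" table from two saved logs and lists the options whose fields differ. It assumes each row is an option name followed by its default and current value, with empty fields simply not printed (as with -allphone in the dump above), so the remaining fields are compared as a tuple instead of guessing which column a lone token such as the 0 in "-debug 0" belongs to. The logs could be captured with the -logfn option listed in the dump or, assuming the log goes to stderr, by redirecting it; the function names and log file names are hypothetical.

```python
def parse_config(log_text):
    """Parse the 'Current configuration:' table of a pocketsphinx log.

    Returns {option: tuple_of_fields}. Empty [DEFLT]/[VALUE] fields are not
    printed, so a row like '-debug 0' is ambiguous; keeping the raw tuple
    avoids guessing which column the lone token belongs to.
    """
    config = {}
    in_table = False
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("[NAME]"):
            in_table = True
            continue
        if not in_table:
            continue
        if not line.startswith("-"):
            # A blank or INFO line after the rows ends the table.
            if config:
                break
            continue
        name, *fields = line.split()
        config[name] = tuple(fields)
    return config

def diff_configs(a, b):
    """Options whose fields differ between two parsed configurations."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }

# Example with hypothetical log file names (encoding is an assumption):
# with open("demo.log", encoding="utf-8", errors="replace") as f1, \
#      open("en-us.log", encoding="utf-8", errors="replace") as f2:
#     diffs = diff_configs(parse_config(f1.read()), parse_config(f2.read()))
#     for name, (mine, theirs) in diffs.items():
#         print(name, mine, theirs)
```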

       
  • Fang LIU

    Fang LIU - 2022-06-26

    I realized that the problem is the audio being too short. The test step during training uses pocketsphinx_batch, which handles short audio better than pocketsphinx_continuous.
    See related post here.
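
    Since the cause turned out to be audio length, clips can be screened by reading each file's header with Python's standard wave module. A minimal sketch: it reports the duration and flags clips that are shorter than a threshold or that do not match the -samprate 16000 in the configuration dump. The 16-bit mono expectation is an assumption, and the thread does not say how short is "too short", so min_seconds has no default and must be chosen by the caller. At the -frate 100 shown in the dump, one second of audio corresponds to about 100 feature frames.

```python
import wave

def wav_info(path):
    """Return (duration_seconds, sample_rate, channels, sample_width_bytes)."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        return w.getnframes() / rate, rate, w.getnchannels(), w.getsampwidth()

def check_wav(path, min_seconds, expected_rate=16000):
    """List problems that could affect decoding of a clip.

    min_seconds is a caller-chosen (hypothetical) threshold; expected_rate
    mirrors -samprate 16000 from the configuration dump.
    """
    duration, rate, channels, width = wav_info(path)
    problems = []
    if duration < min_seconds:
        problems.append(f"too short: {duration:.2f} s < {min_seconds:.2f} s")
    if rate != expected_rate:
        problems.append(f"sample rate {rate} Hz != -samprate {expected_rate}")
    if channels != 1:
        problems.append(f"{channels} channels (mono assumed)")
    if width != 2:
        problems.append(f"{8 * width}-bit samples (16-bit assumed)")
    return problems

# Example (the threshold is a placeholder, not a value from the thread):
# print(check_wav(r".\wav\speaker_1\1_03.wav", min_seconds=1.0))
```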

     
