setting kws_threshold value

Help
bhargav
2016-09-16
2016-10-04
  • bhargav

    bhargav - 2016-09-16

    Hello,

    I am trying to do keyword spotting on 16 kHz data. I have a list of keywords and am trying to find the best kws_threshold value for each keyword.
    I have an evaluation set of 1000 files. Currently I run the following command on all 1000 files.

    pocketsphinx_continuous -hmm Model -keyphrase <my_keyword> -infile eval_file_1.wav -dict my_dict.dic -kws_threshold <some_value>

    I repeat this process for threshold values from 1e-100 to 1. I am assuming that the threshold giving the smallest absolute difference between the reference keyword count and the count spotted by the above command is my best threshold value.
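
The selection rule described above can be sketched in Python. This is only an illustration under stated assumptions: the function names, the factor-of-1e10 threshold grid, and the tie-breaking rule (prefer the larger threshold) are hypothetical and not from the post. The command-line flags are the ones used in the command above. How detections are counted from the decoder output is left to the caller.

```python
def kws_command(hmm, dict_path, keyphrase, wav, threshold):
    """Build the pocketsphinx_continuous call from the post as an argument list."""
    return ["pocketsphinx_continuous", "-hmm", hmm, "-keyphrase", keyphrase,
            "-infile", wav, "-dict", dict_path,
            "-kws_threshold", f"{threshold:g}"]


def candidate_thresholds(step_exp=10):
    """Thresholds from 1e-100 up to 1; the step size is an assumption."""
    return [10.0 ** -e for e in range(100, -1, -step_exp)]


def best_threshold(reference_count, spotted_counts):
    """Pick the threshold whose spotted count is closest to the reference count.

    spotted_counts maps threshold -> number of detections over the whole
    evaluation set. Ties go to the larger threshold (an assumption).
    """
    return min(spotted_counts,
               key=lambda t: (abs(spotted_counts[t] - reference_count), -t))


if __name__ == "__main__":
    # Made-up counts for illustration only, not measured results.
    counts = {1e-30: 9, 1e-20: 6, 1e-10: 6, 1.0: 0}
    print(best_threshold(6, counts))
    print(" ".join(kws_command("Model", "my_dict.dic", "delhi to goa",
                               "eval_file_1.wav", 1e-30)))
```

The counts passed to `best_threshold` would come from running the command built by `kws_command` once per threshold and per file, then summing the detections for each threshold.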

    My questions are:

    1. Is this the right way to find the best threshold, i.e., optimizing for each keyword individually? Is there any other way?
    2. Kindly provide pointers on how the keyword spotting decoder of pocketsphinx works, i.e., which methods are used, how the scoring works, etc.
    3. I trained the normal speech recognizer as per the tutorial using transcriptions and MFCCs. I would like to know whether a separate training procedure exists for keyword spotters.

    Thanks
    Bhargav

     

    Last edit: bhargav 2016-09-16
    • Nickolay V. Shmyrev

      Is this the right way to find the best threshold, i.e., optimizing for each keyword individually?

      Yes

      Is there any other way?

      No

      Kindly provide pointers on how the keyword spotting decoder of pocketsphinx works, i.e., which methods are used, how the scoring works, etc.

      http://eprints.qut.edu.au/37254/1/Albert_Thambiratnam_Thesis.pdf

      I trained the normal speech recognizer as per the tutorial using transcriptions and MFCCs. I would like to know whether a separate training procedure exists for keyword spotters.

      No, you have to follow the standard training procedure.

       
  • bhargav

    bhargav - 2016-09-21

    Any suggestions on my questions?

     
  • bhargav

    bhargav - 2016-09-27

    I have four wave files in which people spoke "I am going from delhi to goa".
    When I decode these files normally using a trigram LM, the recognition output exactly matches the reference.
    But when I do keyword spotting for the phrase "delhi to goa" on these files, I almost never get the keyword at any threshold value, i.e., the true positive count is just 1 out of 6.

    Can this happen?
    Am I missing something?
    Please shed some light on this issue.

    Thanks in advance
    Bhargav

     
    • Nickolay V. Shmyrev

      You need to share the data to reproduce your problems and provide pocketsphinx logs to get help on this issue.

       
  • bhargav

    bhargav - 2016-09-28

    pocketsphinx_continuous -hmm /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600 -lm ../etc/100_800_mob1.lm -infile test_s2s/wav/1.wav -dict /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic
    INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/feat.params
    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -allphone
    -allphone_ci no no
    -alpha 0.97 9.700000e-01
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -debug 0
    -dict /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600
    -input_endian little little
    -jsgf
    -keyphrase
    -kws
    -kws_delay 10 10
    -kws_plp 1e-1 1.000000e-01
    -kws_threshold 1 1.000000e+00
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lifter 0 0
    -lm ../etc/100_800_mob1.lm
    -lmctl
    -lmname
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333334e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf 30000 30000
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-10 1.000000e-10
    -pl_pip 1.0 1.000000e+00
    -pl_weight 3.0 3.000000e+00
    -pl_window 5 5
    -rawlogdir
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -uw 1.0 1.000000e+00
    -vad_postspeech 50 50
    -vad_prespeech 20 20
    -vad_startspeech 10 10
    -vad_threshold 2.0 2.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: mdef.c(518): Reading model definition: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/mdef
    INFO: bin_mdef.c(181): Allocating 34795 * 8 bytes (271 KiB) for CD tree
    INFO: tmat.c(206): Reading HMM transition probability matrices: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/transition_matrices
    INFO: acmod.c(117): Attempting to use PTM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(354): 1292 variance values floored
    INFO: ptm_mgau.c(801): Number of codebooks exceeds 256: 2759
    INFO: acmod.c(119): Attempting to use semi-continuous computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(354): 1292 variance values floored
    INFO: acmod.c(121): Falling back to general multi-stream GMM computation
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(354): 1292 variance values floored
    INFO: ms_senone.c(149): Reading senone mixture weights: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/mixture_weights
    INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
    INFO: ms_senone.c(207): Not transposing mixture weights in memory
    INFO: ms_senone.c(268): Read mixture weights for 2759 senones: 1 features x 24 codewords
    INFO: ms_senone.c(320): Mapping senones to individual codebooks
    INFO: ms_mgau.c(141): The value of topn: 4
    INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 5993 * 20 bytes (117 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic
    INFO: dict.c(213): Allocated 13 KiB for strings, 21 KiB for phones
    INFO: dict.c(336): 1893 words read
    INFO: dict.c(358): Reading filler dictionary: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/noisedict
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(361): 3 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 53^3 * 2 bytes (290 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 33920 bytes (33 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 33920 bytes (33 KiB) for single-phone word triphones
    INFO: ngram_model_trie.c(347): Trying to read LM in trie binary format
    INFO: ngram_model_trie.c(358): Header doesn't match
    INFO: ngram_model_trie.c(176): Trying to read LM in arpa format
    INFO: ngram_model_trie.c(192): LM of order 3
    INFO: ngram_model_trie.c(194): #1-grams: 1265
    INFO: ngram_model_trie.c(194): #2-grams: 5240
    INFO: ngram_model_trie.c(194): #3-grams: 6998
    INFO: lm_trie.c(473): Training quantizer
    INFO: lm_trie.c(481): Building LM trie
    INFO: ngram_search_fwdtree.c(99): 327 unique initial diphones
    INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 9 single-phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 9 single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 4827
    INFO: ngram_search_fwdtree.c(339): after: 324 root, 4699 non-root channels, 8 single-phone words
    INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Sep 14 2016, AT: 10:13:20

    INFO: cmn_prior.c(131): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 12.60 0.43 -0.22 0.21 -0.24 -0.20 -0.20 -0.07 -0.20 -0.10 -0.08 -0.18 -0.07 >
    INFO: ngram_search_fwdtree.c(1553): 4081 words recognized (7/fr)
    INFO: ngram_search_fwdtree.c(1555): 491857 senones evaluated (847/fr)
    INFO: ngram_search_fwdtree.c(1559): 509158 channels searched (876/fr), 107245 1st, 97564 last
    INFO: ngram_search_fwdtree.c(1562): 8183 words for which last channels evaluated (14/fr)
    INFO: ngram_search_fwdtree.c(1564): 23440 candidate words for entering last phone (40/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 0.92 CPU 0.159 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 0.92 wall 0.159 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 187 words
    INFO: ngram_search_fwdflat.c(948): 1565 words recognized (3/fr)
    INFO: ngram_search_fwdflat.c(950): 157264 senones evaluated (271/fr)
    INFO: ngram_search_fwdflat.c(952): 161468 channels searched (277/fr)
    INFO: ngram_search_fwdflat.c(954): 16353 words searched (28/fr)
    INFO: ngram_search_fwdflat.c(957): 14634 word transitions (25/fr)
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.23 CPU 0.039 xRT
    INFO: ngram_search_fwdflat.c(963): fwdflat 0.23 wall 0.039 xRT
    INFO: ngram_search.c(1253): lattice start node .0 end node .559
    INFO: ngram_search.c(1279): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1384): Lattice has 261 nodes, 115 links
    INFO: ps_lattice.c(1380): Bestpath score: -25739
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:559:579) = -1752903
    INFO: ps_lattice.c(1441): Joint P(O,S) = -1783590 P(S|O) = -30687
    INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(878): bestpath 0.00 wall 0.000 xRT
    i need information after availability nights from delhi to goa on december 22nd evening
    INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 0.92 CPU 0.159 xRT
    INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 0.92 wall 0.159 xRT
    INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.23 CPU 0.039 xRT
    INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.23 wall 0.039 xRT
    INFO: ngram_search.c(303): TOTAL bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(306): TOTAL bestpath 0.00 wall 0.000 xRT
    aai@aai-shrut1:~/workspace/100_800_mob1/key_word_spot$ pocketsphinx_continuous -hmm /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600 -keyphrase "delhi to goa" -infile test_s2s/wav/1.wav -dict /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic -kws_threshold 1e-30
    INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/feat.params
    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -allphone
    -allphone_ci no no
    -alpha 0.97 9.700000e-01
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -debug 0
    -dict /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600
    -input_endian little little
    -jsgf
    -keyphrase delhi to goa
    -kws
    -kws_delay 10 10
    -kws_plp 1e-1 1.000000e-01
    -kws_threshold 1 1.000000e-30
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lifter 0 0
    -lm
    -lmctl
    -lmname
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333334e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf 30000 30000
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-10 1.000000e-10
    -pl_pip 1.0 1.000000e+00
    -pl_weight 3.0 3.000000e+00
    -pl_window 5 5
    -rawlogdir
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -uw 1.0 1.000000e+00
    -vad_postspeech 50 50
    -vad_prespeech 20 20
    -vad_startspeech 10 10
    -vad_threshold 2.0 2.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: mdef.c(518): Reading model definition: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/mdef
    INFO: bin_mdef.c(181): Allocating 34795 * 8 bytes (271 KiB) for CD tree
    INFO: tmat.c(206): Reading HMM transition probability matrices: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/transition_matrices
    INFO: acmod.c(117): Attempting to use PTM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(354): 1292 variance values floored
    INFO: ptm_mgau.c(801): Number of codebooks exceeds 256: 2759
    INFO: acmod.c(119): Attempting to use semi-continuous computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(354): 1292 variance values floored
    INFO: acmod.c(121): Falling back to general multi-stream GMM computation
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(354): 1292 variance values floored
    INFO: ms_senone.c(149): Reading senone mixture weights: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/mixture_weights
    INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
    INFO: ms_senone.c(207): Not transposing mixture weights in memory
    INFO: ms_senone.c(268): Read mixture weights for 2759 senones: 1 features x 24 codewords
    INFO: ms_senone.c(320): Mapping senones to individual codebooks
    INFO: ms_mgau.c(141): The value of topn: 4
    INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 5993 * 20 bytes (117 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic
    INFO: dict.c(213): Allocated 13 KiB for strings, 21 KiB for phones
    INFO: dict.c(336): 1893 words read
    INFO: dict.c(358): Reading filler dictionary: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/noisedict
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(361): 3 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 53^3 * 2 bytes (290 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 33920 bytes (33 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 33920 bytes (33 KiB) for single-phone word triphones
    INFO: kws_search.c(420): KWS(beam: -1080, plp: -23, default threshold -675, delay 10)
    INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Sep 14 2016, AT: 10:13:20

    INFO: cmn_prior.c(131): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 12.60 0.43 -0.22 0.21 -0.24 -0.20 -0.20 -0.07 -0.20 -0.10 -0.08 -0.18 -0.07 >
    INFO: kws_search.c(658): kws 0.14 CPU 0.025 xRT
    INFO: kws_search.c(660): kws 0.14 wall 0.025 xRT
    INFO: kws_search.c(467): TOTAL kws 0.14 CPU 0.025 xRT
    INFO: kws_search.c(470): TOTAL kws 0.14 wall 0.025 xRT
    aai@aai-shrut1:~/workspace/100_800_mob1/key_word_spot$ pocketsphinx_continuous -hmm /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600 -keyphrase "delhi to goa" -infile test_s2s/wav/1.wav -dict /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic -kws_threshold 1e-20
    INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/feat.params
    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -allphone
    -allphone_ci no no
    -alpha 0.97 9.700000e-01
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -debug 0
    -dict /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600
    -input_endian little little
    -jsgf
    -keyphrase delhi to goa
    -kws
    -kws_delay 10 10
    -kws_plp 1e-1 1.000000e-01
    -kws_threshold 1 1.000000e-20
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lifter 0 0
    -lm
    -lmctl
    -lmname
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333334e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf 30000 30000
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-10 1.000000e-10
    -pl_pip 1.0 1.000000e+00
    -pl_weight 3.0 3.000000e+00
    -pl_window 5 5
    -rawlogdir
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -uw 1.0 1.000000e+00
    -vad_postspeech 50 50
    -vad_prespeech 20 20
    -vad_startspeech 10 10
    -vad_threshold 2.0 2.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: mdef.c(518): Reading model definition: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/mdef
    INFO: bin_mdef.c(181): Allocating 34795 * 8 bytes (271 KiB) for CD tree
    INFO: tmat.c(206): Reading HMM transition probability matrices: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/transition_matrices
    INFO: acmod.c(117): Attempting to use PTM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(354): 1292 variance values floored
    INFO: ptm_mgau.c(801): Number of codebooks exceeds 256: 2759
    INFO: acmod.c(119): Attempting to use semi-continuous computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(354): 1292 variance values floored
    INFO: acmod.c(121): Falling back to general multi-stream GMM computation
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
    INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 24x39
    INFO: ms_gauden.c(354): 1292 variance values floored
    INFO: ms_senone.c(149): Reading senone mixture weights: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/mixture_weights
    INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
    INFO: ms_senone.c(207): Not transposing mixture weights in memory
    INFO: ms_senone.c(268): Read mixture weights for 2759 senones: 1 features x 24 codewords
    INFO: ms_senone.c(320): Mapping senones to individual codebooks
    INFO: ms_mgau.c(141): The value of topn: 4
    INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 5993 * 20 bytes (117 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic
    INFO: dict.c(213): Allocated 13 KiB for strings, 21 KiB for phones
    INFO: dict.c(336): 1893 words read
    INFO: dict.c(358): Reading filler dictionary: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/noisedict
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(361): 3 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 53^3 * 2 bytes (290 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 33920 bytes (33 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 33920 bytes (33 KiB) for single-phone word triphones
    INFO: kws_search.c(420): KWS(beam: -1080, plp: -23, default threshold -450, delay 10)
    INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Sep 14 2016, AT: 10:13:20

    INFO: cmn_prior.c(131): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 12.60 0.43 -0.22 0.21 -0.24 -0.20 -0.20 -0.07 -0.20 -0.10 -0.08 -0.18 -0.07 >
    INFO: kws_search.c(658): kws 0.14 CPU 0.024 xRT
    INFO: kws_search.c(660): kws 0.14 wall 0.024 xRT
    INFO: kws_search.c(467): TOTAL kws 0.14 CPU 0.024 xRT
    INFO: kws_search.c(470): TOTAL kws 0.14 wall 0.024 xRT
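
The `KWS(...)` lines in the two keyword spotting logs above report the thresholds in an integer log domain: `-kws_threshold 1e-30` appears as -675 and `1e-20` as -450. Below is a minimal sketch of arithmetic that reproduces these numbers. It assumes the value is the logarithm in base `-logbase` (1.0001 in the configuration dump), truncated to an integer and then shifted right by 10 bits. The 10-bit shift is an assumption inferred from the logged values, not something stated in this thread, and the function name is hypothetical.

```python
import math


def to_log_domain(prob, logbase=1.0001, shift=10):
    """Convert a probability to a shifted integer log value.

    Assumption: log base `logbase`, truncated toward zero, then an
    arithmetic right shift by `shift` bits (which rounds toward -inf).
    """
    return int(math.log(prob) / math.log(logbase)) >> shift


if __name__ == "__main__":
    # Values taken from the configuration dumps and KWS(...) log lines.
    for name, prob in [("kws_threshold 1e-30", 1e-30),
                       ("kws_threshold 1e-20", 1e-20),
                       ("kws_plp 1e-1", 1e-1),
                       ("beam 1e-48", 1e-48)]:
        print(name, to_log_domain(prob))
```

Under these assumptions the four values come out as -675, -450, -23 and -1080, matching the `default threshold`, `plp` and `beam` fields in the log lines.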

     
    • Nickolay V. Shmyrev

      I checked your files. I believe the acoustic model was not trained properly. The model seems quite overtrained, most likely because the database was too small. What was the size of the database, how was it collected, and how many speakers does it contain? You need to follow the recommendations from the tutorial for the amount of training data.

      Looking at your audio, I see that frequencies above 5 kHz are not really reliable. First of all, I recommend changing the upper frequency to 5000 instead of 6800 and retraining the model; that should make your detection much more reliable.

       
