I am trying to do keyword spotting on 16 kHz data. I have a list of keywords, and I am trying to find the best kws_threshold value for each keyword.
I have an evaluation set of 1000 files. Currently I run pocketsphinx_continuous in keyphrase mode (-keyphrase, -kws_threshold) on all 1000 files.
I repeat this for thresholds from 1e-100 to 1. I am assuming that the threshold giving the smallest absolute difference between the reference keyword count and the spotted keyword count is the best threshold value.
My questions are:
1. Is this the right way to find the best threshold, i.e., optimizing for each keyword individually? Is there any other way?
2. Could you provide pointers on how the keyword spotting decoder of pocketsphinx works, i.e., which methods are used and how the scoring works?
3. I trained a normal speech recognizer following the tutorial, using transcriptions and MFCCs. Is there a separate training procedure for keyword spotters?
Thanks
Bhargav
Last edit: bhargav 2016-09-16
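The selection rule described above (pick the threshold with the smallest absolute difference between the reference count and the spotted count) can be sketched as follows. `best_threshold` and `count_detections` are hypothetical names: `count_detections` stands in for decoding the whole evaluation set at one threshold and returning the number of spots. The counts in the example are toy numbers, not results.

```python
from typing import Callable, Iterable, Optional, Tuple


def best_threshold(thresholds: Iterable[float],
                   count_detections: Callable[[float], int],
                   reference_count: int) -> Tuple[float, int]:
    """Return (threshold, spotted_count) with the smallest |reference - spotted|.

    count_detections is a hypothetical callback that decodes the evaluation
    set at the given threshold and returns how many times the keyword was
    spotted. On ties the first threshold seen is kept.
    """
    best: Optional[Tuple[int, float, int]] = None
    for thr in thresholds:
        spotted = count_detections(thr)
        err = abs(reference_count - spotted)
        if best is None or err < best[0]:
            best = (err, thr, spotted)
    if best is None:
        raise ValueError("no thresholds given")
    return best[1], best[2]


# Toy counts only, to show the call; not real measurements.
fake_counts = {1e-40: 30, 1e-30: 12, 1e-20: 9, 1e-10: 2}
print(best_threshold(fake_counts, fake_counts.__getitem__, reference_count=10))
```

Matching only the total count can hide false alarms that cancel out misses, so if the reference has per-file labels, counting hits and false alarms separately gives a clearer picture.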
Hello,
The command I run on each of the 1000 evaluation files is:
pocketsphinx_continuous -hmm Model -keyphrase <my_keyword> -infile eval_file_1.wav -dict my_dict.dic -kws_threshold <some_value>
Thanks
Bhargav
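A sketch of the sweep itself: build the command from the post for a grid of thresholds in powers of ten and count detections in the output. The grid spacing (every 10 decades from 1e-100 to 1), the helper names, and the assumption that detections are printed on stdout while the INFO log goes to stderr are mine, not from the thread.

```python
import subprocess
from typing import List


def threshold_grid(start_exp: int = -100, stop_exp: int = 0, step: int = 10) -> List[float]:
    """Powers of ten from 10**start_exp up to 10**stop_exp (inclusive)."""
    return [10.0 ** e for e in range(start_exp, stop_exp + 1, step)]


def kws_command(model: str, keyphrase: str, wav: str, dic: str,
                threshold: float) -> List[str]:
    """Argument list for one keyphrase-spotting run, using the flags from the post."""
    return ["pocketsphinx_continuous",
            "-hmm", model,
            "-keyphrase", keyphrase,
            "-infile", wav,
            "-dict", dic,
            "-kws_threshold", f"{threshold:g}"]


def count_detections(cmd: List[str], keyphrase: str) -> int:
    """Run one decode and count keyphrase occurrences in its stdout.

    Assumption: detections go to stdout and logging to stderr. Plain
    substring counting can over-count short keywords that also occur
    inside other words.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.count(keyphrase)
```

For one threshold, the per-file counts are summed over the evaluation set, e.g. `sum(count_detections(kws_command("Model", kw, wav, "my_dict.dic", t), kw) for wav in wav_files)`, where `wav_files` is your own list of paths.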
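Yes, tune the threshold for each keyword individually.
No, there is no other way.
For background on keyword spotting methods and scoring, see: http://eprints.qut.edu.au/37254/1/Albert_Thambiratnam_Thesis.pdf
No, you have to follow the standard training.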
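Once each keyword has its own tuned threshold, the values can be used together in a single run: the `-kws` option (listed in the config dumps below) takes a keyphrase list file. As I understand the CMUSphinx tutorial format, each line is a keyphrase followed by its threshold between slashes, e.g. `delhi to goa /1e-20/`. A small sketch that produces such a file's contents; the keywords and thresholds in `tuned` are hypothetical placeholders, not tuned results.

```python
from typing import Dict


def kws_list_text(thresholds: Dict[str, float]) -> str:
    """One keyphrase per line, followed by its threshold between slashes."""
    return "".join(f"{phrase} /{thr:g}/\n" for phrase, thr in thresholds.items())


# Hypothetical values for illustration only.
tuned = {"delhi to goa": 1e-20, "december": 1e-10}
print(kws_list_text(tuned), end="")
```

Writing that text to a file (say, a hypothetical `keywords.list`) and passing `-kws keywords.list` instead of `-keyphrase ... -kws_threshold ...` spots all keywords in one decode.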
Any suggestions on my questions?
I have four wave files in which people say "I am going from delhi to goa".
When I decode these files normally with a trigram LM, the recognition output matches the reference exactly.
But when I run keyword spotting for the phrase "delhi to goa" on these files, I almost never get the keyword at any threshold value: only 1 true positive out of 6.
Can this happen?
Am I missing something?
Please shed some light on this issue.
Thanks in advance
Bhargav
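The LM decode already suggests these words are in the dictionary, but when a keyphrase is never spotted, a cheap first check is that every word of it has an entry in the file passed to `-dict`. This sketch assumes the CMU dictionary format (one `word PHONE PHONE ...` entry per line, alternate pronunciations written `word(2)`); the helper names are mine.

```python
import re
from typing import Iterable, List, Set


def dict_words(lines: Iterable[str]) -> Set[str]:
    """Collect headwords from CMU-style dictionary lines."""
    words: Set[str] = set()
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        # Alternate pronunciations are written word(2), word(3), ...
        words.add(re.sub(r"\(\d+\)$", "", parts[0]))
    return words


def missing_words(keyphrase: str, words: Set[str]) -> List[str]:
    """Words of the keyphrase that have no dictionary entry."""
    return [w for w in keyphrase.split() if w not in words]


# Usage with the dictionary path from the logs:
# with open("/home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic") as f:
#     print(missing_words("delhi to goa", dict_words(f)))
```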
You need to share the data to reproduce your problems and provide pocketsphinx logs to get help on this issue.
**pocketsphinx_continuous -hmm /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600 -lm ../etc/100_800_mob1.lm -infile test_s2s/wav/1.wav -dict /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic**
INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/feat.params
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600
-input_endian little little
-jsgf
-keyphrase
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1 1.000000e+00
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 0
-lm ../etc/100_800_mob1.lm
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333334e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-10 1.000000e-10
-pl_pip 1.0 1.000000e+00
-pl_weight 3.0 3.000000e+00
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-uw 1.0 1.000000e+00
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 2.0 2.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: mdef.c(518): Reading model definition: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/mdef
INFO: bin_mdef.c(181): Allocating 34795 * 8 bytes (271 KiB) for CD tree
INFO: tmat.c(206): Reading HMM transition probability matrices: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/transition_matrices
INFO: acmod.c(117): Attempting to use PTM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(354): 1292 variance values floored
INFO: ptm_mgau.c(801): Number of codebooks exceeds 256: 2759
INFO: acmod.c(119): Attempting to use semi-continuous computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(354): 1292 variance values floored
INFO: acmod.c(121): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(354): 1292 variance values floored
INFO: ms_senone.c(149): Reading senone mixture weights: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/mixture_weights
INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(207): Not transposing mixture weights in memory
INFO: ms_senone.c(268): Read mixture weights for 2759 senones: 1 features x 24 codewords
INFO: ms_senone.c(320): Mapping senones to individual codebooks
INFO: ms_mgau.c(141): The value of topn: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 5993 * 20 bytes (117 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic
INFO: dict.c(213): Allocated 13 KiB for strings, 21 KiB for phones
INFO: dict.c(336): 1893 words read
INFO: dict.c(358): Reading filler dictionary: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 53^3 * 2 bytes (290 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 33920 bytes (33 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 33920 bytes (33 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(347): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(358): Header doesn't match
INFO: ngram_model_trie.c(176): Trying to read LM in arpa format
INFO: ngram_model_trie.c(192): LM of order 3
INFO: ngram_model_trie.c(194): #1-grams: 1265
INFO: ngram_model_trie.c(194): #2-grams: 5240
INFO: ngram_model_trie.c(194): #3-grams: 6998
INFO: lm_trie.c(473): Training quantizer
INFO: lm_trie.c(481): Building LM trie
INFO: ngram_search_fwdtree.c(99): 327 unique initial diphones
INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 9 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 9 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 4827
INFO: ngram_search_fwdtree.c(339): after: 324 root, 4699 non-root channels, 8 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Sep 14 2016, AT: 10:13:20
INFO: cmn_prior.c(131): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 12.60 0.43 -0.22 0.21 -0.24 -0.20 -0.20 -0.07 -0.20 -0.10 -0.08 -0.18 -0.07 >
INFO: ngram_search_fwdtree.c(1553): 4081 words recognized (7/fr)
INFO: ngram_search_fwdtree.c(1555): 491857 senones evaluated (847/fr)
INFO: ngram_search_fwdtree.c(1559): 509158 channels searched (876/fr), 107245 1st, 97564 last
INFO: ngram_search_fwdtree.c(1562): 8183 words for which last channels evaluated (14/fr)
INFO: ngram_search_fwdtree.c(1564): 23440 candidate words for entering last phone (40/fr)
INFO: ngram_search_fwdtree.c(1567): fwdtree 0.92 CPU 0.159 xRT
INFO: ngram_search_fwdtree.c(1570): fwdtree 0.92 wall 0.159 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 187 words
INFO: ngram_search_fwdflat.c(948): 1565 words recognized (3/fr)
INFO: ngram_search_fwdflat.c(950): 157264 senones evaluated (271/fr)
INFO: ngram_search_fwdflat.c(952): 161468 channels searched (277/fr)
INFO: ngram_search_fwdflat.c(954): 16353 words searched (28/fr)
INFO: ngram_search_fwdflat.c(957): 14634 word transitions (25/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.23 CPU 0.039 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.23 wall 0.039 xRT
INFO: ngram_search.c(1253): lattice start node <s>.0 end node </s>.559
INFO: ngram_search.c(1279): Eliminated 0 nodes before end node
INFO: ngram_search.c(1384): Lattice has 261 nodes, 115 links
INFO: ps_lattice.c(1380): Bestpath score: -25739
INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(</s>:559:579) = -1752903
INFO: ps_lattice.c(1441): Joint P(O,S) = -1783590 P(S|O) = -30687
INFO: ngram_search.c(875): bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(878): bestpath 0.00 wall 0.000 xRT
i need information after availability nights from delhi to goa on december 22nd evening
INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 0.92 CPU 0.159 xRT
INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 0.92 wall 0.159 xRT
INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.23 CPU 0.039 xRT
INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.23 wall 0.039 xRT
INFO: ngram_search.c(303): TOTAL bestpath 0.00 CPU 0.000 xRT
INFO: ngram_search.c(306): TOTAL bestpath 0.00 wall 0.000 xRT
aai@aai-shrut1:~/workspace/100_800_mob1/key_word_spot$ pocketsphinx_continuous -hmm /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600 -keyphrase "delhi to goa" -infile test_s2s/wav/1.wav -dict /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic -kws_threshold 1e-30
INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/feat.params
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600
-input_endian little little
-jsgf
-keyphrase delhi to goa
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1 1.000000e-30
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 0
-lm
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333334e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-10 1.000000e-10
-pl_pip 1.0 1.000000e+00
-pl_weight 3.0 3.000000e+00
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-uw 1.0 1.000000e+00
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 2.0 2.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: mdef.c(518): Reading model definition: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/mdef
INFO: bin_mdef.c(181): Allocating 34795 * 8 bytes (271 KiB) for CD tree
INFO: tmat.c(206): Reading HMM transition probability matrices: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/transition_matrices
INFO: acmod.c(117): Attempting to use PTM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(354): 1292 variance values floored
INFO: ptm_mgau.c(801): Number of codebooks exceeds 256: 2759
INFO: acmod.c(119): Attempting to use semi-continuous computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(354): 1292 variance values floored
INFO: acmod.c(121): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(354): 1292 variance values floored
INFO: ms_senone.c(149): Reading senone mixture weights: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/mixture_weights
INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(207): Not transposing mixture weights in memory
INFO: ms_senone.c(268): Read mixture weights for 2759 senones: 1 features x 24 codewords
INFO: ms_senone.c(320): Mapping senones to individual codebooks
INFO: ms_mgau.c(141): The value of topn: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 5993 * 20 bytes (117 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic
INFO: dict.c(213): Allocated 13 KiB for strings, 21 KiB for phones
INFO: dict.c(336): 1893 words read
INFO: dict.c(358): Reading filler dictionary: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 53^3 * 2 bytes (290 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 33920 bytes (33 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 33920 bytes (33 KiB) for single-phone word triphones
INFO: kws_search.c(420): KWS(beam: -1080, plp: -23, default threshold -675, delay 10)
INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Sep 14 2016, AT: 10:13:20
INFO: cmn_prior.c(131): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 12.60 0.43 -0.22 0.21 -0.24 -0.20 -0.20 -0.07 -0.20 -0.10 -0.08 -0.18 -0.07 >
INFO: kws_search.c(658): kws 0.14 CPU 0.025 xRT
INFO: kws_search.c(660): kws 0.14 wall 0.025 xRT
INFO: kws_search.c(467): TOTAL kws 0.14 CPU 0.025 xRT
INFO: kws_search.c(470): TOTAL kws 0.14 wall 0.025 xRT
aai@aai-shrut1:~/workspace/100_800_mob1/key_word_spot$ pocketsphinx_continuous -hmm /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600 -keyphrase "delhi to goa" -infile test_s2s/wav/1.wav -dict /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic -kws_threshold 1e-20
INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/feat.params
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600
-input_endian little little
-jsgf
-keyphrase delhi to goa
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1 1.000000e-20
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 0
-lm
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333334e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-10 1.000000e-10
-pl_pip 1.0 1.000000e+00
-pl_weight 3.0 3.000000e+00
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-uw 1.0 1.000000e+00
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 2.0 2.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: mdef.c(518): Reading model definition: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/mdef
INFO: bin_mdef.c(181): Allocating 34795 * 8 bytes (271 KiB) for CD tree
INFO: tmat.c(206): Reading HMM transition probability matrices: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/transition_matrices
INFO: acmod.c(117): Attempting to use PTM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(354): 1292 variance values floored
INFO: ptm_mgau.c(801): Number of codebooks exceeds 256: 2759
INFO: acmod.c(119): Attempting to use semi-continuous computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(354): 1292 variance values floored
INFO: acmod.c(121): Falling back to general multi-stream GMM computation
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/means
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/variances
INFO: ms_gauden.c(292): 2759 codebook, 1 feature, size:
INFO: ms_gauden.c(294): 24x39
INFO: ms_gauden.c(354): 1292 variance values floored
INFO: ms_senone.c(149): Reading senone mixture weights: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/mixture_weights
INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
INFO: ms_senone.c(207): Not transposing mixture weights in memory
INFO: ms_senone.c(268): Read mixture weights for 2759 senones: 1 features x 24 codewords
INFO: ms_senone.c(320): Mapping senones to individual codebooks
INFO: ms_mgau.c(141): The value of topn: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 5993 * 20 bytes (117 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /home/aai/workspace/100_800_mob1/etc/100_800_mob1.dic
INFO: dict.c(213): Allocated 13 KiB for strings, 21 KiB for phones
INFO: dict.c(336): 1893 words read
INFO: dict.c(358): Reading filler dictionary: /home/aai/workspace/100_800_mob/model_parameters/100_800_mob.cd_cont_2600/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 53^3 * 2 bytes (290 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 33920 bytes (33 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 33920 bytes (33 KiB) for single-phone word triphones
INFO: kws_search.c(420): KWS(beam: -1080, plp: -23, default threshold -450, delay 10)
INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Sep 14 2016, AT: 10:13:20
INFO: cmn_prior.c(131): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_prior.c(149): cmn_prior_update: to < 12.60 0.43 -0.22 0.21 -0.24 -0.20 -0.20 -0.07 -0.20 -0.10 -0.08 -0.18 -0.07 >
INFO: kws_search.c(658): kws 0.14 CPU 0.024 xRT
INFO: kws_search.c(660): kws 0.14 wall 0.024 xRT
INFO: kws_search.c(467): TOTAL kws 0.14 CPU 0.024 xRT
INFO: kws_search.c(470): TOTAL kws 0.14 wall 0.024 xRT
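The two KWS runs above log `default threshold -675` for `-kws_threshold 1e-30` and `-450` for `1e-20`. Those values are consistent with the threshold being turned into a log score with `-logbase 1.0001` and then shifted right by 10 bits (compare "Truncating senone logs3(pdf) values by 10 bits" in the log). The sketch below is my reconstruction of that arithmetic, not pocketsphinx source; it reproduces the logged `beam: -1080`, `plp: -23` and the phone-loop `State beam -225` as well.

```python
import math

LOGBASE = 1.0001  # -logbase from the config dump
SHIFT = 10        # "Truncating senone logs3(pdf) values by 10 bits"


def to_internal_score(p: float, logbase: float = LOGBASE, shift: int = SHIFT) -> int:
    """Probability-style value (threshold or beam) -> shifted integer log score.

    int() truncates toward zero; Python's >> on a negative int floors,
    matching an arithmetic right shift in C on typical platforms.
    """
    return int(math.log(p) / math.log(logbase)) >> shift


for label, value in [("kws_threshold 1e-30", 1e-30),
                     ("kws_threshold 1e-20", 1e-20),
                     ("beam 1e-48", 1e-48),
                     ("kws_plp 1e-1", 1e-1),
                     ("pl_beam 1e-10", 1e-10)]:
    print(label, to_internal_score(value))
```

Under this reading, each factor of ten in `-kws_threshold` shifts the internal threshold by about 22.5 units, so a sweep in powers of ten is a sweep in evenly spaced score steps.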
I checked your files. I believe the acoustic model was not properly trained: it looks quite overtrained, most likely because the database was too small. What was the size of the database, how was it collected, and how many speakers does it contain? You need to follow the tutorial's recommendations for the amount of training data.
Looking at your audio, I see that frequencies above 5 kHz are not really reliable. First of all, I recommend changing the upper frequency to 5000 instead of 6800 and retraining the model; that should make your detection much more reliable.
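To see what lowering the upper frequency changes, the sketch below places the `-nfilt 40` triangular filters evenly on the mel scale between `-lowerf` and `-upperf` (values from the config dumps) and counts how many filter centres lie above 5 kHz. It assumes the common mel formula 2595·log10(1 + f/700); sphinxbase's filterbank construction may differ in detail. Pocketsphinx reads these front-end settings from the model's feat.params (first INFO line of each run), so the change has to be made at training time. In a standard SphinxTrain setup I believe the setting is `$CFG_HI_FILT` in `etc/sphinx_train.cfg`; verify this in your own copy.

```python
import math
from typing import List


def mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_inv(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def filter_edges(lowerf: float, upperf: float, nfilt: int) -> List[float]:
    """nfilt + 2 edge frequencies (Hz) spaced evenly on the mel scale."""
    lo, hi = mel(lowerf), mel(upperf)
    step = (hi - lo) / (nfilt + 1)
    return [mel_inv(lo + i * step) for i in range(nfilt + 2)]


def centers_above(limit_hz: float, lowerf: float, upperf: float, nfilt: int) -> int:
    """Number of filter centre frequencies above limit_hz."""
    centers = filter_edges(lowerf, upperf, nfilt)[1:-1]
    return sum(1 for c in centers if c > limit_hz)


# Current -upperf from the config dump vs. the suggested 5000 Hz.
for upperf in (6855.4976, 5000.0):
    print(upperf, centers_above(5000.0, 133.33334, upperf, 40))
```

With the current setting several filters sit entirely in the unreliable band; with 5000 Hz all 40 filters are redistributed below it.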