Hi there, I followed the tutorial and PocketSphinx works on my MIPS embedded system, whose CPU runs at 580 MHz. My goal is wake-word detection, so I trained my own acoustic model on three randomly picked words: "eight", "happy" and "dog". For each word I have about 1000 training samples (the same word spoken by different people, taken from Google's open Speech Commands dataset). Keyword spotting also works on my embedded platform, but it is way too slow: on average, recognition takes 2 to 4 times as long as the recording itself. Here is the log:
INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from my_db.ci_semi/feat.params
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-ceplen 13 13
-cmn live batch
-cmninit 40,3,-1 40,3,-1
-compallsen no no
-debug 0
-dict my_db.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd s2_4x
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm my_db.ci_semi
-input_endian little little
-jsgf
-keyphrase
-kws keywords.txt
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1 1.000000e+00
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 22
-lm
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.300000e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 25
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-10 1.000000e-10
-pl_pip 1.0 1.000000e+00
-pl_weight 3.0 3.000000e+00
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 6.800000e+03
-uw 1.0 1.000000e+00
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 2.0 2.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: feat.c(715): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: mdef.c(518): Reading model definition: my_db.ci_semi/mdef
INFO: bin_mdef.c(181): Allocating 44 * 8 bytes (0 KiB) for CD tree
INFO: tmat.c(149): Reading HMM transition probability matrices: my_db.ci_semi/transition_matrices
INFO: acmod.c(113): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: my_db.ci_semi/means
INFO: ms_gauden.c(242): 1 codebook, 4 feature, size:
INFO: ms_gauden.c(244): 256x12
INFO: ms_gauden.c(244): 256x24
INFO: ms_gauden.c(244): 256x3
INFO: ms_gauden.c(244): 256x12
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: my_db.ci_semi/variances
INFO: ms_gauden.c(242): 1 codebook, 4 feature, size:
INFO: ms_gauden.c(244): 256x12
INFO: ms_gauden.c(244): 256x24
INFO: ms_gauden.c(244): 256x3
INFO: ms_gauden.c(244): 256x12
INFO: ms_gauden.c(304): 0 variance values floored
INFO: ptm_mgau.c(808): Number of codebooks doesn't match number of ciphones, doesn't look like PTM: 1 != 10
INFO: acmod.c(115): Attempting to use semi-continuous computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: my_db.ci_semi/means
INFO: ms_gauden.c(242): 1 codebook, 4 feature, size:
INFO: ms_gauden.c(244): 256x12
INFO: ms_gauden.c(244): 256x24
INFO: ms_gauden.c(244): 256x3
INFO: ms_gauden.c(244): 256x12
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: my_db.ci_semi/variances
INFO: ms_gauden.c(242): 1 codebook, 4 feature, size:
INFO: ms_gauden.c(244): 256x12
INFO: ms_gauden.c(244): 256x24
INFO: ms_gauden.c(244): 256x3
INFO: ms_gauden.c(244): 256x12
INFO: ms_gauden.c(304): 0 variance values floored
INFO: s2_semi_mgau.c(1099): Reading mixture weights file 'my_db.ci_semi/mixture_weights'
INFO: s2_semi_mgau.c(1192): Read 30 x 4 x 256 mixture weights
INFO: s2_semi_mgau.c(1297): Maximum top-N: 4 Top-N beams: 0 0 0 0
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 4102 * 20 bytes (80 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: my_db.dic
INFO: dict.c(213): Dictionary size 3, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(336): 3 words read
INFO: dict.c(358): Reading filler dictionary: my_db.ci_semi/noisedict
INFO: dict.c(213): Dictionary size 6, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 3 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 10^3 * 2 bytes (1 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 1240 bytes (1 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 1240 bytes (1 KiB) for single-phone word triphones
INFO: kws_search.c(406): KWS(beam: -1080, plp: -23, default threshold 0, delay 10)
INFO: continuous.c(307): Bill ./pocketsphinx_continuous COMPILED ON: Oct 12 2017, AT: 00:43:00
INFO: continuous.c(252): Ready....
INFO: continuous.c(261): Listening...
INFO: cmn_live.c(120): Update from < 40.00 3.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
INFO: cmn_live.c(138): Update to < 27.33 0.52 1.37 -4.49 -0.44 -6.17 -3.14 -5.56 -3.70 -0.39 -2.37 -1.84 -0.34 >
INFO: kws_search.c(656): kws 9.99 CPU 2.035 xRT
INFO: kws_search.c(658): kws 10.89 wall 2.218 xRT
INFO: continuous.c(275): Ready....
INFO: continuous.c(261): Listening...
INFO: cmn_live.c(120): Update from < 27.33 0.52 1.37 -4.49 -0.44 -6.17 -3.14 -5.56 -3.70 -0.39 -2.37 -1.84 -0.34 >
INFO: cmn_live.c(138): Update to < 27.60 -0.02 0.50 -4.07 0.22 -6.29 -3.48 -5.81 -3.93 -0.76 -2.65 -1.42 -0.73 >
INFO: kws_search.c(656): kws 9.87 CPU 3.669 xRT
INFO: kws_search.c(658): kws 11.03 wall 4.100 xRT
INFO: continuous.c(275): Ready....
INFO: continuous.c(261): Listening...
INFO: cmn_live.c(88): Update from < 27.60 -0.02 0.50 -4.07 0.22 -6.29 -3.48 -5.81 -3.93 -0.76 -2.65 -1.42 -0.73 >
INFO: cmn_live.c(105): Update to < 27.60 -0.00 0.72 -3.89 0.36 -6.40 -3.48 -5.74 -3.82 -0.71 -2.46 -1.39 -0.81 >
Input overrun, read calls are too rare (non-fatal)
INFO: cmn_live.c(120): Update from < 27.60 -0.00 0.72 -3.89 0.36 -6.40 -3.48 -5.74 -3.82 -0.71 -2.46 -1.39 -0.81 >
INFO: cmn_live.c(138): Update to < 27.13 0.43 0.70 -4.50 0.54 -6.79 -3.53 -5.84 -3.51 -0.53 -2.28 -1.02 -0.79 >
INFO: kws_search.c(656): kws 3.92 CPU 2.052 xRT
INFO: kws_search.c(658): kws 4.34 wall 2.273 xRT
INFO: continuous.c(275): Ready....
INFO: continuous.c(261): Listening...
INFO: cmn_live.c(88): Update from < 27.13 0.43 0.70 -4.50 0.54 -6.79 -3.53 -5.84 -3.51 -0.53 -2.28 -1.02 -0.79 >
INFO: cmn_live.c(105): Update to < 30.11 -2.70 -1.85 -3.61 -0.89 -7.77 -1.84 -5.53 -5.68 1.30 -3.14 -1.91 -0.07 >
INFO: cmn_live.c(120): Update from < 30.11 -2.70 -1.85 -3.61 -0.89 -7.77 -1.84 -5.53 -5.68 1.30 -3.14 -1.91 -0.07 >
INFO: cmn_live.c(138): Update to < 33.45 -6.92 -3.16 -2.36 -2.19 -6.86 -1.05 -6.66 -6.49 2.41 -3.62 -3.17 1.30 >
INFO: kws_search.c(656): kws 13.11 CPU 3.553 xRT
INFO: kws_search.c(658): kws 14.50 wall 3.930 xRT
eight dog eight dog happy happy
INFO: continuous.c(275): Ready....
INFO: continuous.c(261): Listening...
INFO: cmn_live.c(88): Update from < 33.45 -6.92 -3.16 -2.36 -2.19 -6.86 -1.05 -6.66 -6.49 2.41 -3.62 -3.17 1.30 >
INFO: cmn_live.c(105): Update to < 35.11 -8.28 -3.81 -2.73 -2.63 -6.96 -1.24 -5.75 -6.61 3.63 -4.51 -2.84 0.87 >
Input overrun, read calls are too rare (non-fatal)
INFO: cmn_live.c(120): Update from < 35.11 -8.28 -3.81 -2.73 -2.63 -6.96 -1.24 -5.75 -6.61 3.63 -4.51 -2.84 0.87 >
INFO: cmn_live.c(138): Update to < 39.26 -10.99 -8.18 -0.13 -4.18 -8.29 0.90 -4.27 -8.87 3.23 -4.23 -4.07 1.65 >
INFO: kws_search.c(656): kws 11.91 CPU 4.395 xRT
INFO: kws_search.c(658): kws 13.41 wall 4.947 xRT
happy happy
INFO: continuous.c(275): Ready....
INFO: continuous.c(261): Listening...
INFO: cmn_live.c(88): Update from < 39.26 -10.99 -8.18 -0.13 -4.18 -8.29 0.90 -4.27 -8.87 3.23 -4.23 -4.07 1.65 >
INFO: cmn_live.c(105): Update to < 40.92 -12.52 -8.92 -0.11 -4.81 -8.77 1.83 -3.85 -9.60 3.78 -4.80 -3.97 1.83 >
INFO: cmn_live.c(120): Update from < 40.92 -12.52 -8.92 -0.11 -4.81 -8.77 1.83 -3.85 -9.60 3.78 -4.80 -3.97 1.83 >
INFO: cmn_live.c(138): Update to < 43.49 -14.38 -9.18 -0.10 -4.91 -8.16 2.19 -4.82 -9.69 4.61 -4.57 -4.79 2.26 >
INFO: kws_search.c(656): kws 11.78 CPU 3.192 xRT
INFO: kws_search.c(658): kws 12.95 wall 3.509 xRT
eight dog eight dog happy happy
------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------
I tried changing the arguments to "-maxhmmpf 3000 -maxwpf 2 -pl_window 8 -ds 2 -topn 2", but it is still not fast enough. Is there a step I missed, or is my embedded platform simply not powerful enough for keyword spotting? Also, is it normal that even when I say nothing, lines like these still show up?
INFO: cmn_live.c(120): Update from < ............... >
INFO: cmn_live.c(138): Update to < ........ >
Last edit: ahQi 2017-10-13
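To put a number on "2 to 4 times as long", the timing lines in the log above can be averaged with a short script. This is only a sketch: the helper names `summarize_xrt` and `ms_per_frame` are made up for illustration, the sample is the first two utterances copied from the log, and the per-frame figure assumes the `-frate 100` value shown in the configuration dump.

```python
import re

# Matches timing lines such as:
#   INFO: kws_search.c(656): kws 9.99 CPU 2.035 xRT
TIMING = re.compile(r"kws\s+([\d.]+)\s+(CPU|wall)\s+([\d.]+)\s+xRT")


def summarize_xrt(log_text):
    """Return {kind: (total_seconds, mean_xrt)} for the 'CPU' and 'wall' lines."""
    collected = {}
    for seconds, kind, xrt in TIMING.findall(log_text):
        secs, factors = collected.setdefault(kind, ([], []))
        secs.append(float(seconds))
        factors.append(float(xrt))
    return {
        kind: (sum(secs), sum(factors) / len(factors))
        for kind, (secs, factors) in collected.items()
    }


def ms_per_frame(xrt, frate=100):
    """Time spent per feature frame, assuming `frate` frames per second of audio."""
    return xrt * 1000.0 / frate


# First two utterances from the log in this thread.
sample = """\
INFO: kws_search.c(656): kws 9.99 CPU 2.035 xRT
INFO: kws_search.c(658): kws 10.89 wall 2.218 xRT
INFO: kws_search.c(656): kws 9.87 CPU 3.669 xRT
INFO: kws_search.c(658): kws 11.03 wall 4.100 xRT
"""

summary = summarize_xrt(sample)
for kind, (total, mean_xrt) in summary.items():
    print(f"{kind}: {total:.2f} s total, mean {mean_xrt:.3f} xRT, "
          f"{ms_per_frame(mean_xrt):.1f} ms per frame")
```

Anything above 1.0 xRT means the decoder cannot keep up with live audio. At 100 frames per second that is a budget of 10 ms per frame, which is consistent with the "Input overrun, read calls are too rare" messages in the log.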
What is the name of the platform, and what are its performance characteristics? You should have mentioned that in the first place.
Hi, my platform is ReSpeaker; its core is the MT7688:
* Embedded MIPS24KEc (575/580 MHz) with 64 KB I-Cache and 32 KB D-Cache
* DDR2 SRAM 128MB, SPI flash 32MB
* AP/STA Firmware: Linux 2.6.36 SDK, OpenWrt 3.10
Last edit: ahQi 2017-10-13
The training dataset for keyword spotting must include large-vocabulary data.
For spotting, a continuous model should be faster than a semi-continuous one.
In such cases, a good first step is to profile the application with gprof to see where it spends its time. Maybe you simply forgot to enable compiler optimization.
Thanks for the reply. I did enable compiler optimization with CFLAGS="-O3", but it did not improve things much.
I also found that vad_threshold has a big impact on how long it takes for a result to come out: with the default value of 2.0 it often treats noise as speech, so the utterance gets filled with a lot of unnecessary content. I'll profile the application with gprof to see where the bottleneck in my app is.
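One way to check how much audio the VAD actually lets through is to work backwards from the timing lines. This sketch assumes that the xRT figure is CPU time divided by the duration of the decoded audio (the usual definition of a real-time factor), so the duration is roughly CPU seconds / xRT. The helper name `decoded_audio_seconds` is hypothetical, and the sample lines are copied from the log earlier in this thread.

```python
import re

# Only the CPU lines are used, e.g.:
#   INFO: kws_search.c(656): kws 9.99 CPU 2.035 xRT
CPU_TIMING = re.compile(r"kws\s+([\d.]+)\s+CPU\s+([\d.]+)\s+xRT")


def decoded_audio_seconds(log_text):
    """Estimate the audio length of each utterance as CPU seconds / xRT.

    Assumption: xRT = CPU time / audio duration.
    """
    return [float(cpu) / float(xrt) for cpu, xrt in CPU_TIMING.findall(log_text)]


log = """\
INFO: kws_search.c(656): kws 9.99 CPU 2.035 xRT
INFO: kws_search.c(658): kws 10.89 wall 2.218 xRT
INFO: kws_search.c(656): kws 3.92 CPU 2.052 xRT
INFO: kws_search.c(658): kws 4.34 wall 2.273 xRT
"""

durations = decoded_audio_seconds(log)
for seconds in durations:
    print(f"about {seconds:.2f} s of audio went through the decoder")
```

If these estimates stay at several seconds even when nobody is speaking, that would suggest noise is being passed through as speech, and comparing the numbers before and after raising vad_threshold should show whether the utterances get shorter.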