CMU Sphinx / Forums / Help: Bad performance of pocketsphinx code

Pablo Caravaca - 2019-08-14

Hello! I would like to post a topic about a problem I am having with a pocketsphinx application.

Here are the details:

I want to feed the pocketsphinx decoder from a microphone input buffer. Here is the code I am working on:

const char * recognize_from_microphone(cb_t *q){
    int16_t buffer[4096];
    clock_t start, end;
    double diff_t;

    ps_start_utt(ps);
    utt_started = FALSE;

    while(1) {
        int k = cb_get(q, buffer, 4096);

        start = clock();
        ps_process_raw(ps, (const int16_t *)buffer, k, FALSE, FALSE);
        end = clock();

        in_speech = ps_get_in_speech(ps);
        diff_t = ((double) (end - start)) / CLOCKS_PER_SEC;
        printf("Execution time = %f\n", diff_t);
        printf("%d\n", in_speech);

        if (in_speech && !utt_started) {
            utt_started = TRUE;
            printf("Listening...\n");
        }
        if (!in_speech && utt_started) {
            ps_end_utt(ps);
            hyp = ps_get_hyp(ps, NULL);
            if (hyp != NULL) {
                return hyp;
            }
            /* No hypothesis: start a new utterance before feeding more audio. */
            ps_start_utt(ps);
            utt_started = FALSE;
        }
        sleep_msec(20);
    }
}

The two printf calls tell me that each call to ps_process_raw takes around 1 second and that in_speech is always returned as '1'. I've been fighting with the code but still cannot find a solution.

The microphone is working fine; the capture side waits for cb_get to run so that there is free space in the buffer (a producer/consumer setup).

Help is much appreciated. Cheers.
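
One caveat about the timing in the loop above: clock() reports processor time consumed by the whole process, not elapsed time, so in a program with a separate capture thread it can differ from how long the call really took. A minimal sketch of wall-clock timing with POSIX clock_gettime, assuming a Linux target such as the Raspberry Pi; elapsed_sec and time_call are hypothetical helper names, not part of pocketsphinx:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Seconds between two readings of the same clock. */
double elapsed_sec(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Hypothetical helper: wall-clock seconds spent inside fn(arg). */
double time_call(void (*fn)(void *), void *arg)
{
    struct timespec t0, t1;

    assert(fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_sec(&t0, &t1);
}
```

In the loop, the two clock_gettime calls could equally be placed directly around ps_process_raw, with elapsed_sec replacing the clock() subtraction.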

- Nickolay V. Shmyrev - 2019-08-14
  
  It is hard to tell; probably your CPU is very slow. Did you try decoding from a file first with pocketsphinx_continuous? How long does it take?
  
  - Pablo Caravaca - 2019-08-14
    
    Thanks, Nickolay. Can you tell me something about this output from continuous.c with -inmic yes?
    
    INFO: continous.c(254): Ready....
    Execution time = 0.000011
    Execution time = 0.000855
    Execution time = 0.000814
    Execution time = 0.001174
    Execution time = 0.000897
    Execution time = 0.001875
    Execution time = 0.002055
    Execution time = 1.281628
    INFO: continous.c(273): Listening...
    Input overrun, read calls are too rare (non-fatal)
    Execution time = 0.000006
    Execution time = 0.000004
    Execution time = 1.812801
    Input overrun, read calls are too rare (non-fatal)
    Execution time = 0.000004
    Execution time = 0.000003
    Execution time = 1.276526
    Input overrun, read calls are too rare (non-fatal)
    Execution time = 0.000005
    Execution time = 0.000005
    Execution time = 2.089774
    Input overrun, read calls are too rare (non-fatal)
    Execution time = 0.000004
    Execution time = 0.000003
    
    - Nickolay V. Shmyrev - 2019-08-14
      
      I asked you to decode a file and provide the full output as well as cpu info.
      
      - Pablo Caravaca - 2019-08-14
        
        Yeah, you're right, sorry. I am kind of desperate right now, haha. I'm using a Raspberry Pi 3 B+. The output for file decoding is:
        
        INFO: ngram_search.c(467): Resized score stack to 200000 entries
        INFO: ngram_search.c(459): Resized backpointer table to 10000 entries
        INFO: ngram_search_fwdtree.c(949): cand_sf[] increased to 64 entries
        INFO: ngram_search.c(467): Resized score stack to 400000 entries
        INFO: ngram_search.c(459): Resized backpointer table to 20000 entries
        INFO: ngram_search.c(467): Resized score stack to 800000 entries
        INFO: ngram_search.c(459): Resized backpointer table to 40000 entries
        INFO: ngram_search.c(467): Resized score stack to 1600000 entries
        INFO: cmn_live.c(120): Update from < 41.00 -5.29 -0.12 5.09 2.48 -4.07 -1.37 -1.78 -5.08 -2.05 -6.45 -1.42 1.17 >
        INFO: cmn_live.c(138): Update to < 37.44 -3.28 3.53 6.57 -8.86 11.49 -12.17 8.89 3.51 -8.78 -1.72 -3.90 -0.12 >
        INFO: ngram_search_fwdtree.c(1550): 34051 words recognized (68/fr)
        INFO: ngram_search_fwdtree.c(1552): 1868504 senones evaluated (3737/fr)
        INFO: ngram_search_fwdtree.c(1556): 11798273 channels searched (23596/fr), 317733 1st, 854816 last
        INFO: ngram_search_fwdtree.c(1559): 48426 words for which last channels evaluated (96/fr)
        INFO: ngram_search_fwdtree.c(1561): 1032622 candidate words for entering last phone (2065/fr)
        INFO: ngram_search_fwdtree.c(1564): fwdtree 30.71 CPU 6.141 xRT
        INFO: ngram_search_fwdtree.c(1567): fwdtree 30.83 wall 6.166 xRT
        INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 845 words
        INFO: ngram_search_fwdflat.c(948): 21537 words recognized (43/fr)
        INFO: ngram_search_fwdflat.c(950): 947620 senones evaluated (1895/fr)
        INFO: ngram_search_fwdflat.c(952): 2242514 channels searched (4485/fr)
        INFO: ngram_search_fwdflat.c(954): 117970 words searched (235/fr)
        INFO: ngram_search_fwdflat.c(957): 77223 word transitions (154/fr)
        INFO: ngram_search_fwdflat.c(960): fwdflat 5.48 CPU 1.095 xRT
        INFO: ngram_search_fwdflat.c(963): fwdflat 5.48 wall 1.096 xRT
        INFO: ngram_search.c(1197): not found in last frame, using this.498 instead
        INFO: ngram_search.c(1250): lattice start node 0 end node this.454
        INFO: ngram_search.c(1276): Eliminated 182 nodes before end node
        INFO: ngram_search.c(1381): Lattice has 2529 nodes, 41578 links
        INFO: ps_lattice.c(1380): Bestpath score: -21097
        INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(this:454:498) = -1186561
        INFO: ps_lattice.c(1441): Joint P(O,S) = -1363451 P(S|O) = -176890
        INFO: ngram_search.c(872): bestpath 1.63 CPU 0.326 xRT
        INFO: ngram_search.c(875): bestpath 1.63 wall 0.326 xRT
        one yeah yeah and and and yes this
        INFO: ngram_search_fwdtree.c(429): TOTAL fwdtree 30.71 CPU 6.153 xRT
        INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 30.83 wall 6.178 xRT
        INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 5.48 CPU 1.098 xRT
        INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 5.48 wall 1.098 xRT
        INFO: ngram_search.c(303): TOTAL bestpath 1.63 CPU 0.326 xRT
        INFO: ngram_search.c(306): TOTAL bestpath 1.63 wall 0.326 xRT
        
        Obviously I didn't say "one yeah yeah and and and yes this", hahaha. It's a 5-second file with a TV on in the background.
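
        The xRT figures in this log are the ratio of processing time to audio duration, so anything above 1 is slower than real time. As a worked check, assuming the clip is about 5 seconds long as stated, the fwdtree pass comes to 30.71 / 5 ≈ 6.14, and the three passes together come to about 7.6 times real time. A small sketch of that arithmetic (the function name is illustrative, not a pocketsphinx API):

        ```c
        #include <assert.h>

        /* Real-time factor: processing seconds per second of audio.
         * Values above 1.0 mean the decoder is slower than real time. */
        double real_time_factor(double processing_sec, double audio_sec)
        {
            assert(audio_sec > 0.0);
            return processing_sec / audio_sec;
        }
        ```

        For example, real_time_factor(30.71 + 5.48 + 1.63, 5.0) gives roughly 7.56 for fwdtree, fwdflat and bestpath combined.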
        
        Last edit: Pablo Caravaca 2019-08-14
        
        Nickolay V. Shmyrev - 2019-08-15
        
        The output is incomplete; the model is probably too large. The file format could also be wrong.
        
        Pablo Caravaca - 2019-08-15
        
        This is the rest of the output. It's the standard model, as I didn't make any modifications.
        The file format is 16-bit, 16 kHz, little-endian.
        
        INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/en-us/en-us/feat.params
        Current configuration:
        [NAME] [DEFLT] [VALUE]
        -agc none none
        -agcthresh 2.0 2.000000e+00
        -allphone
        -allphone_ci yes yes
        -alpha 0.97 9.700000e-01
        -ascale 20.0 2.000000e+01
        -aw 1 1
        -backtrace no no
        -beam 1e-48 1.000000e-48
        -bestpath yes yes
        -bestpathlw 9.5 9.500000e+00
        -ceplen 13 13
        -cmn live batch
        -cmninit 40,3,-1 41.00,-5.29,-0.12,5.09,2.48,-4.07,-1.37,-1.78,-5.08,-2.05,-6.45,-1.42,1.17
        -compallsen no no
        -dict /usr/share/pocketsphinx/model/en-us/cmudict-en-us.dict
        -dictcase no no
        -dither no no
        -doublebw no no
        -ds 1 1
        -fdict
        -feat 1s_c_d_dd 1s_c_d_dd
        -featparams
        -fillprob 1e-8 1.000000e-08
        -frate 100 100
        -fsg
        -fsgusealtpron yes yes
        -fsgusefiller yes yes
        -fwdflat yes yes
        -fwdflatbeam 1e-64 1.000000e-64
        -fwdflatefwid 4 4
        -fwdflatlw 8.5 8.500000e+00
        -fwdflatsfwin 25 25
        -fwdflatwbeam 7e-29 7.000000e-29
        -fwdtree yes yes
        -hmm /usr/share/pocketsphinx/model/en-us/en-us
        -input_endian little little
        -jsgf
        -keyphrase
        -kws
        -kws_delay 10 10
        -kws_plp 1e-1 1.000000e-01
        -kws_threshold 1e-30 1.000000e-30
        -latsize 5000 5000
        -lda
        -ldadim 0 0
        -lifter 0 22
        -lm /usr/share/pocketsphinx/model/en-us/en-us.lm.bin
        -lmctl
        -lmname
        -logbase 1.0001 1.000100e+00
        -logfn
        -logspec no no
        -lowerf 133.33334 1.300000e+02
        -lpbeam 1e-40 1.000000e-40
        -lponlybeam 7e-29 7.000000e-29
        -lw 6.5 6.500000e+00
        -maxhmmpf 30000 30000
        -maxwpf -1 -1
        -mdef
        -mean
        -mfclogdir
        -min_endfr 0 0
        -mixw
        -mixwfloor 0.0000001 1.000000e-07
        -mllr
        -mmap yes yes
        -ncep 13 13
        -nfft 512 512
        -nfilt 40 25
        -nwpen 1.0 1.000000e+00
        -pbeam 1e-48 1.000000e-48
        -pip 1.0 1.000000e+00
        -pl_beam 1e-10 1.000000e-10
        -pl_pbeam 1e-10 1.000000e-10
        -pl_pip 1.0 1.000000e+00
        -pl_weight 3.0 3.000000e+00
        -pl_window 5 5
        -rawlogdir
        -remove_dc no no
        -remove_noise yes yes
        -remove_silence yes yes
        -round_filters yes yes
        -samprate 16000 1.600000e+04
        -seed -1 -1
        -sendump
        -senlogdir
        -senmgau
        -silprob 0.005 5.000000e-03
        -smoothspec no no
        -svspec 0-12/13-25/26-38
        -tmat
        -tmatfloor 0.0001 1.000000e-04
        -topn 4 4
        -topn_beam 0 0
        -toprule
        -transform legacy dct
        -unit_area yes yes
        -upperf 6855.4976 6.800000e+03
        -uw 1.0 1.000000e+00
        -vad_postspeech 50 50
        -vad_prespeech 20 20
        -vad_startspeech 10 10
        -vad_threshold 3.0 3.000000e+00
        -var
        -varfloor 0.0001 1.000000e-04
        -varnorm no no
        -verbose no no
        -warp_params
        -warp_type inverse_linear inverse_linear
        -wbeam 7e-29 7.000000e-29
        -wip 0.65 6.500000e-01
        -wlen 0.025625 2.562500e-02
        
        INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
        INFO: acmod.c(162): Using subvector specification 0-12/13-25/26-38
        INFO: mdef.c(518): Reading model definition: /usr/share/pocketsphinx/model/en-us/en-us/mdef
        INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
        INFO: bin_mdef.c(336): Reading binary model definition: /usr/share/pocketsphinx/model/en-us/en-us/mdef
        INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
        INFO: tmat.c(149): Reading HMM transition probability matrices: /usr/share/pocketsphinx/model/en-us/en-us/transition_matrices
        INFO: acmod.c(113): Attempting to use PTM computation module
        INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/en-us/en-us/means
        INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
        INFO: ms_gauden.c(244): 128x13
        INFO: ms_gauden.c(244): 128x13
        INFO: ms_gauden.c(244): 128x13
        INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/en-us/en-us/variances
        INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
        INFO: ms_gauden.c(244): 128x13
        INFO: ms_gauden.c(244): 128x13
        INFO: ms_gauden.c(244): 128x13
        INFO: ms_gauden.c(304): 222 variance values floored
        INFO: ptm_mgau.c(475): Loading senones from dump file /usr/share/pocketsphinx/model/en-us/en-us/sendump
        INFO: ptm_mgau.c(499): BEGIN FILE FORMAT DESCRIPTION
        INFO: ptm_mgau.c(562): Rows: 128, Columns: 5126
        INFO: ptm_mgau.c(594): Using memory-mapped I/O for senones
        INFO: ptm_mgau.c(837): Maximum top-N: 4
        INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
        INFO: dict.c(320): Allocating 138824 * 20 bytes (2711 KiB) for word entries
        INFO: dict.c(333): Reading main dictionary: /usr/share/pocketsphinx/model/en-us/cmudict-en-us.dict
        INFO: dict.c(213): Dictionary size 134723, allocated 1016 KiB for strings, 1679 KiB for phones
        INFO: dict.c(336): 134723 words read
        INFO: dict.c(358): Reading filler dictionary: /usr/share/pocketsphinx/model/en-us/en-us/noisedict
        INFO: dict.c(213): Dictionary size 134728, allocated 0 KiB for strings, 0 KiB for phones
        INFO: dict.c(361): 5 words read
        INFO: dict2pid.c(396): Building PID tables for dictionary
        INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
        INFO: dict2pid.c(132): Allocated 21336 bytes (20 KiB) for word-final triphones
        INFO: dict2pid.c(196): Allocated 21336 bytes (20 KiB) for single-phone word triphones
        INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
        INFO: ngram_search_fwdtree.c(74): Initializing search tree
        INFO: ngram_search_fwdtree.c(101): 791 unique initial diphones
        INFO: ngram_search_fwdtree.c(186): Creating search channels
        INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 152609
        INFO: ngram_search_fwdtree.c(333): Created 723 root, 152481 non-root channels, 53 single-phone words
        INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
        INFO: continous.c(319): ./cross COMPILED ON: Aug 15 2019, AT: 00:38:05
        
        Last edit: Pablo Caravaca 2019-08-15
        
        Pablo Caravaca - 2019-08-15
        
        It actually decodes fast enough, as you can see in the output from the microphone recording, doesn't it? I am a bit lost here. I appreciate your answers.
        
        By reducing the buffer size (to 1024), I got it working continuously, and with a custom dict and lm containing only a few words it seems to be working. Making progress... but in_speech is still always 1.
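
        The buffer size matters because it fixes how much audio each ps_process_raw call has to absorb. At 16 kHz, 4096 samples cover 256 ms and 1024 samples cover 64 ms; the loop keeps up only while each call, plus any sleep, finishes within that window on average. A rough sketch of the arithmetic, with illustrative helper names that are not part of pocketsphinx:

        ```c
        #include <assert.h>

        /* Milliseconds of mono audio held in n_samples at sample_rate_hz. */
        double chunk_duration_ms(int n_samples, int sample_rate_hz)
        {
            assert(sample_rate_hz > 0);
            return 1000.0 * n_samples / sample_rate_hz;
        }

        /* Nonzero if a call that took call_ms keeps pace with chunk_ms of audio. */
        int keeps_up(double call_ms, double chunk_ms)
        {
            return call_ms <= chunk_ms;
        }
        ```

        With the roughly 1 second per call reported at the start of the thread, keeps_up(1000.0, chunk_duration_ms(4096, 16000)) is false, which is consistent with the input overrun warnings.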
        
        This is the log file for my last attempt:
        INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/en-us/en-us/feat.params
        Current configuration:
        [NAME] [DEFLT] [VALUE]
        -agc none none
        -agcthresh 2.0 2.000000e+00
        -allphone
        -allphone_ci yes yes
        -alpha 0.97 9.700000e-01
        -ascale 20.0 2.000000e+01
        -aw 1 1
        -backtrace no no
        -beam 1e-48 1.000000e-48
        -bestpath yes yes
        -bestpathlw 9.5 9.500000e+00
        -ceplen 13 13
        -cmn live batch
        -cmninit 40,3,-1 41.00,-5.29,-0.12,5.09,2.48,-4.07,-1.37,-1.78,-5.08,-2.05,-6.45,-1.42,1.17
        -compallsen no no
        -dict /home/pi/modelo_mv/6706.dic
        -dictcase no no
        -dither no no
        -doublebw no no
        -ds 1 1
        -fdict
        -feat 1s_c_d_dd 1s_c_d_dd
        -featparams
        -fillprob 1e-8 1.000000e-08
        -frate 100 100
        -fsg
        -fsgusealtpron yes yes
        -fsgusefiller yes yes
        -fwdflat yes yes
        -fwdflatbeam 1e-64 1.000000e-64
        -fwdflatefwid 4 4
        -fwdflatlw 8.5 8.500000e+00
        -fwdflatsfwin 25 25
        -fwdflatwbeam 7e-29 7.000000e-29
        -fwdtree yes yes
        -hmm /usr/share/pocketsphinx/model/en-us/en-us
        -input_endian little little
        -jsgf
        -keyphrase
        -kws
        -kws_delay 10 10
        -kws_plp 1e-1 1.000000e-01
        -kws_threshold 1e-30 1.000000e-30
        -latsize 5000 5000
        -lda
        -ldadim 0 0
        -lifter 0 22
        -lm /home/pi/modelo_mv/6706.lm
        -lmctl
        -lmname
        -logbase 1.0001 1.000100e+00
        -logfn /home/pi/modelo_mv/log
        -logspec no no
        -lowerf 133.33334 1.300000e+02
        -lpbeam 1e-40 1.000000e-40
        -lponlybeam 7e-29 7.000000e-29
        -lw 6.5 6.500000e+00
        -maxhmmpf 30000 30000
        -maxwpf -1 -1
        -mdef
        -mean
        -mfclogdir
        -min_endfr 0 0
        -mixw
        -mixwfloor 0.0000001 1.000000e-07
        -mllr
        -mmap yes yes
        -ncep 13 13
        -nfft 512 512
        -nfilt 40 25
        -nwpen 1.0 1.000000e+00
        -pbeam 1e-48 1.000000e-48
        -pip 1.0 1.000000e+00
        -pl_beam 1e-10 1.000000e-10
        -pl_pbeam 1e-10 1.000000e-10
        -pl_pip 1.0 1.000000e+00
        -pl_weight 3.0 3.000000e+00
        -pl_window 5 5
        -rawlogdir
        -remove_dc no no
        -remove_noise yes yes
        -remove_silence yes yes
        -round_filters yes yes
        -samprate 16000 1.600000e+04
        -seed -1 -1
        -sendump
        -senlogdir
        -senmgau
        -silprob 0.005 5.000000e-03
        -smoothspec no no
        -svspec 0-12/13-25/26-38
        -tmat
        -tmatfloor 0.0001 1.000000e-04
        -topn 4 4
        -topn_beam 0 0
        -toprule
        -transform legacy dct
        -unit_area yes yes
        -upperf 6855.4976 6.800000e+03
        -uw 1.0 1.000000e+00
        -vad_postspeech 50 50
        -vad_prespeech 20 20
        -vad_startspeech 10 10
        -vad_threshold 3.0 3.000000e+00
        -var
        -varfloor 0.0001 1.000000e-04
        -varnorm no no
        -verbose no no
        -warp_params
        -warp_type inverse_linear inverse_linear
        -wbeam 7e-29 7.000000e-29
        -wip 0.65 6.500000e-01
        -wlen 0.025625 2.562500e-02
        
        INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
        INFO: acmod.c(162): Using subvector specification 0-12/13-25/26-38
        INFO: mdef.c(518): Reading model definition: /usr/share/pocketsphinx/model/en-us/en-us/mdef
        INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
        INFO: bin_mdef.c(336): Reading binary model definition: /usr/share/pocketsphinx/model/en-us/en-us/mdef
        INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
        INFO: tmat.c(149): Reading HMM transition probability matrices: /usr/share/pocketsphinx/model/en-us/en-us/transition_matrices
        INFO: acmod.c(113): Attempting to use PTM computation module
        INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/en-us/en-us/means
        INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
        INFO: ms_gauden.c(244): 128x13
        INFO: ms_gauden.c(244): 128x13
        INFO: ms_gauden.c(244): 128x13
        INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/en-us/en-us/variances
        INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
        INFO: ms_gauden.c(244): 128x13
        INFO: ms_gauden.c(244): 128x13
        INFO: ms_gauden.c(244): 128x13
        INFO: ms_gauden.c(304): 222 variance values floored
        INFO: ptm_mgau.c(475): Loading senones from dump file /usr/share/pocketsphinx/model/en-us/en-us/sendump
        INFO: ptm_mgau.c(499): BEGIN FILE FORMAT DESCRIPTION
        INFO: ptm_mgau.c(562): Rows: 128, Columns: 5126
        INFO: ptm_mgau.c(594): Using memory-mapped I/O for senones
        INFO: ptm_mgau.c(837): Maximum top-N: 4
        INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
        INFO: dict.c(320): Allocating 4123 * 20 bytes (80 KiB) for word entries
        INFO: dict.c(333): Reading main dictionary: /home/pi/modelo_mv/6706.dic
        INFO: dict.c(213): Dictionary size 22, allocated 0 KiB for strings, 0 KiB for phones
        INFO: dict.c(336): 22 words read
        INFO: dict.c(358): Reading filler dictionary: /usr/share/pocketsphinx/model/en-us/en-us/noisedict
        INFO: dict.c(213): Dictionary size 27, allocated 0 KiB for strings, 0 KiB for phones
        INFO: dict.c(361): 5 words read
        INFO: dict2pid.c(396): Building PID tables for dictionary
        INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
        INFO: dict2pid.c(132): Allocated 21336 bytes (20 KiB) for word-final triphones
        INFO: dict2pid.c(196): Allocated 21336 bytes (20 KiB) for single-phone word triphones
        INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
        INFO: ngram_model_trie.c(365): Header doesn't match
        INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
        INFO: ngram_model_trie.c(193): LM of order 3
        INFO: ngram_model_trie.c(195): #1-grams: 21
        INFO: ngram_model_trie.c(195): #2-grams: 39
        INFO: ngram_model_trie.c(195): #3-grams: 39
        INFO: lm_trie.c(474): Training quantizer
        INFO: lm_trie.c(482): Building LM trie
        INFO: ngram_search_fwdtree.c(74): Initializing search tree
        INFO: ngram_search_fwdtree.c(101): 20 unique initial diphones
        INFO: ngram_search_fwdtree.c(186): Creating search channels
        INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 175
        INFO: ngram_search_fwdtree.c(333): Created 20 root, 47 non-root channels, 5 single-phone words
        INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
        INFO: cmn_live.c(88): Update from < 41.00 -5.29 -0.12 5.09 2.48 -4.07 -1.37 -1.78 -5.08 -2.05 -6.45 -1.42 1.17 >
        INFO: cmn_live.c(105): Update to < 67.99 -11.31 -6.89 -6.54 -0.91 -0.01 -1.91 -6.54 -0.47 -5.15 -4.71 -2.68 0.13 >
        INFO: cmn_live.c(88): Update from < 67.99 -11.31 -6.89 -6.54 -0.91 -0.01 -1.91 -6.54 -0.47 -5.15 -4.71 -2.68 0.13 >
        INFO: cmn_live.c(105): Update to < 68.23 -11.15 -6.47 -6.57 -0.67 -0.20 -1.93 -6.66 -0.56 -4.97 -4.78 -2.83 -0.09 >
        
        Last edit: Pablo Caravaca 2019-08-15
        
Well, I got continuous.c working with a smaller model and a smaller buffer.
The buffer is being filled correctly, the functions execute on time, and the audio input is definitely sampled at 16 kHz, 16-bit, little-endian. However, no words are recognized (in_speech is always 0), which is strange, as the code is practically the same.

I have doubts about the config struct, but it seems OK to me.

Here is the code:

#include <stdio.h>
#include <string.h>
#include <pocketsphinx/pocketsphinx.h>
#include <sphinxbase/ad.h>
#include <sphinxbase/err.h>
#include "CBuffer/cbuffer.h"
#include <time.h>
#include <sys/select.h>   // select() and struct timeval, used by sleep_msec

extern "C" cb_t q;

static const arg_t cont_args_def[] = {
    POCKETSPHINX_OPTIONS,
    /* Argument file. */
    {"-argfile", ARG_STRING, NULL, "Argument file giving extra arguments."},
    {"-adcdev", ARG_STRING, NULL, "Name of audio device to use for input."},
    {"-infile", ARG_STRING, NULL, "Audio file to transcribe."},
    {"-inmic", ARG_BOOLEAN, "no", "Transcribe audio from microphone."},
    {"-time", ARG_BOOLEAN, "no", "Print word times in file transcription."},
    CMDLN_EMPTY_OPTION};

const char * recognize_from_microphone();

ps_decoder_t *ps;       //  Decoder structure
cmd_ln_t *config;       //  Configuration for decoder

uint8 utt_started;      //  Flags for utterance started and speech is producing
int32 k;                           //   Number of frames in audio buffer
char const *hyp;                   //   Hypothesis for given speech
char const *decoded_speech;

void *t_sphinx (void *arg) {

    config = cmd_ln_init(NULL,
            cont_args_def, TRUE,
            "-lm", "/home/pi/modelo_mv/6706.lm",
            "-dict", "/home/pi/modelo_mv/6706.dic",
            "-kws_threshold", "1e-20",
            "-keyphrase", "MATRIX",
            NULL);


    ps_default_search_args(config);
    ps = ps_init(config);                                                        // initialize the pocketsphinx decoder

    while(1){
        decoded_speech = recognize_from_microphone();                 // call the function to capture and decode speech
        printf("You Said: %s\n", decoded_speech);                               // send decoded speech to screen
    }

    ps_free(ps);            // not reached while the loop above runs forever
    return NULL;
}

static void sleep_msec(int32 ms) {
    struct timeval tmo;
    tmo.tv_sec = 0;
    tmo.tv_usec = ms * 1000;

    select(0, NULL, NULL, NULL, &tmo);
}

const char * recognize_from_microphone(){
    int16_t buffer[2046];
    if (ps_start_utt(ps) < 0) E_FATAL("Failed to start utterance\n");
    utt_started = FALSE;
    printf("Ready...\n");

    clock_t start, end;
    double diff_t;
    int k, j = 0;           // j counts iterations between debug prints; must start at 0

    while(1) {
        k = cb_get(&q, buffer, 2046);   // k is used below as the number of samples obtained

        start = clock();
        ps_process_raw(ps, (const int16_t *)buffer, k, FALSE, FALSE);  // send the audio buffer to the pocketsphinx decoder
        const int8_t in_speech = ps_get_in_speech(ps);            // test to see if speech is being detected
        end = clock();

        if (j ==40){
            diff_t = ((double) (end - start)) / CLOCKS_PER_SEC;
            printf("%d\t",in_speech);
            printf("%f\n",diff_t);
            j=0;
        }
        else
            j++;

        if (in_speech && !utt_started) {             // if speech has started and utt_started flag is false
            utt_started = TRUE;                      // then set the flag
            printf("Listening...\n");
        }

        if (!in_speech && utt_started) {             // if speech has ended and the utt_started flag is true
            ps_end_utt(ps);                          // then mark the end of the utterance
            hyp = ps_get_hyp(ps, NULL );             // query pocketsphinx for "hypothesis" of decoded statement
            return hyp;                              // the function returns the hypothesis
        }
        sleep_msec(10);
    }
}
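
Note that the loop above returns a hypothesis only when in_speech goes from 1 back to 0. If the flag stays at 1, the utterance never ends; if it stays at 0, it never starts, and in both cases the function never returns. A minimal sketch of that transition logic in isolation, using hypothetical names (utt_step, count_utterances) and a plain array standing in for successive ps_get_in_speech readings:

```c
#include <assert.h>
#include <stddef.h>

typedef enum { UTT_NONE, UTT_STARTED, UTT_ENDED } utt_event_t;

/* Feed one voice-activity reading; update the flag and report the transition. */
utt_event_t utt_step(int in_speech, int *utt_started)
{
    assert(utt_started != NULL);
    if (in_speech && !*utt_started) {
        *utt_started = 1;          /* silence -> speech */
        return UTT_STARTED;
    }
    if (!in_speech && *utt_started) {
        *utt_started = 0;          /* speech -> silence */
        return UTT_ENDED;
    }
    return UTT_NONE;
}

/* Count how many utterances would complete for a sequence of readings. */
size_t count_utterances(const int *readings, size_t n)
{
    int utt_started = 0;
    size_t ended = 0;

    for (size_t i = 0; i < n; i++) {
        if (utt_step(readings[i], &utt_started) == UTT_ENDED)
            ended++;
    }
    return ended;
}
```

A sequence of all ones or all zeros yields zero completed utterances, which matches the symptom of a call that never prints "You Said".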

Last edit: Pablo Caravaca 2019-08-15
