Bad performance of pocketsphinx code

Help
2019-08-14
2019-08-15
  • Pablo Caravaca

    Pablo Caravaca - 2019-08-14

    Hello! I would like to post a topic about a problem I am having with an application of pocketsphinx.

    Here are the details:

    I want to use an input buffer from a microphone to feed the pocketsphinx decoder. Here is the code I am working on:

    const char * recognize_from_microphone(cb_t *q){
        int16_t buffer[4096];
        clock_t start, end;
        double diff_t;

        ps_start_utt(ps);
        utt_started = FALSE;

        while(1) {
            /* Count returned by cb_get, passed to the decoder as the number of samples. */
            int k = cb_get(q, buffer, 4096);

            /* Time a single decoder call. clock() reports CPU time, not wall-clock time. */
            start = clock();
            ps_process_raw(ps, (const int16_t *)buffer, k, FALSE, FALSE);
            end = clock();

            in_speech = ps_get_in_speech(ps);

            diff_t = ((double) (end - start)) / CLOCKS_PER_SEC;

            printf("Execution time = %f\n", diff_t);
            printf("%d\n",in_speech);

            if (in_speech && !utt_started) {
                utt_started = TRUE;
                printf("Listening...\n");
            }

            if (!in_speech && utt_started) {
                ps_end_utt(ps);
                hyp = ps_get_hyp(ps, NULL);
                if (hyp != NULL)
                    return hyp;
                /* No hypothesis: start a new utterance before feeding more audio. */
                ps_start_utt(ps);
                utt_started = FALSE;
            }
            sleep_msec(20);
        }
    }

    So, the two printf calls tell me that ps_process_raw takes around 1 second on each call, and in_speech is always returned as '1'. I have been fighting with the code but still cannot find a solution.
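
To put the roughly 1 s per call in context, it helps to compare it with how much audio one buffer holds. The sketch below assumes 16 kHz mono input (the value of -samprate reported later in the thread); the helper names buffer_seconds and realtime_factor are made up for this example and are not part of pocketsphinx.

```c
/* Seconds of audio held in a buffer of n_samples at sample_rate Hz. */
static double buffer_seconds(int n_samples, int sample_rate)
{
    return (double)n_samples / (double)sample_rate;
}

/* Real-time factor: processing time divided by the audio duration.
 * A value above 1.0 means the decoder falls behind live input. */
static double realtime_factor(double processing_sec, int n_samples, int sample_rate)
{
    return processing_sec / buffer_seconds(n_samples, sample_rate);
}
```

For the 4096-sample buffer above, buffer_seconds(4096, 16000) is 0.256 s, so a call that takes about 1 s corresponds to a factor of roughly 3.9: audio arrives faster than it is consumed.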

    The microphone is working OK; the capture side waits for cb_get to be called so that there is free space in the buffer (producer/consumer solution).
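
For readers unfamiliar with the producer/consumer arrangement, a minimal single-threaded sketch of a sample ring buffer follows. It is not the CBuffer library used in this post: ring_t, ring_put and ring_get are hypothetical names, the capacity is arbitrary, and a real version shared between a capture thread and a decoder thread would also need locking or a condition variable.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 8192  /* samples; arbitrary size for this sketch */

typedef struct {
    int16_t data[RING_CAPACITY];
    size_t head;   /* next write position */
    size_t tail;   /* next read position */
    size_t count;  /* samples currently stored */
} ring_t;

static void ring_init(ring_t *r)
{
    r->head = r->tail = r->count = 0;
}

/* Producer side: store up to n samples and return how many fitted.
 * Samples that do not fit are dropped, which is what an overrun looks like. */
static size_t ring_put(ring_t *r, const int16_t *src, size_t n)
{
    size_t i;
    for (i = 0; i < n && r->count < RING_CAPACITY; i++) {
        r->data[r->head] = src[i];
        r->head = (r->head + 1) % RING_CAPACITY;
        r->count++;
    }
    return i;
}

/* Consumer side: copy out up to n samples and return how many were read. */
static size_t ring_get(ring_t *r, int16_t *dst, size_t n)
{
    size_t i;
    for (i = 0; i < n && r->count > 0; i++) {
        dst[i] = r->data[r->tail];
        r->tail = (r->tail + 1) % RING_CAPACITY;
        r->count--;
    }
    return i;
}
```

The value returned by ring_get is a sample count, which is the unit ps_process_raw expects for its third argument.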

    Help is much appreciated. Cheers.

     
    • Nickolay V. Shmyrev

      It is hard to tell; probably your CPU is very slow. Did you try decoding from a file first with pocketsphinx_continuous? How long does it take?

       
      • Pablo Caravaca

        Pablo Caravaca - 2019-08-14

        Thanks, Nickolay. Can you tell me something about this output from continuous.c with -inmic yes?

        INFO: continous.c(254): Ready....
        Execution time = 0.000011
        Execution time = 0.000855
        Execution time = 0.000814
        Execution time = 0.001174
        Execution time = 0.000897
        Execution time = 0.001875
        Execution time = 0.002055
        Execution time = 1.281628
        INFO: continous.c(273): Listening...
        Input overrun, read calls are too rare (non-fatal)
        Execution time = 0.000006
        Execution time = 0.000004
        Execution time = 1.812801
        Input overrun, read calls are too rare (non-fatal)
        Execution time = 0.000004
        Execution time = 0.000003
        Execution time = 1.276526
        Input overrun, read calls are too rare (non-fatal)
        Execution time = 0.000005
        Execution time = 0.000005
        Execution time = 2.089774
        Input overrun, read calls are too rare (non-fatal)
        Execution time = 0.000004
        Execution time = 0.000003

         
        • Nickolay V. Shmyrev

          I asked you to decode a file and provide the full output as well as cpu info.

           
          • Pablo Caravaca

            Pablo Caravaca - 2019-08-14

            Yeah, you're right, sorry. I am kind of desperate right now, haha. I'm using a Raspberry Pi 3 B+. The output for file decoding is:

            INFO: ngram_search.c(467): Resized score stack to 200000 entries
            INFO: ngram_search.c(459): Resized backpointer table to 10000 entries
            INFO: ngram_search_fwdtree.c(949): cand_sf[] increased to 64 entries
            INFO: ngram_search.c(467): Resized score stack to 400000 entries
            INFO: ngram_search.c(459): Resized backpointer table to 20000 entries
            INFO: ngram_search.c(467): Resized score stack to 800000 entries
            INFO: ngram_search.c(459): Resized backpointer table to 40000 entries
            INFO: ngram_search.c(467): Resized score stack to 1600000 entries
            INFO: cmn_live.c(120): Update from < 41.00 -5.29 -0.12 5.09 2.48 -4.07 -1.37 -1.78 -5.08 -2.05 -6.45 -1.42 1.17 >
            INFO: cmn_live.c(138): Update to < 37.44 -3.28 3.53 6.57 -8.86 11.49 -12.17 8.89 3.51 -8.78 -1.72 -3.90 -0.12 >
            INFO: ngram_search_fwdtree.c(1550): 34051 words recognized (68/fr)
            INFO: ngram_search_fwdtree.c(1552): 1868504 senones evaluated (3737/fr)
            INFO: ngram_search_fwdtree.c(1556): 11798273 channels searched (23596/fr), 317733 1st, 854816 last
            INFO: ngram_search_fwdtree.c(1559): 48426 words for which last channels evaluated (96/fr)
            INFO: ngram_search_fwdtree.c(1561): 1032622 candidate words for entering last phone (2065/fr)
            INFO: ngram_search_fwdtree.c(1564): fwdtree 30.71 CPU 6.141 xRT
            INFO: ngram_search_fwdtree.c(1567): fwdtree 30.83 wall 6.166 xRT
            INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 845 words
            INFO: ngram_search_fwdflat.c(948): 21537 words recognized (43/fr)
            INFO: ngram_search_fwdflat.c(950): 947620 senones evaluated (1895/fr)
            INFO: ngram_search_fwdflat.c(952): 2242514 channels searched (4485/fr)
            INFO: ngram_search_fwdflat.c(954): 117970 words searched (235/fr)
            INFO: ngram_search_fwdflat.c(957): 77223 word transitions (154/fr)
            INFO: ngram_search_fwdflat.c(960): fwdflat 5.48 CPU 1.095 xRT
            INFO: ngram_search_fwdflat.c(963): fwdflat 5.48 wall 1.096 xRT
            INFO: ngram_search.c(1197): not found in last frame, using this.498 instead
            INFO: ngram_search.c(1250): lattice start node 0 end node this.454
            INFO: ngram_search.c(1276): Eliminated 182 nodes before end node
            INFO: ngram_search.c(1381): Lattice has 2529 nodes, 41578 links
            INFO: ps_lattice.c(1380): Bestpath score: -21097
            INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(this:454:498) = -1186561
            INFO: ps_lattice.c(1441): Joint P(O,S) = -1363451 P(S|O) = -176890
            INFO: ngram_search.c(872): bestpath 1.63 CPU 0.326 xRT
            INFO: ngram_search.c(875): bestpath 1.63 wall 0.326 xRT
            one yeah yeah and and and yes this
            INFO: ngram_search_fwdtree.c(429): TOTAL fwdtree 30.71 CPU 6.153 xRT
            INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 30.83 wall 6.178 xRT
            INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 5.48 CPU 1.098 xRT
            INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 5.48 wall 1.098 xRT
            INFO: ngram_search.c(303): TOTAL bestpath 1.63 CPU 0.326 xRT
            INFO: ngram_search.c(306): TOTAL bestpath 1.63 wall 0.326 xRT

            Obviously I didn't say "one yeah yeah and and and yes this", hahaha. It's a 5-second file with a TV on in the background.
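
The xRT figures in this log can be checked by hand. The log reports 1868504 senones at 3737 per frame, which is about 500 frames, and at the default -frate of 100 frames per second that matches the 5-second file. The sketch below recomputes the real-time factors from those numbers; the function names are illustrative only.

```c
/* Audio length in seconds from a frame count and the frame rate (-frate). */
static double audio_seconds(long n_frames, int frame_rate)
{
    return (double)n_frames / (double)frame_rate;
}

/* Real-time factor of a single decoding pass. */
static double xrt(double cpu_sec, double audio_sec)
{
    return cpu_sec / audio_sec;
}

/* Combined factor of the three passes that appear in the log. */
static double total_xrt(double fwdtree_sec, double fwdflat_sec,
                        double bestpath_sec, double audio_sec)
{
    return (fwdtree_sec + fwdflat_sec + bestpath_sec) / audio_sec;
}
```

With the logged CPU times (30.71 s, 5.48 s and 1.63 s) and 5 s of audio, the three passes together come to about 7.6 times real time, and the fwdtree pass alone accounts for roughly 6.1 of that.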

             

            Last edit: Pablo Caravaca 2019-08-14
            • Nickolay V. Shmyrev

              The output is incomplete; the model is probably too large. The file format could also be wrong.

               
              • Pablo Caravaca

                Pablo Caravaca - 2019-08-15

                This is the rest of the output. It's the standard model, as I didn't make any modifications.
                The file format is 16-bit, 16 kHz, little-endian.

                INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/en-us/en-us/feat.params
                Current configuration:
                [NAME] [DEFLT] [VALUE]
                -agc none none
                -agcthresh 2.0 2.000000e+00
                -allphone
                -allphone_ci yes yes
                -alpha 0.97 9.700000e-01
                -ascale 20.0 2.000000e+01
                -aw 1 1
                -backtrace no no
                -beam 1e-48 1.000000e-48
                -bestpath yes yes
                -bestpathlw 9.5 9.500000e+00
                -ceplen 13 13
                -cmn live batch
                -cmninit 40,3,-1 41.00,-5.29,-0.12,5.09,2.48,-4.07,-1.37,-1.78,-5.08,-2.05,-6.45,-1.42,1.17
                -compallsen no no
                -dict /usr/share/pocketsphinx/model/en-us/cmudict-en-us.dict
                -dictcase no no
                -dither no no
                -doublebw no no
                -ds 1 1
                -fdict
                -feat 1s_c_d_dd 1s_c_d_dd
                -featparams
                -fillprob 1e-8 1.000000e-08
                -frate 100 100
                -fsg
                -fsgusealtpron yes yes
                -fsgusefiller yes yes
                -fwdflat yes yes
                -fwdflatbeam 1e-64 1.000000e-64
                -fwdflatefwid 4 4
                -fwdflatlw 8.5 8.500000e+00
                -fwdflatsfwin 25 25
                -fwdflatwbeam 7e-29 7.000000e-29
                -fwdtree yes yes
                -hmm /usr/share/pocketsphinx/model/en-us/en-us
                -input_endian little little
                -jsgf
                -keyphrase
                -kws
                -kws_delay 10 10
                -kws_plp 1e-1 1.000000e-01
                -kws_threshold 1e-30 1.000000e-30
                -latsize 5000 5000
                -lda
                -ldadim 0 0
                -lifter 0 22
                -lm /usr/share/pocketsphinx/model/en-us/en-us.lm.bin
                -lmctl
                -lmname
                -logbase 1.0001 1.000100e+00
                -logfn
                -logspec no no
                -lowerf 133.33334 1.300000e+02
                -lpbeam 1e-40 1.000000e-40
                -lponlybeam 7e-29 7.000000e-29
                -lw 6.5 6.500000e+00
                -maxhmmpf 30000 30000
                -maxwpf -1 -1
                -mdef
                -mean
                -mfclogdir
                -min_endfr 0 0
                -mixw
                -mixwfloor 0.0000001 1.000000e-07
                -mllr
                -mmap yes yes
                -ncep 13 13
                -nfft 512 512
                -nfilt 40 25
                -nwpen 1.0 1.000000e+00
                -pbeam 1e-48 1.000000e-48
                -pip 1.0 1.000000e+00
                -pl_beam 1e-10 1.000000e-10
                -pl_pbeam 1e-10 1.000000e-10
                -pl_pip 1.0 1.000000e+00
                -pl_weight 3.0 3.000000e+00
                -pl_window 5 5
                -rawlogdir
                -remove_dc no no
                -remove_noise yes yes
                -remove_silence yes yes
                -round_filters yes yes
                -samprate 16000 1.600000e+04
                -seed -1 -1
                -sendump
                -senlogdir
                -senmgau
                -silprob 0.005 5.000000e-03
                -smoothspec no no
                -svspec 0-12/13-25/26-38
                -tmat
                -tmatfloor 0.0001 1.000000e-04
                -topn 4 4
                -topn_beam 0 0
                -toprule
                -transform legacy dct
                -unit_area yes yes
                -upperf 6855.4976 6.800000e+03
                -uw 1.0 1.000000e+00
                -vad_postspeech 50 50
                -vad_prespeech 20 20
                -vad_startspeech 10 10
                -vad_threshold 3.0 3.000000e+00
                -var
                -varfloor 0.0001 1.000000e-04
                -varnorm no no
                -verbose no no
                -warp_params
                -warp_type inverse_linear inverse_linear
                -wbeam 7e-29 7.000000e-29
                -wip 0.65 6.500000e-01
                -wlen 0.025625 2.562500e-02

                INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
                INFO: acmod.c(162): Using subvector specification 0-12/13-25/26-38
                INFO: mdef.c(518): Reading model definition: /usr/share/pocketsphinx/model/en-us/en-us/mdef
                INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
                INFO: bin_mdef.c(336): Reading binary model definition: /usr/share/pocketsphinx/model/en-us/en-us/mdef
                INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
                INFO: tmat.c(149): Reading HMM transition probability matrices: /usr/share/pocketsphinx/model/en-us/en-us/transition_matrices
                INFO: acmod.c(113): Attempting to use PTM computation module
                INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/en-us/en-us/means
                INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
                INFO: ms_gauden.c(244): 128x13
                INFO: ms_gauden.c(244): 128x13
                INFO: ms_gauden.c(244): 128x13
                INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/en-us/en-us/variances
                INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
                INFO: ms_gauden.c(244): 128x13
                INFO: ms_gauden.c(244): 128x13
                INFO: ms_gauden.c(244): 128x13
                INFO: ms_gauden.c(304): 222 variance values floored
                INFO: ptm_mgau.c(475): Loading senones from dump file /usr/share/pocketsphinx/model/en-us/en-us/sendump
                INFO: ptm_mgau.c(499): BEGIN FILE FORMAT DESCRIPTION
                INFO: ptm_mgau.c(562): Rows: 128, Columns: 5126
                INFO: ptm_mgau.c(594): Using memory-mapped I/O for senones
                INFO: ptm_mgau.c(837): Maximum top-N: 4
                INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
                INFO: dict.c(320): Allocating 138824 * 20 bytes (2711 KiB) for word entries
                INFO: dict.c(333): Reading main dictionary: /usr/share/pocketsphinx/model/en-us/cmudict-en-us.dict
                INFO: dict.c(213): Dictionary size 134723, allocated 1016 KiB for strings, 1679 KiB for phones
                INFO: dict.c(336): 134723 words read
                INFO: dict.c(358): Reading filler dictionary: /usr/share/pocketsphinx/model/en-us/en-us/noisedict
                INFO: dict.c(213): Dictionary size 134728, allocated 0 KiB for strings, 0 KiB for phones
                INFO: dict.c(361): 5 words read
                INFO: dict2pid.c(396): Building PID tables for dictionary
                INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
                INFO: dict2pid.c(132): Allocated 21336 bytes (20 KiB) for word-final triphones
                INFO: dict2pid.c(196): Allocated 21336 bytes (20 KiB) for single-phone word triphones
                INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
                INFO: ngram_search_fwdtree.c(74): Initializing search tree
                INFO: ngram_search_fwdtree.c(101): 791 unique initial diphones
                INFO: ngram_search_fwdtree.c(186): Creating search channels
                INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 152609
                INFO: ngram_search_fwdtree.c(333): Created 723 root, 152481 non-root channels, 53 single-phone words
                INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
                INFO: continous.c(319): ./cross COMPILED ON: Aug 15 2019, AT: 00:38:05

                 

                Last edit: Pablo Caravaca 2019-08-15
                • Pablo Caravaca

                  Pablo Caravaca - 2019-08-15

                  It actually decodes fast enough, as you can see in the output from recording from the microphone, doesn't it? I am a bit lost here. I appreciate your answers.

                  By reducing the buffer size (to 1024), I got it working continuously, and with a custom dict and lm containing only a few words it seems to be working. Making progress. But in_speech is still always 1.
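
Why a constant in_speech stalls the loop can be seen by isolating the two if tests from the recognition loop. The sketch below replays a sequence of in_speech values, one per processed buffer, and reports where the loop would reach ps_end_utt. It is a simulation only: utterance_end_index is a hypothetical helper and no pocketsphinx call is made.

```c
#include <stddef.h>

/* Replay in_speech flags (one per buffer) through the loop's two tests.
 * Returns the index of the buffer at which the utterance would end,
 * or -1 if the end-of-utterance branch is never reached. */
static int utterance_end_index(const int *in_speech, size_t n)
{
    int utt_started = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        if (in_speech[i] && !utt_started)
            utt_started = 1;      /* silence -> speech: "Listening..." */
        if (!in_speech[i] && utt_started)
            return (int)i;        /* speech -> silence: ps_end_utt() would run here */
    }
    return -1;
}
```

A sequence of all ones starts an utterance that never ends, and a sequence of all zeros never starts one, so in both cases the function returns -1 and no hypothesis would ever be requested.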

                  This is the log file for my last attempt:
                  INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/en-us/en-us/feat.params
                  Current configuration:
                  [NAME] [DEFLT] [VALUE]
                  -agc none none
                  -agcthresh 2.0 2.000000e+00
                  -allphone
                  -allphone_ci yes yes
                  -alpha 0.97 9.700000e-01
                  -ascale 20.0 2.000000e+01
                  -aw 1 1
                  -backtrace no no
                  -beam 1e-48 1.000000e-48
                  -bestpath yes yes
                  -bestpathlw 9.5 9.500000e+00
                  -ceplen 13 13
                  -cmn live batch
                  -cmninit 40,3,-1 41.00,-5.29,-0.12,5.09,2.48,-4.07,-1.37,-1.78,-5.08,-2.05,-6.45,-1.42,1.17
                  -compallsen no no
                  -dict /home/pi/modelo_mv/6706.dic
                  -dictcase no no
                  -dither no no
                  -doublebw no no
                  -ds 1 1
                  -fdict
                  -feat 1s_c_d_dd 1s_c_d_dd
                  -featparams
                  -fillprob 1e-8 1.000000e-08
                  -frate 100 100
                  -fsg
                  -fsgusealtpron yes yes
                  -fsgusefiller yes yes
                  -fwdflat yes yes
                  -fwdflatbeam 1e-64 1.000000e-64
                  -fwdflatefwid 4 4
                  -fwdflatlw 8.5 8.500000e+00
                  -fwdflatsfwin 25 25
                  -fwdflatwbeam 7e-29 7.000000e-29
                  -fwdtree yes yes
                  -hmm /usr/share/pocketsphinx/model/en-us/en-us
                  -input_endian little little
                  -jsgf
                  -keyphrase
                  -kws
                  -kws_delay 10 10
                  -kws_plp 1e-1 1.000000e-01
                  -kws_threshold 1e-30 1.000000e-30
                  -latsize 5000 5000
                  -lda
                  -ldadim 0 0
                  -lifter 0 22
                  -lm /home/pi/modelo_mv/6706.lm
                  -lmctl
                  -lmname
                  -logbase 1.0001 1.000100e+00
                  -logfn /home/pi/modelo_mv/log
                  -logspec no no
                  -lowerf 133.33334 1.300000e+02
                  -lpbeam 1e-40 1.000000e-40
                  -lponlybeam 7e-29 7.000000e-29
                  -lw 6.5 6.500000e+00
                  -maxhmmpf 30000 30000
                  -maxwpf -1 -1
                  -mdef
                  -mean
                  -mfclogdir
                  -min_endfr 0 0
                  -mixw
                  -mixwfloor 0.0000001 1.000000e-07
                  -mllr
                  -mmap yes yes
                  -ncep 13 13
                  -nfft 512 512
                  -nfilt 40 25
                  -nwpen 1.0 1.000000e+00
                  -pbeam 1e-48 1.000000e-48
                  -pip 1.0 1.000000e+00
                  -pl_beam 1e-10 1.000000e-10
                  -pl_pbeam 1e-10 1.000000e-10
                  -pl_pip 1.0 1.000000e+00
                  -pl_weight 3.0 3.000000e+00
                  -pl_window 5 5
                  -rawlogdir
                  -remove_dc no no
                  -remove_noise yes yes
                  -remove_silence yes yes
                  -round_filters yes yes
                  -samprate 16000 1.600000e+04
                  -seed -1 -1
                  -sendump
                  -senlogdir
                  -senmgau
                  -silprob 0.005 5.000000e-03
                  -smoothspec no no
                  -svspec 0-12/13-25/26-38
                  -tmat
                  -tmatfloor 0.0001 1.000000e-04
                  -topn 4 4
                  -topn_beam 0 0
                  -toprule
                  -transform legacy dct
                  -unit_area yes yes
                  -upperf 6855.4976 6.800000e+03
                  -uw 1.0 1.000000e+00
                  -vad_postspeech 50 50
                  -vad_prespeech 20 20
                  -vad_startspeech 10 10
                  -vad_threshold 3.0 3.000000e+00
                  -var
                  -varfloor 0.0001 1.000000e-04
                  -varnorm no no
                  -verbose no no
                  -warp_params
                  -warp_type inverse_linear inverse_linear
                  -wbeam 7e-29 7.000000e-29
                  -wip 0.65 6.500000e-01
                  -wlen 0.025625 2.562500e-02

                  INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
                  INFO: acmod.c(162): Using subvector specification 0-12/13-25/26-38
                  INFO: mdef.c(518): Reading model definition: /usr/share/pocketsphinx/model/en-us/en-us/mdef
                  INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
                  INFO: bin_mdef.c(336): Reading binary model definition: /usr/share/pocketsphinx/model/en-us/en-us/mdef
                  INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
                  INFO: tmat.c(149): Reading HMM transition probability matrices: /usr/share/pocketsphinx/model/en-us/en-us/transition_matrices
                  INFO: acmod.c(113): Attempting to use PTM computation module
                  INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/en-us/en-us/means
                  INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
                  INFO: ms_gauden.c(244): 128x13
                  INFO: ms_gauden.c(244): 128x13
                  INFO: ms_gauden.c(244): 128x13
                  INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/en-us/en-us/variances
                  INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
                  INFO: ms_gauden.c(244): 128x13
                  INFO: ms_gauden.c(244): 128x13
                  INFO: ms_gauden.c(244): 128x13
                  INFO: ms_gauden.c(304): 222 variance values floored
                  INFO: ptm_mgau.c(475): Loading senones from dump file /usr/share/pocketsphinx/model/en-us/en-us/sendump
                  INFO: ptm_mgau.c(499): BEGIN FILE FORMAT DESCRIPTION
                  INFO: ptm_mgau.c(562): Rows: 128, Columns: 5126
                  INFO: ptm_mgau.c(594): Using memory-mapped I/O for senones
                  INFO: ptm_mgau.c(837): Maximum top-N: 4
                  INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
                  INFO: dict.c(320): Allocating 4123 * 20 bytes (80 KiB) for word entries
                  INFO: dict.c(333): Reading main dictionary: /home/pi/modelo_mv/6706.dic
                  INFO: dict.c(213): Dictionary size 22, allocated 0 KiB for strings, 0 KiB for phones
                  INFO: dict.c(336): 22 words read
                  INFO: dict.c(358): Reading filler dictionary: /usr/share/pocketsphinx/model/en-us/en-us/noisedict
                  INFO: dict.c(213): Dictionary size 27, allocated 0 KiB for strings, 0 KiB for phones
                  INFO: dict.c(361): 5 words read
                  INFO: dict2pid.c(396): Building PID tables for dictionary
                  INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
                  INFO: dict2pid.c(132): Allocated 21336 bytes (20 KiB) for word-final triphones
                  INFO: dict2pid.c(196): Allocated 21336 bytes (20 KiB) for single-phone word triphones
                  INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
                  INFO: ngram_model_trie.c(365): Header doesn't match
                  INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
                  INFO: ngram_model_trie.c(193): LM of order 3
                  INFO: ngram_model_trie.c(195): #1-grams: 21
                  INFO: ngram_model_trie.c(195): #2-grams: 39
                  INFO: ngram_model_trie.c(195): #3-grams: 39
                  INFO: lm_trie.c(474): Training quantizer
                  INFO: lm_trie.c(482): Building LM trie
                  INFO: ngram_search_fwdtree.c(74): Initializing search tree
                  INFO: ngram_search_fwdtree.c(101): 20 unique initial diphones
                  INFO: ngram_search_fwdtree.c(186): Creating search channels
                  INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 175
                  INFO: ngram_search_fwdtree.c(333): Created 20 root, 47 non-root channels, 5 single-phone words
                  INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
                  INFO: cmn_live.c(88): Update from < 41.00 -5.29 -0.12 5.09 2.48 -4.07 -1.37 -1.78 -5.08 -2.05 -6.45 -1.42 1.17 >
                  INFO: cmn_live.c(105): Update to < 67.99 -11.31 -6.89 -6.54 -0.91 -0.01 -1.91 -6.54 -0.47 -5.15 -4.71 -2.68 0.13 >
                  INFO: cmn_live.c(88): Update from < 67.99 -11.31 -6.89 -6.54 -0.91 -0.01 -1.91 -6.54 -0.47 -5.15 -4.71 -2.68 0.13 >
                  INFO: cmn_live.c(105): Update to < 68.23 -11.15 -6.47 -6.57 -0.67 -0.20 -1.93 -6.66 -0.56 -4.97 -4.78 -2.83 -0.09 >

                   

                  Last edit: Pablo Caravaca 2019-08-15
  • Pablo Caravaca

    Pablo Caravaca - 2019-08-15

    Well, I got continuous.c working with a smaller model and a smaller buffer.
    In my own code the buffer is being filled correctly, the functions execute on time, and the audio input is definitely sampled at 16 kHz, 16-bit, little-endian. However, no words are recognized (in_speech is always 0), which is strange, as the code is practically the same.

    I have doubts about the config struct, but it seems OK to me.
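
One check that might narrow this down is to look at the samples actually handed to ps_process_raw, since a buffer of zeros or very low-level audio would never trigger speech detection. The sketch below computes two simple level measures for a block of 16-bit samples. peak_level and mean_square are hypothetical helpers, and what counts as "too quiet" is left open because it depends on the microphone and gain. It is also worth confirming that the count returned by cb_get is a number of samples and not bytes, because ps_process_raw takes a sample count.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Largest absolute sample value in the buffer (0 means digital silence). */
static int peak_level(const int16_t *buf, size_t n)
{
    int peak = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        int v = abs((int)buf[i]);   /* widen to int first so -32768 is handled */
        if (v > peak)
            peak = v;
    }
    return peak;
}

/* Mean of the squared samples; compare the value for quiet and spoken input. */
static double mean_square(const int16_t *buf, size_t n)
{
    double sum = 0.0;
    size_t i;
    if (n == 0)
        return 0.0;
    for (i = 0; i < n; i++)
        sum += (double)buf[i] * (double)buf[i];
    return sum / (double)n;
}
```

Printing both values every few buffers, next to in_speech, shows whether the level changes at all when someone speaks.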

    Here is the code:

    #include <stdio.h>
    #include <string.h>
    #include <pocketsphinx/pocketsphinx.h>
    #include <sphinxbase/ad.h>
    #include <sphinxbase/err.h>
    #include "CBuffer/cbuffer.h"
    #include <time.h>
    #include <sys/select.h>   // select() and struct timeval, used by sleep_msec
    
    extern "C" cb_t q;
    
    static const arg_t cont_args_def[] = {
        POCKETSPHINX_OPTIONS,
        /* Argument file. */
        {"-argfile", ARG_STRING, NULL, "Argument file giving extra arguments."},
        {"-adcdev", ARG_STRING, NULL, "Name of audio device to use for input."},
        {"-infile", ARG_STRING, NULL, "Audio file to transcribe."},
        {"-inmic", ARG_BOOLEAN, "no", "Transcribe audio from microphone."},
        {"-time", ARG_BOOLEAN, "no", "Print word times in file transcription."},
        CMDLN_EMPTY_OPTION};
    
    const char * recognize_from_microphone();
    
    ps_decoder_t *ps;       //  Decoder structure
    cmd_ln_t *config;       //  Configuration for decoder
    
    uint8 utt_started;      //  Flags for utterance started and speech is producing
    int32 k;                           //   Number of frames in audio buffer
    char const *hyp;                   //   Hypothesis for given speech
    char const *decoded_speech;
    
    void *t_sphinx (void *arg) {
    
        config = cmd_ln_init(NULL,
                cont_args_def, TRUE,
                "-lm", "/home/pi/modelo_mv/6706.lm",
                "-dict", "/home/pi/modelo_mv/6706.dic",
                "-kws_threshold", "1e-20",
                "-keyphrase", "MATRIX",
                NULL);
    
        ps_default_search_args(config);
        ps = ps_init(config);                                                        // initialize the pocketsphinx decoder
    
        while(1){
            decoded_speech = recognize_from_microphone();                 // call the function to capture and decode speech
            printf("You Said: %s\n", decoded_speech);                               // send decoded speech to screen
        }
    
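        ps_free(ps);            // not reached while the loop above runs forever
        return NULL;            // thread entry points must return a void *
    }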
    
    static void sleep_msec(int32 ms) {
        struct timeval tmo;
        tmo.tv_sec = 0;
        tmo.tv_usec = ms * 1000;
    
        select(0, NULL, NULL, NULL, &tmo);
    }
    
    const char * recognize_from_microphone(){
        int16_t buffer[2046];
        if (ps_start_utt(ps) < 0) E_FATAL("Failed to start utterance\n");
        utt_started = FALSE;
        printf("Ready...\n");
    
        clock_t start, end;
        double diff_t;
        int k, j = 0;           // j counts iterations between debug prints and must start at 0
    
        while(1) {
            k = cb_get(&q, buffer, 2046);   // capture the number of frames in the audio buffer
    
            start = clock();
            ps_process_raw(ps, (const int16_t *)buffer, k, FALSE, FALSE);  // send the audio buffer to the pocketsphinx decoder
            const int8_t in_speech = ps_get_in_speech(ps);            // test to see if speech is being detected
            end = clock();
    
            if (j ==40){
                diff_t = ((double) (end - start)) / CLOCKS_PER_SEC;
                printf("%d\t",in_speech);
                printf("%f\n",diff_t);
                j=0;
            }
            else
                j++;
    
            if (in_speech && !utt_started) {             // if speech has started and utt_started flag is false
                utt_started = TRUE;                      // then set the flag
                printf("Listening...\n");
            }
    
            if (!in_speech && utt_started) {             // if speech has ended and the utt_started flag is true
                ps_end_utt(ps);                          // then mark the end of the utterance
                hyp = ps_get_hyp(ps, NULL );             // query pocketsphinx for "hypothesis" of decoded statement
                return hyp;                              // the function returns the hypothesis (may be NULL)
    
            }
            sleep_msec(10);
        }
    }
    
     

    Last edit: Pablo Caravaca 2019-08-15
