Pocketsphinx: no recognition in live mode

2008-11-25
2012-09-22
  • Michael Bluemcke

    I'm trying to adapt pocketsphinx 0.5 to a Linux-based smartphone (ARM device, Intel PXA255, ARMv5TE architecture; currently an OSS audio device, but I will switch to ALSA later). It compiles, but I get no meaningful recognition in continuous/live mode, while batch mode gives pretty good results, even with my own samples recorded on the device via the integrated microphone.

    Continuous live mode:
    No recognition, even with the tidigits number test, which has only a very small dictionary. The same goes for all other live tests.

    Batch mode:
    I have adapted the regression tests and they pass without error. Moreover, if I record some voice notes such as numbers (format: 8 kHz, 16-bit) with the smartphone's internal voice note application using the integrated microphone and run them in batch mode, recognition of my own samples is also pretty good. But not in live mode.

    Any information about how to set the parameters for pocketsphinx_continuous correctly, or how I can identify and debug the problem (switch on debug logs, change parameters, create output files for analysis, ...), would be helpful.

     
    • Nickolay V. Shmyrev

      You probably need to look in the log for the proper cmninit value. It's the first value in the cmn_prior_update line:

      INFO: cmn_prior.c(139): cmn_prior_update: to < 9.91 -0.20 0.03 -0.18 -0.10 -0.23 0.04 -0.24 -0.20 -0.14 -0.19 0.02 -0.21 >

      Here you need -cmninit 9.91 instead of the default 8.0.

      Paste the recognition logs from batch and continuous mode; that would make it easier to help.
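
The lookup described above can be automated when scanning long logs. The following is a minimal sketch, assuming the log line has the `cmn_prior_update: to < ... >` shape quoted above; `parse_cmninit` is a hypothetical helper name and is not part of pocketsphinx.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the first cepstral mean from a "cmn_prior_update: to < ... >"
 * log line. Returns 1 and stores the value in *out on success, 0 if the
 * line does not contain the marker or no number follows it.
 * parse_cmninit is a hypothetical helper, not a pocketsphinx function. */
static int parse_cmninit(const char *line, double *out)
{
    static const char marker[] = "cmn_prior_update: to <";
    const char *p;

    assert(line != NULL && out != NULL);
    p = strstr(line, marker);
    if (p == NULL)
        return 0;
    /* The first number after '<' is the mean of cepstral coefficient 0. */
    return sscanf(p + strlen(marker), "%lf", out) == 1;
}
```

Called on the example line above, the helper stores 9.91 in `*out`; the matching "from" line is deliberately ignored because it only repeats the old starting value.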

       
      • Michael Bluemcke

        here you are with logs...

        One additional note: I got the error message "can't set input gain/recording level for this device", so I added the following lines to ad_oss.c to set the mic recording level for the device. However, the results are the same. I'm not sure whether it's really connected to the problem.

            else if (devMask & SOUND_MASK_MIC) {
                if (ioctl(mixerFD, SOUND_MIXER_WRITE_MIC, &inputGain) < 0) {
                    fprintf(stderr,
                            "%s %d: mic record level to %d: %s\n",
                            __FILE__, __LINE__, inputGain, strerror(errno));
                    exit(1);
                }
            }
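
A note on the value passed to the ioctl above: OSS mixer write calls take a left and a right level, each from 0 to 100, packed into one integer (left in the low byte, right in the next byte). The sketch below shows only that packing. `oss_pack_level` is a hypothetical helper, and whether `inputGain` in ad_oss.c is already packed this way is not something this thread confirms.

```c
#include <assert.h>

/* Pack left/right levels (each 0..100) into the integer layout used by
 * OSS mixer write ioctls such as SOUND_MIXER_WRITE_MIC: left channel in
 * the low byte, right channel in the next byte.
 * oss_pack_level is a hypothetical helper, not part of ad_oss.c. */
static int oss_pack_level(int left, int right)
{
    assert(left >= 0 && left <= 100);
    assert(right >= 0 && right <= 100);
    return (right << 8) | left;
}
```

On a mono input only the low byte would normally matter, so a plain value between 0 and 100 sets the left channel and leaves the right channel at 0.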
        

        Now the logs

        log from batch mode: -------------------------------------------------------------------------------------------------
        test samples were "1" and "1 2 3 4 5 6 7 8 9"

        INFO: cmd_ln.c(459): Parsing command line:
        /usr/bin/pocketsphinx_batch \
        -hmm /usr/share/pocketsphinx/model/hmm/tidigits \
        -lm /usr/share/pocketsphinx/model/lm/tidigits/tidigits.lm \
        -dict /usr/share/pocketsphinx/model/lm/tidigits/tidigits.dic \
        -ctl ./test.ctl \
        -adcin yes \
        -adchdr 44 \
        -input_endian little \
        -cepdir /home/root/Documents/audio/x-wav \
        -samprate 8000 \
        -agc max \
        -cepext .wav \
        -hyp /home/root/test-wsj1-simple2.match

        Current configuration:
        [NAME] [DEFLT] [VALUE]
        -adchdr 0 44
        -adcin no yes
        -agc none max
        -agcthresh 2.0 2.000000e+00
        -alpha 0.97 9.700000e-01
        -ascale 20.0 2.000000e+01
        -backtrace no no
        -beam 1e-48 1.000000e-48
        -bestpath yes yes
        -bestpathlw 9.5 9.500000e+00
        -cep2spec no no
        -cepdir /home/root/Documents/audio/x-wav
        -cepext .mfc .wav
        -ceplen 13 13
        -cmn current current
        -cmninit 8.0 8.0
        -compallsen no no
        -ctl ./test.ctl
        -ctlcount -1 -1
        -ctlincr 1 1
        -ctloffset 0 0
        -dict /usr/share/pocketsphinx/model/lm/tidigits/tidigits.dic
        -dictcase no no
        -dither no no
        -doublebw no no
        -ds 1 1
        -fdict
        -feat 1s_c_d_dd 1s_c_d_dd
        -featparams
        -fillprob 1e-8 1.000000e-08
        -frate 100 100
        -fsg
        -fsgusealtpron yes yes
        -fsgusefiller yes yes
        -fwdflat yes yes
        -fwdflatbeam 1e-64 1.000000e-64
        -fwdflatefwid 4 4
        -fwdflatlw 8.5 8.500000e+00
        -fwdflatsfwin 25 25
        -fwdflatwbeam 7e-29 7.000000e-29
        -fwdtree yes yes
        -hmm /usr/share/pocketsphinx/model/hmm/tidigits
        -hyp /home/root/test-wsj1-simple2.match
        -hypseg
        -input_endian little little
        -jsgf
        -kdmaxbbi -1 -1
        -kdmaxdepth 0 0
        -kdtree
        -latsize 5000 5000
        -lda
        -ldadim 0 0
        -lifter 0 0
        -lm /usr/share/pocketsphinx/model/lm/tidigits/tidigits.lm
        -lmctl
        -lmname default default
        -logbase 1.0001 1.000100e+00
        -logfn
        -logspec no no
        -lowerf 133.33334 1.333333e+02
        -lpbeam 1e-40 1.000000e-40
        -lponlybeam 7e-29 7.000000e-29
        -lw 6.5 6.500000e+00
        -maxhistpf 100 100
        -maxhmmpf -1 -1
        -maxnewoov 20 20
        -maxwpf -1 -1
        -mdef
        -mean
        -mixw
        -mixwfloor 0.0000001 1.000000e-07
        -mmap yes yes
        -nbest 0 0
        -nbestdir
        -nbestext .hyp .hyp
        -ncep 13 13
        -nfft 512 512
        -nfilt 40 40
        -nwpen 1.0 1.000000e+00
        -outlatdir
        -pbeam 1e-48 1.000000e-48
        -pip 1.0 1.000000e+00
        -remove_dc no no
        -round_filters yes yes
        -samprate 16000 8.000000e+03
        -sdmap
        -seed -1 -1
        -sendump
        -silprob 0.005 5.000000e-03
        -smoothspec no no
        -spec2cep no no
        -svspec
        -tmat
        -tmatfloor 0.0001 1.000000e-04
        -topn 4 4
        -toprule
        -transform legacy legacy
        -unit_area yes yes
        -upperf 6855.4976 6.855498e+03
        -usewdphones no no
        -uw 1.0 1.000000e+00
        -var
        -varfloor 0.0001 1.000000e-04
        -varnorm no no
        -verbose no no
        -warp_params
        -warp_type inverse_linear inverse_linear
        -wbeam 7e-29 7.000000e-29
        -wip 0.65 6.500000e-01
        -wlen 0.025625 2.562500e-02

        INFO: cmd_ln.c(459): Parsing command line:
        \
        -lowerf 1 \
        -upperf 4000 \
        -nfilt 20 \
        -transform dct \
        -round_filters no \
        -remove_dc yes \
        -wlen 0.025 \
        -feat s2_4x \
        -cmn current \
        -varnorm no

        Current configuration:
        [NAME] [DEFLT] [VALUE]
        -agc none max
        -agcthresh 2.0 2.000000e+00
        -alpha 0.97 9.700000e-01
        -cep2spec no no
        -ceplen 13 13
        -cmn current current
        -cmninit 8.0 8.0
        -dither no no
        -doublebw no no
        -feat 1s_c_d_dd s2_4x
        -frate 100 100
        -input_endian little little
        -lda
        -ldadim 0 0
        -lifter 0 0
        -logspec no no
        -lowerf 133.33334 1.000000e+00
        -ncep 13 13
        -nfft 512 512
        -nfilt 40 20
        -remove_dc no yes
        -round_filters yes no
        -samprate 16000 8.000000e+03
        -seed -1 -1
        -smoothspec no no
        -spec2cep no no
        -svspec
        -transform legacy dct
        -unit_area yes yes
        -upperf 6855.4976 4.000000e+03
        -varnorm no no
        -verbose no no
        -warp_params
        -warp_type inverse_linear inverse_linear
        -wlen 0.025625 2.500000e-02

        INFO: acmod.c(76): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/hmm/tidigits/feat.params
        INFO: mdef.c(520): Reading model definition: /usr/share/pocketsphinx/model/hmm/tidigits/mdef
        INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
        INFO: bin_mdef.c(301): Reading binary model definition: /usr/share/pocketsphinx/model/hmm/tidigits/mdef
        INFO: bin_mdef.c(480): 34 CI-phone, 396 CD-phone, 5 emitstate/phone, 170 CI-sen, 670 Sen, 222 Sen-Seq
        INFO: tmat.c(204): Reading HMM transition probability matrices: /usr/share/pocketsphinx/model/hmm/tidigits/transition_matrices
        INFO: acmod.c(108): Attempting to use SCGMM computation module
        INFO: s2_semi_mgau.c(985): Reading S3 mixture gaussian file '/usr/share/pocketsphinx/model/hmm/tidigits/means'
        INFO: s2_semi_mgau.c(1084): 1 mixture Gaussians, 256 components, 4 feature streams, veclen 51
        INFO: s2_semi_mgau.c(985): Reading S3 mixture gaussian file '/usr/share/pocketsphinx/model/hmm/tidigits/variances'
        INFO: s2_semi_mgau.c(1084): 1 mixture Gaussians, 256 components, 4 feature streams, veclen 51
        INFO: s2_semi_mgau.c(748): Loading senones from dump file /usr/share/pocketsphinx/model/hmm/tidigits/sendump
        INFO: s2_semi_mgau.c(768): BEGIN FILE FORMAT DESCRIPTION
        INFO: s2_semi_mgau.c(797): Rows: 256, Columns: 672
        INFO: s2_semi_mgau.c(805): Using memory-mapped I/O for senones
        INFO: feat.c(849): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='max'
        INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
        INFO: agc.c(132): AGCEMax: max= 5.00
        INFO: dict.c(232): Allocating 20 placeholders for new OOVs
        INFO: dict.c(494): 11 = words in file [/usr/share/pocketsphinx/model/lm/tidigits/tidigits.dic]
        INFO: dict.c(349): LEFT CONTEXT TABLES
        INFO: dict.c(1013): Entry Context table contains
        12 entries
        INFO: dict.c(1014): 408 possible cross word triphones.
        INFO: dict.c(1052): 132 triphones
        242 pseudo diphones
        34 uniphones
        INFO: dict.c(1099): Exit Context table contains
        12 entries
        INFO: dict.c(1100): 408 possible cross word triphones.
        INFO: dict.c(1166): 132 triphones
        242 pseudo diphones
        34 uniphones
        INFO: dict.c(1168): 79 right context entries
        INFO: dict.c(1169): 6 ave entries per exit context
        INFO: dict.c(355): RIGHT CONTEXT TABLES
        INFO: dict.c(1013): Entry Context table contains
        12 entries
        INFO: dict.c(1014): 408 possible cross word triphones.
        INFO: dict.c(1052): 132 triphones
        242 pseudo diphones
        34 uniphones
        INFO: dict.c(1099): Exit Context table contains
        12 entries
        INFO: dict.c(1100): 408 possible cross word triphones.
        INFO: dict.c(1166): 132 triphones
        242 pseudo diphones
        34 uniphones
        INFO: dict.c(1168): 76 right context entries
        INFO: dict.c(1169): 6 ave entries per exit context
        INFO: ngram_model_arpa.c(539): ngrams 1=14, 2=1, 3=0
        INFO: ngram_model_arpa.c(204): Reading unigrams
        INFO: ngram_model_arpa.c(578): 14 = #unigrams created
        INFO: ngram_model_arpa.c(260): Reading bigrams
        INFO: ngram_model_arpa.c(594): 1 = #bigrams created
        INFO: ngram_model_arpa.c(595): 2 = #prob2 entries
        INFO: ngram_search_fwdtree.c(156): 0 root, 0 non-root channels, 24 single-phone words
        INFO: ngram_search_fwdtree.c(195): Creating search tree
        INFO: ngram_search_fwdtree.c(203): 0 root, 0 non-root channels, 24 single-phone words
        INFO: ngram_search_fwdtree.c(325): max nonroot chan increased to 140
        INFO: ngram_search_fwdtree.c(334): 10 root, 12 non-root channels, 4 single-phone words
        INFO: ngram_search_fwdflat.c(95): fwdflat: min_ef_width = 4, max_sf_win = 25
        INFO: cmn.c(175): CMN: 44.94 -0.47 -2.19 1.29 -2.34 -0.72 -1.01 -1.06 -1.15 -0.08 -0.65 0.12 -0.23
        INFO: agc.c(123): AGCMax: obs=max= nan
        INFO: ngram_search_fwdtree.c(1471): 813 words recognized (2/fr)
        INFO: ngram_search_fwdtree.c(1473): 36403 senones evaluated (102/fr)
        INFO: ngram_search_fwdtree.c(1475): 7797 channels searched (21/fr), 3537 1st, 1535 last
        INFO: ngram_search_fwdtree.c(1479): 1202 words for which last channels evaluated (3/fr)
        INFO: ngram_search_fwdtree.c(1482): 683 candidate words for entering last phone (1/fr)
        INFO: ngram_search_fwdflat.c(829): 715 words recognized (2/fr)
        INFO: ngram_search_fwdflat.c(831): 6192 senones evaluated (17/fr)
        INFO: ngram_search_fwdflat.c(833): 1804 channels searched (5/fr)
        INFO: ngram_search_fwdflat.c(835): 977 words searched (2/fr)
        INFO: ngram_search_fwdflat.c(837): 200 word transitions (0/fr)
        INFO: ngram_search.c(1007): lattice start node <s>.0 end node </s>.95
        INFO: batch.c(329): ONE (test1 -58026608)
        INFO: batch.c(340): test1: 3.57 seconds speech, 1.06 seconds CPU, 1.07 seconds wall
        INFO: batch.c(342): test1: 0.30 xRT (CPU), 0.30 xRT (elapsed)
        INFO: cmn.c(175): CMN: 47.28 2.64 -0.64 0.39 -2.32 -0.38 -0.86 -0.68 -0.69 -0.31 -0.15 -0.05 -0.27
        INFO: agc.c(123): AGCMax: obs=max= nan
        INFO: ngram_search_fwdtree.c(1471): 2158 words recognized (3/fr)
        INFO: ngram_search_fwdtree.c(1473): 82645 senones evaluated (111/fr)
        INFO: ngram_search_fwdtree.c(1475): 18228 channels searched (24/fr), 7208 1st, 5502 last
        INFO: ngram_search_fwdtree.c(1479): 3078 words for which last channels evaluated (4/fr)
        INFO: ngram_search_fwdtree.c(1482): 1367 candidate words for entering last phone (1/fr)
        INFO: ngram_search_fwdflat.c(829): 1586 words recognized (2/fr)
        INFO: ngram_search_fwdflat.c(831): 32919 senones evaluated (44/fr)
        INFO: ngram_search_fwdflat.c(833): 8621 channels searched (11/fr)
        INFO: ngram_search_fwdflat.c(835): 2724 words searched (3/fr)
        INFO: ngram_search_fwdflat.c(837): 1131 word transitions (1/fr)
        INFO: ngram_search.c(1007): lattice start node <s>.0 end node </s>.609
        INFO: batch.c(329): ONE NINE THREE FOUR FIVE EIGHT EIGHT NINE (test2 -141147703)
        INFO: batch.c(340): test2: 7.41 seconds speech, 2.12 seconds CPU, 2.23 seconds wall
        INFO: batch.c(342): test2: 0.29 xRT (CPU), 0.30 xRT (elapsed)
        INFO: batch.c(350): TOTAL 10.98 seconds speech, 3.18 seconds CPU, 3.29 seconds wall
        INFO: batch.c(352): AVERAGE 0.29 xRT (CPU), 0.30 xRT (elapsed)

        log from continuous mode: --------------------------------------------------------------------------------------------
        test sample "1 2 3 4"

        pocketsphinx_tidigits:
        Demo CMU PocketSphinx decoder with connected digit recognition.

        <executing /usr/bin/pocketsphinx_continuous, please wait>
        INFO: cmd_ln.c(459): Parsing command line:
        /usr/bin/pocketsphinx_continuous \
        -adcdev /dev/dsp \
        -lm /usr/share/pocketsphinx/model/lm/tidigits/tidigits.lm \
        -dict /usr/share/pocketsphinx/model/lm/tidigits/tidigits.dic \
        -hmm /usr/share/pocketsphinx/model/hmm/tidigits \
        -samprate 16000 \
        -dither no \
        -agc none \
        -input_endian little

        Current configuration:
        [NAME] [DEFLT] [VALUE]
        -adcdev /dev/dsp
        -agc none none
        -agcthresh 2.0 2.000000e+00
        -alpha 0.97 9.700000e-01
        -ascale 20.0 2.000000e+01
        -backtrace no no
        -beam 1e-48 1.000000e-48
        -bestpath yes yes
        -bestpathlw 9.5 9.500000e+00
        -cep2spec no no
        -ceplen 13 13
        -cmn current current
        -cmninit 8.0 8.0
        -compallsen no no
        -dict /usr/share/pocketsphinx/model/lm/tidigits/tidigits.dic
        -dictcase no no
        -dither no no
        -doublebw no no
        -ds 1 1
        -fdict
        -feat 1s_c_d_dd 1s_c_d_dd
        -featparams
        -fillprob 1e-8 1.000000e-08
        -frate 100 100
        -fsg
        -fsgusealtpron yes yes
        -fsgusefiller yes yes
        -fwdflat yes yes
        -fwdflatbeam 1e-64 1.000000e-64
        -fwdflatefwid 4 4
        -fwdflatlw 8.5 8.500000e+00
        -fwdflatsfwin 25 25
        -fwdflatwbeam 7e-29 7.000000e-29
        -fwdtree yes yes
        -hmm /usr/share/pocketsphinx/model/hmm/tidigits
        -input_endian little little
        -jsgf
        -kdmaxbbi -1 -1
        -kdmaxdepth 0 0
        -kdtree
        -latsize 5000 5000
        -lda
        -ldadim 0 0
        -lifter 0 0
        -lm /usr/share/pocketsphinx/model/lm/tidigits/tidigits.lm
        -lmctl
        -lmname default default
        -logbase 1.0001 1.000100e+00
        -logspec no no
        -lowerf 133.33334 1.333333e+02
        -lpbeam 1e-40 1.000000e-40
        -lponlybeam 7e-29 7.000000e-29
        -lw 6.5 6.500000e+00
        -maxhistpf 100 100
        -maxhmmpf -1 -1
        -maxnewoov 20 20
        -maxwpf -1 -1
        -mdef
        -mean
        -mixw
        -mixwfloor 0.0000001 1.000000e-07
        -mmap yes yes
        -ncep 13 13
        -nfft 512 512
        -nfilt 40 40
        -nwpen 1.0 1.000000e+00
        -pbeam 1e-48 1.000000e-48
        -pip 1.0 1.000000e+00
        -remove_dc no no
        -round_filters yes yes
        -samprate 16000 1.600000e+04
        -sdmap
        -seed -1 -1
        -sendump
        -silprob 0.005 5.000000e-03
        -smoothspec no no
        -spec2cep no no
        -svspec
        -tmat
        -tmatfloor 0.0001 1.000000e-04
        -topn 4 4
        -toprule
        -transform legacy legacy
        -unit_area yes yes
        -upperf 6855.4976 6.855498e+03
        -usewdphones no no
        -uw 1.0 1.000000e+00
        -var
        -varfloor 0.0001 1.000000e-04
        -varnorm no no
        -verbose no no
        -warp_params
        -warp_type inverse_linear inverse_linear
        -wbeam 7e-29 7.000000e-29
        -wip 0.65 6.500000e-01
        -wlen 0.025625 2.562500e-02

        INFO: cmd_ln.c(459): Parsing command line:
        \
        -lowerf 1 \
        -upperf 4000 \
        -nfilt 20 \
        -transform dct \
        -round_filters no \
        -remove_dc yes \
        -wlen 0.025 \
        -feat s2_4x \
        -cmn current \
        -varnorm no

        Current configuration:
        [NAME] [DEFLT] [VALUE]
        -agc none none
        -agcthresh 2.0 2.000000e+00
        -alpha 0.97 9.700000e-01
        -cep2spec no no
        -ceplen 13 13
        -cmn current current
        -cmninit 8.0 8.0
        -dither no no
        -doublebw no no
        -feat 1s_c_d_dd s2_4x
        -frate 100 100
        -input_endian little little
        -lda
        -ldadim 0 0
        -lifter 0 0
        -logspec no no
        -lowerf 133.33334 1.000000e+00
        -ncep 13 13
        -nfft 512 512
        -nfilt 40 20
        -remove_dc no yes
        -round_filters yes no
        -samprate 16000 1.600000e+04
        -seed -1 -1
        -smoothspec no no
        -spec2cep no no
        -svspec
        -transform legacy dct
        -unit_area yes yes
        -upperf 6855.4976 4.000000e+03
        -varnorm no no
        -verbose no no
        -warp_params
        -warp_type inverse_linear inverse_linear
        -wlen 0.025625 2.500000e-02

        INFO: acmod.c(76): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/hmm/tidigits/feat.params
        INFO: mdef.c(520): Reading model definition: /usr/share/pocketsphinx/model/hmm/tidigits/mdef
        INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
        INFO: bin_mdef.c(301): Reading binary model definition: /usr/share/pocketsphinx/model/hmm/tidigits/mdef
        INFO: bin_mdef.c(480): 34 CI-phone, 396 CD-phone, 5 emitstate/phone, 170 CI-sen, 670 Sen, 222 Sen-Seq
        INFO: tmat.c(204): Reading HMM transition probability matrices: /usr/share/pocketsphinx/model/hmm/tidigits/transition_matrices
        INFO: acmod.c(108): Attempting to use SCGMM computation module
        INFO: s2_semi_mgau.c(985): Reading S3 mixture gaussian file '/usr/share/pocketsphinx/model/hmm/tidigits/means'
        INFO: s2_semi_mgau.c(1084): 1 mixture Gaussians, 256 components, 4 feature streams, veclen 51
        INFO: s2_semi_mgau.c(985): Reading S3 mixture gaussian file '/usr/share/pocketsphinx/model/hmm/tidigits/variances'
        INFO: s2_semi_mgau.c(1084): 1 mixture Gaussians, 256 components, 4 feature streams, veclen 51
        INFO: s2_semi_mgau.c(748): Loading senones from dump file /usr/share/pocketsphinx/model/hmm/tidigits/sendump
        INFO: s2_semi_mgau.c(768): BEGIN FILE FORMAT DESCRIPTION
        INFO: s2_semi_mgau.c(797): Rows: 256, Columns: 672
        INFO: s2_semi_mgau.c(805): Using memory-mapped I/O for senones
        INFO: feat.c(849): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='none'
        INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
        INFO: dict.c(232): Allocating 20 placeholders for new OOVs
        INFO: dict.c(494): 11 = words in file [/usr/share/pocketsphinx/model/lm/tidigits/tidigits.dic]
        INFO: dict.c(349): LEFT CONTEXT TABLES
        INFO: dict.c(1013): Entry Context table contains
        12 entries
        INFO: dict.c(1014): 408 possible cross word triphones.
        INFO: dict.c(1052): 132 triphones
        242 pseudo diphones
        34 uniphones
        INFO: dict.c(1099): Exit Context table contains
        12 entries
        INFO: dict.c(1100): 408 possible cross word triphones.
        INFO: dict.c(1166): 132 triphones
        242 pseudo diphones
        34 uniphones
        INFO: dict.c(1168): 79 right context entries
        INFO: dict.c(1169): 6 ave entries per exit context
        INFO: dict.c(355): RIGHT CONTEXT TABLES
        INFO: dict.c(1013): Entry Context table contains
        12 entries
        INFO: dict.c(1014): 408 possible cross word triphones.
        INFO: dict.c(1052): 132 triphones
        242 pseudo diphones
        34 uniphones
        INFO: dict.c(1099): Exit Context table contains
        12 entries
        INFO: dict.c(1100): 408 possible cross word triphones.
        INFO: dict.c(1166): 132 triphones
        242 pseudo diphones
        34 uniphones
        INFO: dict.c(1168): 76 right context entries
        INFO: dict.c(1169): 6 ave entries per exit context
        INFO: ngram_model_arpa.c(539): ngrams 1=14, 2=1, 3=0
        INFO: ngram_model_arpa.c(204): Reading unigrams
        INFO: ngram_model_arpa.c(578): 14 = #unigrams created
        INFO: ngram_model_arpa.c(260): Reading bigrams
        INFO: ngram_model_arpa.c(594): 1 = #bigrams created
        INFO: ngram_model_arpa.c(595): 2 = #prob2 entries
        INFO: ngram_search_fwdtree.c(156): 0 root, 0 non-root channels, 24 single-phone words
        INFO: ngram_search_fwdtree.c(195): Creating search tree
        INFO: ngram_search_fwdtree.c(203): 0 root, 0 non-root channels, 24 single-phone words
        INFO: ngram_search_fwdtree.c(325): max nonroot chan increased to 140
        INFO: ngram_search_fwdtree.c(334): 10 root, 12 non-root channels, 4 single-phone words
        INFO: ngram_search_fwdflat.c(95): fwdflat: min_ef_width = 4, max_sf_win = 25
        INFO: continuous.c(244): /usr/bin/pocketsphinx_continuous COMPILED ON: Nov 26 2008, AT: 11:14:15

        DSP Revision 0:
        DSP has duplex capability.
        DSP has real time capability.
        DSP does not have batch capability.
        DSP does not have coprocessor capability.
        DSP has trigger capability.
        DSP has memory map capability.
        READY....
        Listening...
        Stopped listening, please wait...
        INFO: cmn_prior.c(121): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
        INFO: cmn_prior.c(139): cmn_prior_update: to < 18.30 4.54 0.72 0.27 2.26 0.22 -1.19 -1.07 -0.79 0.26 -0.80 -0.17 0.27 >
        INFO: ngram_search_fwdtree.c(1471): 1117 words recognized (3/fr)
        INFO: ngram_search_fwdtree.c(1473): 52109 senones evaluated (132/fr)
        INFO: ngram_search_fwdtree.c(1475): 12165 channels searched (30/fr), 3575 1st, 5705 last
        INFO: ngram_search_fwdtree.c(1479): 1960 words for which last channels evaluated (4/fr)
        INFO: ngram_search_fwdtree.c(1482): 1056 candidate words for entering last phone (2/fr)
        INFO: ngram_search_fwdflat.c(829): 848 words recognized (2/fr)
        INFO: ngram_search_fwdflat.c(831): 40377 senones evaluated (102/fr)
        INFO: ngram_search_fwdflat.c(833): 10170 channels searched (25/fr)
        INFO: ngram_search_fwdflat.c(835): 2100 words searched (5/fr)
        INFO: ngram_search_fwdflat.c(837): 1077 word transitions (2/fr)
        WARNING: "ngram_search.c", line 965: </s> not found in last frame, using FOUR instead
        INFO: ngram_search.c(1007): lattice start node <s>.0 end node FOUR.330
        000000000: ZERO TWO SIX ONE OH FOUR (-81751999)
        READY....

         
        • Nickolay V. Shmyrev

          So please try with -cmninit 47.2
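
To illustrate why the starting value matters: in the logs above, the batch runs report a first cepstral mean of 44.94 and 47.28, while the live run starts from 8.00. The following is a deliberately simplified sketch of live-mode cepstral mean normalization, assuming the mean is only re-estimated at the end of each utterance. It is not the sphinxbase implementation, and all `cmn_sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NCEP 13  /* cepstral coefficients per frame, as in -ncep 13 */

/* Simplified live-mode CMN state (hypothetical, not sphinxbase's). */
typedef struct {
    double mean[NCEP];  /* prior mean subtracted from each frame */
    double sum[NCEP];   /* running sum of the raw frames seen so far */
    size_t nframes;     /* number of frames accumulated in sum */
} cmn_sketch_t;

/* Start with cmninit as the prior for coefficient 0 and 0 elsewhere. */
static void cmn_sketch_init(cmn_sketch_t *c, double cmninit)
{
    size_t i;

    assert(c != NULL);
    for (i = 0; i < NCEP; i++) {
        c->mean[i] = 0.0;
        c->sum[i] = 0.0;
    }
    c->mean[0] = cmninit;
    c->nframes = 0;
}

/* Normalize one frame in place with the current prior mean. */
static void cmn_sketch_apply(cmn_sketch_t *c, double frame[NCEP])
{
    size_t i;

    assert(c != NULL && frame != NULL);
    for (i = 0; i < NCEP; i++) {
        c->sum[i] += frame[i];     /* accumulate the raw value first */
        frame[i] -= c->mean[i];    /* then subtract the prior mean */
    }
    c->nframes++;
}

/* At the end of an utterance, replace the prior with the observed mean. */
static void cmn_sketch_update(cmn_sketch_t *c)
{
    size_t i;

    assert(c != NULL);
    if (c->nframes == 0)
        return;
    for (i = 0; i < NCEP; i++) {
        c->mean[i] = c->sum[i] / (double)c->nframes;
        c->sum[i] = 0.0;
    }
    c->nframes = 0;
}
```

In this sketch, a frame whose first coefficient is 47.0 is normalized to 39.0 when the prior is 8.0, but to about -0.2 when the prior is 47.2. The whole first utterance is therefore shifted until the prior has been updated.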

           
          • Michael Bluemcke

            With cmninit it's maybe a bit better.
            For "6" I now get reproducible results. The results for the rest of the numbers are still random...
            One small addition to the batch mode results: with -agc none I get 100% recognition; with -agc max it's about 80%.
            So maybe the problem is setting the right parameters to the right values to achieve acceptable recognition results. My bigger problem is that I'm new to ASR / pocketsphinx / sphinx and not really familiar with all its functions and parameters...
            I have started to take a closer look at the documentation, but I'm not sure whether it will help me solve this. So any information that helps me understand the parameters and their relationships, what steps I have to take, or what I should read to get it working would be helpful ;o)
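
One way to narrow down which parameters matter is to compare the values the two runs printed under "Current configuration". In the logs above, the batch run used -samprate 8000 and -agc max, while the continuous run used -samprate 16000 and -agc none. Whether either difference explains the live-mode results is not established in this thread. The following is a minimal sketch of such a comparison; `opt_t`, `opt_get` and `opt_diff` are hypothetical names, and the option tables would have to be filled in by hand or by a log parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One command-line option as printed in the configuration dump. */
typedef struct {
    const char *name;
    const char *value;
} opt_t;

/* Return the value stored for name in opts, or NULL if it is absent. */
static const char *opt_get(const opt_t *opts, size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(opts[i].name, name) == 0)
            return opts[i].value;
    return NULL;
}

/* Print every option of a whose value is different or missing in b and
 * return how many such options were found. Options that appear only in
 * b are not reported. */
static size_t opt_diff(const opt_t *a, size_t na,
                       const opt_t *b, size_t nb, FILE *out)
{
    size_t i, ndiff = 0;

    assert(a != NULL && b != NULL && out != NULL);
    for (i = 0; i < na; i++) {
        const char *other = opt_get(b, nb, a[i].name);

        if (other == NULL || strcmp(a[i].value, other) != 0) {
            fprintf(out, "%-12s %-10s %s\n",
                    a[i].name, a[i].value, other ? other : "(unset)");
            ndiff++;
        }
    }
    return ndiff;
}
```

Feeding it the -samprate, -agc and -cmn values from the two logs above reports two differing options (-samprate and -agc), which gives a short list of settings to align before comparing batch and live results again.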

             
