Weird pocketsphinx accuracy degradation.

Help
Jon
2012-08-07
2012-09-22
  • Jon

    Jon - 2012-08-07

    Pocketsphinx is producing a bizarre issue where the accuracy appears to
    degrade after only a couple of queries to the engine. The first query has
    near-perfect accuracy: it can recognize relatively complicated and convoluted
    phrases without difficulty. However, the second and third recognitions can
    barely pick up a two-syllable word, and by the fourth query to the engine, it
    simply fails to generate a hypothesis.

    For context, I'm creating a pocketsphinx application for Android, using JSGF
    grammars (though the problem persists with FSG grammars as well). My code is
    based on the pocketsphinx demo for Android:
    http://cmusphinx.sourceforge.net/2011/05/building-pocketsphinx-on-android/

    I'm not quite sure what's causing the problem. The original demo appeared to
    work fine, and I have hardly altered the configuration of the speech engine,
    other than using JSGF grammars.

    Anyway, has anyone else ever experienced something like this, or have a
    suggestion as to what I could try to remedy it?

    Thanks

     
  • Nickolay V. Shmyrev

    Provide the pocketsphinx log that is created on the device.

    Add

    -rawlogdir <somefolder>
    

    option to the pocketsphinx initialization and collect the audio you are
    trying to recognize. Share the audio; maybe it's corrupted somehow.
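
Before sharing the collected audio, it can help to look at it yourself. The following is a minimal sketch, assuming the files written to the raw log folder are headerless 16-bit signed little-endian mono PCM; the function names and the default sample rate of 8000 Hz are illustrative assumptions, so adjust them to your setup. It reports the duration, the peak amplitude, and the longest run of exactly zero samples.

```python
import struct


def decode_pcm16le(data):
    """Decode headerless 16-bit signed little-endian PCM bytes into ints."""
    count = len(data) // 2  # ignore a trailing odd byte, if any
    return list(struct.unpack("<%dh" % count, data[:count * 2]))


def longest_zero_run(samples):
    """Return the length of the longest run of consecutive zero samples."""
    longest = current = 0
    for s in samples:
        if s == 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def summarize(data, sample_rate=8000):
    """Summarize raw audio bytes; sample_rate is an assumption, not detected."""
    samples = decode_pcm16le(data)
    return {
        "seconds": len(samples) / sample_rate,
        "peak": max((abs(s) for s in samples), default=0),
        "longest_zero_run": longest_zero_run(samples),
    }
```

To use it, read one collected file in binary mode and pass its bytes to `summarize`. A long zero run or a peak near zero would suggest that the recording path, rather than the recognizer, deserves a closer look.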

     
  • Jon

    Jon - 2012-08-08

    Here are the raw data files: http://speechweb2.cs.uwindsor.ca/rawdata.zip

    Also, here's my log:

    INFO: cmd_ln.c(691): Parsing command line:
    
    
    Current configuration:
    [NAME]      [DEFLT]     [VALUE]
    -agc        none        none
    -agcthresh  2.0     2.000000e+00
    -alpha      0.97        9.700000e-01
    -ascale     20.0        2.000000e+01
    -aw     1       1
    -backtrace  no      no
    -beam       1e-48       1.000000e-48
    -bestpath   yes     yes
    -bestpathlw 9.5     9.500000e+00
    -bghist     no      no
    -ceplen     13      13
    -cmn        current     current
    -cmninit    8.0     8.0
    -compallsen no      no
    -debug              0
    -dict               
    -dictcase   no      no
    -dither     no      no
    -doublebw   no      no
    -ds     1       1
    -fdict              
    -feat       1s_c_d_dd   1s_c_d_dd
    -featparams         
    -fillprob   1e-8        1.000000e-08
    -frate      100     100
    -fsg                
    -fsgusealtpron  yes     yes
    -fsgusefiller   yes     yes
    -fwdflat    yes     yes
    -fwdflatbeam    1e-64       1.000000e-64
    -fwdflatefwid   4       4
    -fwdflatlw  8.5     8.500000e+00
    -fwdflatsfwin   25      25
    -fwdflatwbeam   7e-29       7.000000e-29
    -fwdtree    yes     yes
    -hmm                
    -input_endian   little      little
    -jsgf               
    -kdmaxbbi   -1      -1
    -kdmaxdepth 0       0
    -kdtree             
    -latsize    5000        5000
    -lda                
    -ldadim     0       0
    -lextreedump    0       0
    -lifter     0       0
    -lm             
    -lmctl              
    -lmname     default     default
    -logbase    1.0001      1.000100e+00
    -logfn              
    -logspec    no      no
    -lowerf     133.33334   1.333333e+02
    -lpbeam     1e-40       1.000000e-40
    -lponlybeam 7e-29       7.000000e-29
    -lw     6.5     6.500000e+00
    -maxhmmpf   -1      -1
    -maxnewoov  20      20
    -maxwpf     -1      -1
    -mdef               
    -mean               
    -mfclogdir          
    -min_endfr  0       0
    -mixw               
    -mixwfloor  0.0000001   1.000000e-07
    -mllr               
    -mmap       yes     yes
    -ncep       13      13
    -nfft       512     512
    -nfilt      40      40
    -nwpen      1.0     1.000000e+00
    -pbeam      1e-48       1.000000e-48
    -pip        1.0     1.000000e+00
    -pl_beam    1e-10       1.000000e-10
    -pl_pbeam   1e-5        1.000000e-05
    -pl_window  0       0
    -rawlogdir          
    -remove_dc  no      no
    -round_filters  yes     yes
    -samprate   16000       1.600000e+04
    -seed       -1      -1
    -sendump            
    -senlogdir          
    -senmgau            
    -silprob    0.005       5.000000e-03
    -smoothspec no      no
    -svspec             
    -tmat               
    -tmatfloor  0.0001      1.000000e-04
    -topn       4       4
    -topn_beam  0       0
    -toprule            
    -transform  legacy      legacy
    -unit_area  yes     yes
    -upperf     6855.4976   6.855498e+03
    -usewdphones    no      no
    -uw     1.0     1.000000e+00
    -var                
    -varfloor   0.0001      1.000000e-04
    -varnorm    no      no
    -verbose    no      no
    -warp_params            
    -warp_type  inverse_linear  inverse_linear
    -wbeam      7e-29       7.000000e-29
    -wip        0.65        6.500000e-01
    -wlen       0.025625    2.562500e-02
    
    INFO: cmd_ln.c(691): Parsing command line:
    \
        -nfilt 20 \
        -lowerf 1 \
        -upperf 4000 \
        -wlen 0.025 \
        -transform dct \
        -round_filters no \
        -remove_dc yes \
        -svspec 0-12/13-25/26-38 \
        -feat 1s_c_d_dd \
        -agc none \
        -cmn current \
        -cmninit 56,-3,1 \
        -varnorm no
    
    Current configuration:
    [NAME]      [DEFLT]     [VALUE]
    -agc        none        none
    -agcthresh  2.0     2.000000e+00
    -alpha      0.97        9.700000e-01
    -ceplen     13      13
    -cmn        current     current
    -cmninit    8.0     56,-3,1
    -dither     no      no
    -doublebw   no      no
    -feat       1s_c_d_dd   1s_c_d_dd
    -frate      100     100
    -input_endian   little      little
    -lda                
    -ldadim     0       0
    -lifter     0       0
    -logspec    no      no
    -lowerf     133.33334   1.000000e+00
    -ncep       13      13
    -nfft       512     512
    -nfilt      40      20
    -remove_dc  no      yes
    -round_filters  yes     no
    -samprate   16000       8.000000e+03
    -seed       -1      -1
    -smoothspec no      no
    -svspec             0-12/13-25/26-38
    -transform  legacy      dct
    -unit_area  yes     yes
    -upperf     6855.4976   4.000000e+03
    -varnorm    no      no
    -verbose    no      no
    -warp_params            
    -warp_type  inverse_linear  inverse_linear
    -wlen       0.025625    2.500000e-02
    
    INFO: acmod.c(242): Parsed model-specific feature parameters from /mnt/sdcard/speechbrowser/speech/hmm/en_US/hub4wsj_sc_8k/feat.params
    INFO: feat.c(684): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: acmod.c(163): Using subvector specification 0-12/13-25/26-38
    INFO: mdef.c(520): Reading model definition: /mnt/sdcard/speechbrowser/speech/hmm/en_US/hub4wsj_sc_8k/mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
    INFO: bin_mdef.c(330): Reading binary model definition: /mnt/sdcard/speechbrowser/speech/hmm/en_US/hub4wsj_sc_8k/mdef
    INFO: bin_mdef.c(507): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
    INFO: tmat.c(205): Reading HMM transition probability matrices: /mnt/sdcard/speechbrowser/speech/hmm/en_US/hub4wsj_sc_8k/transition_matrices
    INFO: acmod.c(117): Attempting to use SCHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /mnt/sdcard/speechbrowser/speech/hmm/en_US/hub4wsj_sc_8k/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size: 
    INFO: ms_gauden.c(294):  256x13
    INFO: ms_gauden.c(294):  256x13
    INFO: ms_gauden.c(294):  256x13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /mnt/sdcard/speechbrowser/speech/hmm/en_US/hub4wsj_sc_8k/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size: 
    INFO: ms_gauden.c(294):  256x13
    INFO: ms_gauden.c(294):  256x13
    INFO: ms_gauden.c(294):  256x13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: s2_semi_mgau.c(908): Loading senones from dump file /mnt/sdcard/speechbrowser/speech/hmm/en_US/hub4wsj_sc_8k/sendump
    INFO: s2_semi_mgau.c(932): BEGIN FILE FORMAT DESCRIPTION
    INFO: s2_semi_mgau.c(1027): Using memory-mapped I/O for senones
    INFO: s2_semi_mgau.c(1304): Maximum top-N: 4 Top-N beams: 0 0 0
    INFO: dict.c(306): Allocating 137127 * 20 bytes (2678 KiB) for word entries
    INFO: dict.c(321): Reading main dictionary: /mnt/sdcard/speechbrowser/temp.dict
    INFO: dict.c(212): Allocated 1004 KiB for strings, 1657 KiB for phones
    INFO: dict.c(324): 133021 words read
    INFO: dict.c(330): Reading filler dictionary: /mnt/sdcard/speechbrowser/speech/hmm/en_US/hub4wsj_sc_8k/noisedict
    INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(333): 11 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(404): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
    INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
    INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word triphones
    INFO: fsg_search.c(145): FSG(beam: -1080, pbeam: -1080, wbeam: -634; wip: -26, pip: 0)
    INFO: jsgf.c(546): Defined rule: PUBLIC <test.hi>
    INFO: fsg_model.c(213): Computing transitive closure for null transitions
    INFO: fsg_model.c(264): 0 null transitions added
    INFO: fsg_model.c(411): Adding silence transitions for <sil> to FSG
    INFO: fsg_model.c(431): Added 53 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++NOISE++ to FSG
    INFO: fsg_model.c(431): Added 53 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++BREATH++ to FSG
    INFO: fsg_model.c(431): Added 53 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++SMACK++ to FSG
    INFO: fsg_model.c(431): Added 53 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++COUGH++ to FSG
    INFO: fsg_model.c(431): Added 53 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++LAUGH++ to FSG
    INFO: fsg_model.c(431): Added 53 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++TONE++ to FSG
    INFO: fsg_model.c(431): Added 53 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++UH++ to FSG
    INFO: fsg_model.c(431): Added 53 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++UM++ to FSG
    INFO: fsg_model.c(431): Added 53 silence word transitions
    INFO: fsg_search.c(364): Added 13 alternate word transitions
    INFO: fsg_lextree.c(108): Allocated 5406 bytes (5 KiB) for left and right context phones
    INFO: fsg_lextree.c(251): 715 HMM nodes in lextree (586 leaves)
    INFO: fsg_lextree.c(253): Allocated 77220 bytes (75 KiB) for all lextree nodes
    INFO: fsg_lextree.c(256): Allocated 63288 bytes (61 KiB) for lextree leafnodes
    

    Thanks

     
  • Nickolay V. Shmyrev

    The audio has zero-energy regions. You need to add "-dither yes" to the
    engine configuration.
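
One way to picture the problem with zero-energy regions: a front end that takes the logarithm of frame energy has no finite value to work with when every sample in a frame is exactly zero. The sketch below is a simplified illustration of that idea and of dithering in general, not the actual implementation behind the -dither option; the function names and the noise scheme (adding -1, 0, or +1 to each sample) are assumptions made for the example.

```python
import math
import random


def frame_log_energy(frame):
    """Log of the summed squared samples; -inf for an all-zero frame."""
    energy = sum(s * s for s in frame)
    return math.log(energy) if energy > 0 else float("-inf")


def dither(samples, rng=None):
    """Add tiny random noise (-1, 0, or +1) to each 16-bit sample.

    Illustrative only: this shows the general technique of dithering,
    not what the recognizer does internally.
    """
    rng = rng or random.Random()
    out = []
    for s in samples:
        noisy = s + rng.choice((-1, 0, 1))
        # Keep the result inside the signed 16-bit range.
        out.append(max(-32768, min(32767, noisy)))
    return out


# A silent frame has no finite log energy until a little noise is added.
silent = [0] * 200
print(frame_log_energy(silent))
print(frame_log_energy(dither(silent, random.Random(0))))
```

The added noise is far below audible levels for speech, which is why enabling dither can fix digitally silent input without noticeably changing normal audio.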

     
  • Jon

    Jon - 2012-08-08

    That got it! Thank you, this problem has been dogging me for a while, and I
    couldn't for the life of me figure it out.

    Is there some documentation that you could point me to which elaborates on the
    -dither option (and other command line options for that matter)?

    Once again, thank you.

     
  • Nickolay V. Shmyrev

    Is there some documentation that you could point me to which elaborates on
    the -dither option (and other command line options for that matter)?

    man pocketsphinx_batch

     
