Pocket sphinx accuracy & parameters ...

Help
2012-06-11
2012-09-22
  • Michael Liguori

    Michael Liguori - 2012-06-11

    All,

    I was wondering how many hours of voice recordings have been used to create
    the hmm/en_US/hub4wsj_sc_8k acoustic library. Do you have an estimate of how
    many hours?

    The current accuracy level of the modified sample program is poor and I'm
    trying to figure out how to improve the overall accuracy. I have been reading
    up on this portion of the tutorial.
    http://cmusphinx.sourceforge.net/wiki/tutorialadapt

    In other voice recognizers I can modify the beam length and other parameters.
    Is there something that can tell me how to modify those parameters?

    When the sentence is too long it seems to cause pocketsphinx to fail.
    Error:

    INFO: fsg_lextree.c(251): 13665 HMM nodes in lextree (12274 leaves)
    INFO: fsg_lextree.c(253): Allocated 1475820 bytes (1441 KiB) for all lextree nodes
    INFO: fsg_lextree.c(256): Allocated 1325592 bytes (1294 KiB) for lextree leafnodes
    INFO: cmn.c(175): CMN: 50.49 -0.36  2.14  0.47 -0.49 -1.05 -0.02 -0.40 -0.23  0.16  0.22 -0.13  0.36 
    INFO: fsg_search.c(1030): 633 frames, 56007 HMMs (88/fr), 61097 senones (96/fr), 52530 history entries (82/fr)
    
    INFO: fsg_search.c(1407): Start node <sil>.0:2:131
    INFO: fsg_search.c(1446): End node <sil>.527:540:632 (-1611)
    INFO: fsg_search.c(1662): lattice start node <sil>.0 end node <sil>.527
    INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(<sil>:527:632) = -536870912
    WARNING: "fsg_search.c", line 1155: Failed to bestpath in a lattice
    

    Regards,

    Mike

     
  • Pranav Jawale

    Pranav Jawale - 2012-06-12

    > Is there something that can tell me how to modify those parameters?

    http://www.cs.cmu.edu/~archan/s_info/Sphinx3/doc/s3_description.html#sec_exec

    > WARNING: "fsg_search.c", line 1155: Failed to bestpath in a lattice

    633 frames ain't too long. Anyway, that's just a warning, not an error. Is
    that your complete log?
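
One detail in the log excerpt is worth checking before looking at sentence length: the normalizer printed just before the warning is -536870912, which is exactly -(2^29). The sketch below assumes that this is the value sphinxbase uses to represent log(0) when the log table shift is 0; that assumption comes from memory of the logmath code, not from this thread. If it holds, the forward pass found no path with non-zero probability into the final lattice node, which would explain why bestpath has nothing to return. The helper name and the regular expression are illustrative only.

```python
import re

# Assumption: sphinxbase stores log(0) as MAX_NEG_INT32 >> 2 when the
# log-table shift is 0, i.e. -(2**29). The failing log line matches it.
LOG_ZERO = -(1 << 29)

# Matches lines such as:
#   INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(<sil>:527:632) = -536870912
NORMALIZER_RE = re.compile(
    r"Normalizer P\(O\) = alpha\((.+?):(\d+):(\d+)\) = (-?\d+)"
)


def normalizer_is_log_zero(line):
    """Return True/False for a 'Normalizer P(O)' line, or None if no match."""
    match = NORMALIZER_RE.search(line)
    if match is None:
        return None
    return int(match.group(4)) == LOG_ZERO


# The failing utterance and the shorter utterance that decoded, both
# copied from the logs in this thread.
failed = ("INFO: ps_lattice.c(1352): Normalizer P(O) = "
          "alpha(<sil>:527:632) = -536870912")
worked = ("INFO: ps_lattice.c(1352): Normalizer P(O) = "
          "alpha(</s>:422:422) = -1585502")

print(normalizer_is_log_zero(failed))  # True
print(normalizer_is_log_zero(worked))  # False
```

Note also that the failing run ends the lattice at `<sil>.527` while the successful one ends at `</s>.422`, so the two runs differ in more than the normalizer value.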

     
  • Michael Liguori

    Michael Liguori - 2012-06-12

    Here is the full output, with gdb. The code produces output for a shorter
    sentence, but not for the one I used. There doesn't seem to be a failure message.

    /opt/WRM4/sphinx/code$ gdb hello_ps 
    GNU gdb (GDB) 7.2-ubuntu
    Copyright (C) 2010 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "i686-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /opt/WRM4/sphinx/code/hello_ps...(no debugging symbols found)...done.
    (gdb) run
    Starting program: /opt/WRM4/sphinx/code/hello_ps 
    [Thread debugging using libthread_db enabled]
    INFO: cmd_ln.c(691): Parsing command line:
    \
        -hmm /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k \
        -dict cmu07a.dic \
        -jsgf WRM4_NLP.grammar.jsgf
    
    Current configuration:
    [NAME]      [DEFLT]     [VALUE]
    -agc        none        none
    -agcthresh  2.0     2.000000e+00
    -alpha      0.97        9.700000e-01
    -ascale     20.0        2.000000e+01
    -aw     1       1
    -backtrace  no      no
    -beam       1e-48       1.000000e-48
    -bestpath   yes     yes
    -bestpathlw 9.5     9.500000e+00
    -bghist     no      no
    -ceplen     13      13
    -cmn        current     current
    -cmninit    8.0     8.0
    -compallsen no      no
    -debug              0
    -dict               cmu07a.dic
    -dictcase   no      no
    -dither     no      no
    -doublebw   no      no
    -ds     1       1
    -fdict              
    -feat       1s_c_d_dd   1s_c_d_dd
    -featparams         
    -fillprob   1e-8        1.000000e-08
    -frate      100     100
    -fsg                
    -fsgusealtpron  yes     yes
    -fsgusefiller   yes     yes
    -fwdflat    yes     yes
    -fwdflatbeam    1e-64       1.000000e-64
    -fwdflatefwid   4       4
    -fwdflatlw  8.5     8.500000e+00
    -fwdflatsfwin   25      25
    -fwdflatwbeam   7e-29       7.000000e-29
    -fwdtree    yes     yes
    -hmm                /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k
    -input_endian   little      little
    -jsgf               WRM4_NLP.grammar.jsgf
    -kdmaxbbi   -1      -1
    -kdmaxdepth 0       0
    -kdtree             
    -latsize    5000        5000
    -lda                
    -ldadim     0       0
    -lextreedump    0       0
    -lifter     0       0
    -lm             
    -lmctl              
    -lmname     default     default
    -logbase    1.0001      1.000100e+00
    -logfn              
    -logspec    no      no
    -lowerf     133.33334   1.333333e+02
    -lpbeam     1e-40       1.000000e-40
    -lponlybeam 7e-29       7.000000e-29
    -lw     6.5     6.500000e+00
    -maxhmmpf   -1      -1
    -maxnewoov  20      20
    -maxwpf     -1      -1
    -mdef               
    -mean               
    -mfclogdir          
    -min_endfr  0       0
    -mixw               
    -mixwfloor  0.0000001   1.000000e-07
    -mllr               
    -mmap       yes     yes
    -ncep       13      13
    -nfft       512     512
    -nfilt      40      40
    -nwpen      1.0     1.000000e+00
    -pbeam      1e-48       1.000000e-48
    -pip        1.0     1.000000e+00
    -pl_beam    1e-10       1.000000e-10
    -pl_pbeam   1e-5        1.000000e-05
    -pl_window  0       0
    -rawlogdir          
    -remove_dc  no      no
    -round_filters  yes     yes
    -samprate   16000       1.600000e+04
    -seed       -1      -1
    -sendump            
    -senlogdir          
    -senmgau            
    -silprob    0.005       5.000000e-03
    -smoothspec no      no
    -svspec             
    -tmat               
    -tmatfloor  0.0001      1.000000e-04
    -topn       4       4
    -topn_beam  0       0
    -toprule            
    -transform  legacy      legacy
    -unit_area  yes     yes
    -upperf     6855.4976   6.855498e+03
    -usewdphones    no      no
    -uw     1.0     1.000000e+00
    -var                
    -varfloor   0.0001      1.000000e-04
    -varnorm    no      no
    -verbose    no      no
    -warp_params            
    -warp_type  inverse_linear  inverse_linear
    -wbeam      7e-29       7.000000e-29
    -wip        0.65        6.500000e-01
    -wlen       0.025625    2.562500e-02
    
    INFO: cmd_ln.c(691): Parsing command line:
    \
        -nfilt 20 \
        -lowerf 1 \
        -upperf 4000 \
        -wlen 0.025 \
        -transform dct \
        -round_filters no \
        -remove_dc yes \
        -svspec 0-12/13-25/26-38 \
        -feat 1s_c_d_dd \
        -agc none \
        -cmn current \
        -cmninit 56,-3,1 \
        -varnorm no
    
    Current configuration:
    [NAME]      [DEFLT]     [VALUE]
    -agc        none        none
    -agcthresh  2.0     2.000000e+00
    -alpha      0.97        9.700000e-01
    -ceplen     13      13
    -cmn        current     current
    -cmninit    8.0     56,-3,1
    -dither     no      no
    -doublebw   no      no
    -feat       1s_c_d_dd   1s_c_d_dd
    -frate      100     100
    -input_endian   little      little
    -lda                
    -ldadim     0       0
    -lifter     0       0
    -logspec    no      no
    -lowerf     133.33334   1.000000e+00
    -ncep       13      13
    -nfft       512     512
    -nfilt      40      20
    -remove_dc  no      yes
    -round_filters  yes     no
    -samprate   16000       1.600000e+04
    -seed       -1      -1
    -smoothspec no      no
    -svspec             0-12/13-25/26-38
    -transform  legacy      dct
    -unit_area  yes     yes
    -upperf     6855.4976   4.000000e+03
    -varnorm    no      no
    -verbose    no      no
    -warp_params            
    -warp_type  inverse_linear  inverse_linear
    -wlen       0.025625    2.500000e-02
    
    INFO: acmod.c(242): Parsed model-specific feature parameters from /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
    INFO: feat.c(684): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: acmod.c(163): Using subvector specification 0-12/13-25/26-38
    INFO: mdef.c(520): Reading model definition: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
    INFO: bin_mdef.c(330): Reading binary model definition: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
    INFO: bin_mdef.c(507): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
    INFO: tmat.c(205): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/transition_matrices
    INFO: acmod.c(117): Attempting to use SCHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size: 
    INFO: ms_gauden.c(294):  256x13
    INFO: ms_gauden.c(294):  256x13
    INFO: ms_gauden.c(294):  256x13
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/variances
    INFO: ms_gauden.c(292): 1 codebook, 3 feature, size: 
    INFO: ms_gauden.c(294):  256x13
    INFO: ms_gauden.c(294):  256x13
    INFO: ms_gauden.c(294):  256x13
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: s2_semi_mgau.c(908): Loading senones from dump file /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/sendump
    INFO: s2_semi_mgau.c(932): BEGIN FILE FORMAT DESCRIPTION
    INFO: s2_semi_mgau.c(1027): Using memory-mapped I/O for senones
    INFO: s2_semi_mgau.c(1304): Maximum top-N: 4 Top-N beams: 0 0 0
    INFO: dict.c(306): Allocating 137543 * 20 bytes (2686 KiB) for word entries
    INFO: dict.c(321): Reading main dictionary: cmu07a.dic
    INFO: dict.c(212): Allocated 1010 KiB for strings, 1664 KiB for phones
    INFO: dict.c(324): 133437 words read
    INFO: dict.c(330): Reading filler dictionary: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
    INFO: dict.c(212): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(333): 11 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(404): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
    INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
    INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word triphones
    INFO: fsg_search.c(145): FSG(beam: -1080, pbeam: -1080, wbeam: -634; wip: -26, pip: 0)
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00000>
    INFO: jsgf.c(546): Defined rule: PUBLIC <SENTENCE.SENTENCE>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00002>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.DEVICE_TYPE>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00004>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00005>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00006>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00007>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.S5>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.VERB_LOOP>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.PREP_LOOP>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00011>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00012>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00013>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00014>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.NOUN_LOOP>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00016>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00017>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00018>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00019>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00020>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00021>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.g00022>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.CN_LOOP>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.NOUN>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.VERB>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.ADV>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.PRONOUN>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.ADJECTIVE>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.PREP>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.TO>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.CN>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.UNITS>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.DT>
    INFO: jsgf.c(546): Defined rule: <SENTENCE.PDT>
    INFO: fsg_model.c(213): Computing transitive closure for null transitions
    INFO: fsg_model.c(264): 3693 null transitions added
    INFO: fsg_model.c(411): Adding silence transitions for <sil> to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++NOISE++ to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++BREATH++ to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++SMACK++ to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++COUGH++ to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++LAUGH++ to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++TONE++ to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++UH++ to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++UM++ to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_search.c(364): Added 132 alternate word transitions
    INFO: fsg_lextree.c(108): Allocated 55998 bytes (54 KiB) for left and right context phones
    INFO: fsg_lextree.c(251): 13724 HMM nodes in lextree (12333 leaves)
    INFO: fsg_lextree.c(253): Allocated 1482192 bytes (1447 KiB) for all lextree nodes
    INFO: fsg_lextree.c(256): Allocated 1331964 bytes (1300 KiB) for lextree leafnodes
    INFO: cmn.c(175): CMN: 50.49 -0.36  2.14  0.47 -0.49 -1.05 -0.02 -0.40 -0.23  0.16  0.22 -0.13  0.36 
    INFO: fsg_search.c(1030): 633 frames, 56045 HMMs (88/fr), 61097 senones (96/fr), 52530 history entries (82/fr)
    
    INFO: fsg_search.c(1407): Start node <sil>.0:2:131
    INFO: fsg_search.c(1446): End node <sil>.527:540:632 (-1611)
    INFO: fsg_search.c(1662): lattice start node <sil>.0 end node <sil>.527
    INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(<sil>:527:632) = -536870912
    WARNING: "fsg_search.c", line 1155: Failed to bestpath in a lattice
    
    Program exited with code 01.
    (gdb)
    

    Other sentence output:

    INFO: jsgf.c(546): Defined rule: <SENTENCE.PDT>
    INFO: fsg_model.c(213): Computing transitive closure for null transitions
    INFO: fsg_model.c(264): 3693 null transitions added
    INFO: fsg_model.c(411): Adding silence transitions for <sil> to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++NOISE++ to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++BREATH++ to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++SMACK++ to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++COUGH++ to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++LAUGH++ to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++TONE++ to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++UH++ to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_model.c(411): Adding silence transitions for ++UM++ to FSG
    INFO: fsg_model.c(431): Added 549 silence word transitions
    INFO: fsg_search.c(364): Added 132 alternate word transitions
    INFO: fsg_lextree.c(108): Allocated 55998 bytes (54 KiB) for left and right context phones
    INFO: fsg_lextree.c(251): 13665 HMM nodes in lextree (12274 leaves)
    INFO: fsg_lextree.c(253): Allocated 1475820 bytes (1441 KiB) for all lextree nodes
    INFO: fsg_lextree.c(256): Allocated 1325592 bytes (1294 KiB) for lextree leafnodes
    INFO: cmn.c(175): CMN: 49.40  1.37  1.65 -0.79 -0.03 -0.09  0.31  0.16  0.12 -0.07 -0.23 -0.06  0.11 
    INFO: fsg_search.c(1030): 422 frames, 82821 HMMs (196/fr), 110559 senones (261/fr), 49694 history entries (117/fr)
    
    INFO: fsg_search.c(1407): Start node <sil>.0:2:102
    INFO: fsg_search.c(1446): End node <sil>.303:323:421 (-2516)
    INFO: fsg_search.c(1446): End node <sil>.310:321:421 (-1606)
    INFO: fsg_search.c(1446): End node <sil>.305:307:421 (-790)
    INFO: fsg_search.c(1446): End node <sil>.304:306:421 (-775)
    INFO: fsg_search.c(1446): End node <sil>.302:304:421 (-796)
    INFO: fsg_search.c(1446): End node <sil>.301:303:421 (-795)
    INFO: fsg_search.c(1662): lattice start node <sil>.0 end node </s>.422
    INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:422:422) = -1585502
    INFO: ps_lattice.c(1390): Joint P(O,S) = -1615818 P(S|O) = -30316
    Recognized: mister light from the kitchen later go off
    INFO: cmn_prior.c(121): cmn_prior_update: from < 49.40  1.37  1.65 -0.79 -0.03 -0.09  0.31  0.16  0.12 -0.07 -0.23 -0.06  0.11 >
    INFO: cmn_prior.c(139): cmn_prior_update: to   < 49.40  1.37  1.65 -0.79 -0.03 -0.09  0.31  0.16  0.12 -0.07 -0.23 -0.06  0.11 >
    INFO: fsg_search.c(1030): 422 frames, 82821 HMMs (196/fr), 110559 senones (261/fr), 49694 history entries (117/fr)
    
    INFO: fsg_search.c(1407): Start node <sil>.0:2:102
    INFO: fsg_search.c(1446): End node <sil>.303:323:421 (-2516)
    INFO: fsg_search.c(1446): End node <sil>.310:321:421 (-1606)
    INFO: fsg_search.c(1446): End node <sil>.305:307:421 (-790)
    INFO: fsg_search.c(1446): End node <sil>.304:306:421 (-775)
    INFO: fsg_search.c(1446): End node <sil>.302:304:421 (-796)
    INFO: fsg_search.c(1446): End node <sil>.301:303:421 (-795)
    INFO: fsg_search.c(1662): lattice start node <sil>.0 end node </s>.422
    INFO: ps_lattice.c(1352): Normalizer P(O) = alpha(</s>:422:422) = -1585502
    INFO: ps_lattice.c(1390): Joint P(O,S) = -1615818 P(S|O) = -30316
    Recognized: mister light from the kitchen later go off
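
On the earlier question about beam parameters: the configuration dump lists `-beam 1e-48`, `-pbeam 1e-48` and `-wbeam 7e-29`, while the search reports `FSG(beam: -1080, pbeam: -1080, wbeam: -634; ...)`. The sketch below shows one way the two could be related. It assumes the probability is converted to a logarithm in base `-logbase` (1.0001 in the dump), truncated to an integer, and then shifted right by 10 bits; the shift amount is an assumption, not something stated in this thread. Under those assumptions the numbers reproduce the beam and wbeam values from the log. The function name is made up for illustration.

```python
import math

LOGBASE = 1.0001   # -logbase from the configuration dump
SCORE_SHIFT = 10   # assumption: scores are scaled down by 2**10


def to_log_score(prob, logbase=LOGBASE, shift=SCORE_SHIFT):
    """Convert a beam probability to the integer score shown in the log."""
    # int() truncates toward zero, like a C cast to int32.
    raw = int(math.log(prob) / math.log(logbase))
    # >> on a negative Python int rounds toward minus infinity,
    # like an arithmetic right shift in C.
    return raw >> shift


print(to_log_score(1e-48))  # -1080, matches "beam: -1080" and "pbeam: -1080"
print(to_log_score(7e-29))  # -634, matches "wbeam: -634"
```

A smaller probability gives a more negative threshold, which means a wider beam: fewer hypotheses are pruned, at the cost of more computation. Passing a smaller `-beam`, `-pbeam` or `-wbeam` value when the decoder is configured is therefore the usual way to trade speed for accuracy.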
    
     
  • Pranav Jawale

    Pranav Jawale - 2012-06-12

    Sorry, this is just a guess, but does your sentence conform to the grammar?

    Secondly, can you add some silence to the second file ('mister light from ...')
    to make it as long as the first one and see whether the problem is reproduced?
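
A minimal sketch of that padding experiment follows. It assumes the audio is headerless 16-bit mono PCM at 16 kHz (matching `-samprate 16000` in the dump) and that one frame corresponds to roughly `samprate / frate` = 160 samples, which ignores the small offset introduced by the analysis window. The function names are hypothetical. With `-frate 100`, the 633-frame utterance is about 6.33 s long and the 422-frame one about 4.22 s, so roughly 2.11 s of silence is needed.

```python
def frames_to_seconds(frames, frate=100):
    """Approximate duration of an utterance given its frame count."""
    return frames / frate


def pad_with_silence(pcm, target_frames, samprate=16000, frate=100):
    """Append zero samples to raw 16-bit mono PCM until it spans
    approximately target_frames frames. Shorter input is padded,
    input that is already long enough is returned unchanged."""
    bytes_per_frame = (samprate // frate) * 2  # 160 samples * 2 bytes each
    missing = target_frames * bytes_per_frame - len(pcm)
    return pcm + b"\x00" * max(missing, 0)


# Stand-in for the shorter recording: 422 frames of placeholder audio.
short_utterance = bytes(422 * 320)
padded = pad_with_silence(short_utterance, 633)

print(frames_to_seconds(633))   # 6.33
print(len(padded) // 320)       # 633
```

Exact zeros are not the same as recorded room noise, so it may be worth padding with a stretch of real background silence from the same microphone instead if the zero-padded file behaves oddly.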

     
