
using acoustic model with pocketsphinx

Help
med
2012-04-07
2012-09-22
  • med

    med - 2012-04-07

    Hi,
    I just finished training an acoustic model on my data. These are the results:

    Insertions: 0 Deletions: 1 Substitutions: 0
    TOTAL Words: 100 Correct: 34 Errors: 66
    TOTAL Percent correct = 34.00% Error = 66.00% Accuracy = 34.00%
    TOTAL Insertions: 0 Deletions: 61 Substitutions: 5
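The totals are consistent under the usual word-alignment definitions (correct = total - deletions - substitutions; errors = insertions + deletions + substitutions; accuracy = (correct - insertions) / total). They also show that 61 of the 66 errors are deletions, meaning the decoder produced far fewer words than the reference transcripts contain. Below is a minimal Python sketch of that arithmetic. It assumes those standard definitions, and the function name `alignment_scores` is hypothetical, not part of SphinxTrain.

```python
def alignment_scores(total, insertions, deletions, substitutions):
    """Recompute the summary figures of a word alignment.

    Assumes the common definitions:
      correct  = reference words matched exactly
      errors   = insertions + deletions + substitutions
      accuracy = (correct - insertions) / total
    """
    correct = total - deletions - substitutions
    errors = insertions + deletions + substitutions
    return {
        "correct": correct,
        "errors": errors,
        "percent_correct": 100.0 * correct / total,
        "error_rate": 100.0 * errors / total,
        "accuracy": 100.0 * (correct - insertions) / total,
    }


# Totals taken from the alignment summary above.
scores = alignment_scores(total=100, insertions=0, deletions=61, substitutions=5)
print(scores)
# Share of the errors that are deletions (words the decoder never output).
print(f"deletions: {61 / scores['errors']:.0%} of all errors")
```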

    When I try it with pocketsphinx, this is what I get:

    /beta# pocketsphinx_continuous -hmm model_parameters/beta.cd_cont_200 -lm amdigits.lm -dict etc/beta.dic
    INFO: cmd_ln.c(691): Parsing command line:
    pocketsphinx_continuous \
        -hmm model_parameters/beta.cd_cont_200 \
        -lm amdigits.lm \
        -dict etc/beta.dic
    
    Current configuration:
    [NAME]      [DEFLT]     [VALUE]
    -adcdev             
    -agc        none        none
    -agcthresh  2.0     2.000000e+00
    -alpha      0.97        9.700000e-01
    -argfile            
    -ascale     20.0        2.000000e+01
    -aw     1       1
    -backtrace  no      no
    -beam       1e-48       1.000000e-48
    -bestpath   yes     yes
    -bestpathlw 9.5     9.500000e+00
    -bghist     no      no
    -ceplen     13      13
    -cmn        current     current
    -cmninit    8.0     8.0
    -compallsen no      no
    -debug              0
    -dict               etc/beta.dic
    -dictcase   no      no
    -dither     no      no
    -doublebw   no      no
    -ds     1       1
    -fdict              
    -feat       1s_c_d_dd   1s_c_d_dd
    -featparams         
    -fillprob   1e-8        1.000000e-08
    -frate      100     100
    -fsg                
    -fsgusealtpron  yes     yes
    -fsgusefiller   yes     yes
    -fwdflat    yes     yes
    -fwdflatbeam    1e-64       1.000000e-64
    -fwdflatefwid   4       4
    -fwdflatlw  8.5     8.500000e+00
    -fwdflatsfwin   25      25
    -fwdflatwbeam   7e-29       7.000000e-29
    -fwdtree    yes     yes
    -hmm                model_parameters/beta.cd_cont_200
    -infile             
    -input_endian   little      little
    -jsgf               
    -kdmaxbbi   -1      -1
    -kdmaxdepth 0       0
    -kdtree             
    -latsize    5000        5000
    -lda                
    -ldadim     0       0
    -lextreedump    0       0
    -lifter     0       0
    -lm             amdigits.lm
    -lmctl              
    -lmname     default     default
    -logbase    1.0001      1.000100e+00
    -logfn              
    -logspec    no      no
    -lowerf     133.33334   1.333333e+02
    -lpbeam     1e-40       1.000000e-40
    -lponlybeam 7e-29       7.000000e-29
    -lw     6.5     6.500000e+00
    -maxhmmpf   -1      -1
    -maxnewoov  20      20
    -maxwpf     -1      -1
    -mdef               
    -mean               
    -mfclogdir          
    -min_endfr  0       0
    -mixw               
    -mixwfloor  0.0000001   1.000000e-07
    -mllr               
    -mmap       yes     yes
    -ncep       13      13
    -nfft       512     512
    -nfilt      40      40
    -nwpen      1.0     1.000000e+00
    -pbeam      1e-48       1.000000e-48
    -pip        1.0     1.000000e+00
    -pl_beam    1e-10       1.000000e-10
    -pl_pbeam   1e-5        1.000000e-05
    -pl_window  0       0
    -rawlogdir          
    -remove_dc  no      no
    -round_filters  yes     yes
    -samprate   16000       1.600000e+04
    -seed       -1      -1
    -sendump            
    -senlogdir          
    -senmgau            
    -silprob    0.005       5.000000e-03
    -smoothspec no      no
    -svspec             
    -time       no      no
    -tmat               
    -tmatfloor  0.0001      1.000000e-04
    -topn       4       4
    -topn_beam  0       0
    -toprule            
    -transform  legacy      legacy
    -unit_area  yes     yes
    -upperf     6855.4976   6.855498e+03
    -usewdphones    no      no
    -uw     1.0     1.000000e+00
    -var                
    -varfloor   0.0001      1.000000e-04
    -varnorm    no      no
    -verbose    no      no
    -warp_params            
    -warp_type  inverse_linear  inverse_linear
    -wbeam      7e-29       7.000000e-29
    -wip        0.65        6.500000e-01
    -wlen       0.025625    2.562500e-02
    
    INFO: cmd_ln.c(691): Parsing command line:
    \
        -nfilt 40 \
        -lowerf 133.3334 \
        -upperf 6855.4976 \
        -feat 1s_c_d_dd \
        -agc none \
        -cmn current \
        -varnorm no
    
    Current configuration:
    [NAME]      [DEFLT]     [VALUE]
    -agc        none        none
    -agcthresh  2.0     2.000000e+00
    -alpha      0.97        9.700000e-01
    -ceplen     13      13
    -cmn        current     current
    -cmninit    8.0     8.0
    -dither     no      no
    -doublebw   no      no
    -feat       1s_c_d_dd   1s_c_d_dd
    -frate      100     100
    -input_endian   little      little
    -lda                
    -ldadim     0       0
    -lifter     0       0
    -logspec    no      no
    -lowerf     133.33334   1.333334e+02
    -ncep       13      13
    -nfft       512     512
    -nfilt      40      40
    -remove_dc  no      no
    -round_filters  yes     yes
    -samprate   16000       1.600000e+04
    -seed       -1      -1
    -smoothspec no      no
    -svspec             
    -transform  legacy      legacy
    -unit_area  yes     yes
    -upperf     6855.4976   6.855498e+03
    -varnorm    no      no
    -verbose    no      no
    -warp_params            
    -warp_type  inverse_linear  inverse_linear
    -wlen       0.025625    2.562500e-02
    
    INFO: acmod.c(246): Parsed model-specific feature parameters from model_parameters/beta.cd_cont_200/feat.params
    INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: mdef.c(517): Reading model definition: model_parameters/beta.cd_cont_200/mdef
    INFO: bin_mdef.c(179): Allocating 305 * 8 bytes (2 KiB) for CD tree
    INFO: tmat.c(205): Reading HMM transition probability matrices: model_parameters/beta.cd_cont_200/transition_matrices
    INFO: acmod.c(121): Attempting to use SCHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: model_parameters/beta.cd_cont_200/means
    INFO: ms_gauden.c(292): 150 codebook, 1 feature, size: 
    INFO: ms_gauden.c(294):  8x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: model_parameters/beta.cd_cont_200/variances
    INFO: ms_gauden.c(292): 150 codebook, 1 feature, size: 
    INFO: ms_gauden.c(294):  8x39
    INFO: ms_gauden.c(354): 5 variance values floored
    INFO: acmod.c(123): Attempting to use PTHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: model_parameters/beta.cd_cont_200/means
    INFO: ms_gauden.c(292): 150 codebook, 1 feature, size: 
    INFO: ms_gauden.c(294):  8x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: model_parameters/beta.cd_cont_200/variances
    INFO: ms_gauden.c(292): 150 codebook, 1 feature, size: 
    INFO: ms_gauden.c(294):  8x39
    INFO: ms_gauden.c(354): 5 variance values floored
    INFO: ptm_mgau.c(804): Number of codebooks doesn't match number of ciphones, doesn't look like PTM: 150 != 17
    INFO: acmod.c(125): Falling back to general multi-stream GMM computation
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: model_parameters/beta.cd_cont_200/means
    INFO: ms_gauden.c(292): 150 codebook, 1 feature, size: 
    INFO: ms_gauden.c(294):  8x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: model_parameters/beta.cd_cont_200/variances
    INFO: ms_gauden.c(292): 150 codebook, 1 feature, size: 
    INFO: ms_gauden.c(294):  8x39
    INFO: ms_gauden.c(354): 5 variance values floored
    INFO: ms_senone.c(160): Reading senone mixture weights: model_parameters/beta.cd_cont_200/mixture_weights
    INFO: ms_senone.c(211): Truncating senone logs3(pdf) values by 10 bits
    INFO: ms_senone.c(218): Not transposing mixture weights in memory
    INFO: ms_senone.c(277): Read mixture weights for 150 senones: 1 features x 8 codewords
    INFO: ms_senone.c(331): Mapping senones to individual codebooks
    INFO: ms_mgau.c(141): The value of topn: 4
    INFO: dict.c(317): Allocating 4110 * 20 bytes (80 KiB) for word entries
    INFO: dict.c(332): Reading main dictionary: etc/beta.dic
    INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(335): 11 words read
    INFO: dict.c(341): Reading filler dictionary: model_parameters/beta.cd_cont_200/noisedict
    INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(344): 3 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(404): Allocating 17^3 * 2 bytes (9 KiB) for word-initial triphones
    INFO: dict2pid.c(131): Allocated 3536 bytes (3 KiB) for word-final triphones
    INFO: dict2pid.c(195): Allocated 3536 bytes (3 KiB) for single-phone word triphones
    INFO: ngram_model_arpa.c(477): ngrams 1=12, 2=20, 3=10
    INFO: ngram_model_arpa.c(135): Reading unigrams
    INFO: ngram_model_arpa.c(516):       12 = #unigrams created
    INFO: ngram_model_arpa.c(195): Reading bigrams
    INFO: ngram_model_arpa.c(533):       20 = #bigrams created
    INFO: ngram_model_arpa.c(534):        3 = #prob2 entries
    INFO: ngram_model_arpa.c(542):        3 = #bo_wt2 entries
    INFO: ngram_model_arpa.c(292): Reading trigrams
    INFO: ngram_model_arpa.c(555):       10 = #trigrams created
    INFO: ngram_model_arpa.c(556):        2 = #prob3 entries
    INFO: ngram_search_fwdtree.c(99): 11 unique initial diphones
    INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 4 single-phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 4 single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 142
    INFO: ngram_search_fwdtree.c(338): after: 11 root, 14 non-root channels, 3 single-phone words
    INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: continuous.c(371): pocketsphinx_continuous COMPILED ON: Feb 28 2012, AT: 10:20:20
    
    Warning: Could not find Mic element
    READY....
    Listening...
    Recording is stopped, start recording with ad_start_rec
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from <  8.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
    INFO: cmn_prior.c(139): cmn_prior_update: to   < 12.58 -1.37 -0.26 -0.32 -0.34 -0.09 -0.12 -0.23 -0.02 -0.05 -0.21 -0.01 -0.17 >
    INFO: ngram_search_fwdtree.c(1549):      582 words recognized (2/fr)
    INFO: ngram_search_fwdtree.c(1551):     8751 senones evaluated (36/fr)
    INFO: ngram_search_fwdtree.c(1553):     3514 channels searched (14/fr), 2651 1st, 589 last
    INFO: ngram_search_fwdtree.c(1557):      589 words for which last channels evaluated (2/fr)
    INFO: ngram_search_fwdtree.c(1560):        0 candidate words for entering last phone (0/fr)
    INFO: ngram_search_fwdtree.c(1562): fwdtree 0.04 CPU 0.018 xRT
    INFO: ngram_search_fwdtree.c(1565): fwdtree 2.82 wall 1.150 xRT
    INFO: ngram_search_fwdflat.c(305): Utterance vocabulary contains 1 words
    INFO: ngram_search_fwdflat.c(940):      283 words recognized (1/fr)
    INFO: ngram_search_fwdflat.c(942):      732 senones evaluated (3/fr)
    INFO: ngram_search_fwdflat.c(944):      411 channels searched (1/fr)
    INFO: ngram_search_fwdflat.c(946):      411 words searched (1/fr)
    INFO: ngram_search_fwdflat.c(948):       26 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(951): fwdflat 0.00 CPU 0.002 xRT
    INFO: ngram_search_fwdflat.c(954): fwdflat 0.00 wall 0.001 xRT
    INFO: ngram_search.c(1214): </s> not found in last frame, using <sil>.243 instead
    INFO: ngram_search.c(1266): lattice start node <s>.0 end node <sil>.236
    INFO: ngram_search.c(1294): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1399): Lattice has 17 nodes, 20 links
    INFO: ps_lattice.c(1365): Normalizer P(O) = alpha(<sil>:236:243) = -193566
    INFO: ps_lattice.c(1403): Joint P(O,S) = -194077 P(S|O) = -511
    INFO: ngram_search.c(888): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(891): bestpath 0.00 wall 0.000 xRT
    000000000: 
    READY....
    

    And it keeps running like this without recognizing any words :s :s

    Did I miss something?!

    Please answer me this time.
  • Nickolay V. Shmyrev

    did I miss something?!!!

    Most likely the input audio had the wrong format, so the features were not
    extracted correctly. See the tutorial for details:

    http://cmusphinx.sourceforge.net/wiki/tutorialam
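One quick way to act on this is to check every recording against the format the model expects before training or decoding. The sketch below uses Python's standard `wave` module. It assumes 16 kHz, 16-bit, mono WAV input, which matches the `-samprate 16000` in the configuration printed above and the `-upperf 6855.4976` read from feat.params. A filter bank reaching 6855 Hz needs a sample rate above roughly 13.7 kHz, so 8 kHz audio cannot match it. The function name `check_wav_format` is hypothetical. Adjust the expected values to whatever your training audio actually used.

```python
import sys
import wave


def check_wav_format(path, expected_rate=16000, expected_width=2, expected_channels=1):
    """Return a list of mismatches between a WAV file and the expected format.

    Assumed defaults: 16 kHz sample rate, 2-byte (16-bit) samples, mono.
    An empty list means the file matches.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate {rate} Hz, expected {expected_rate} Hz")
    if width != expected_width:
        problems.append(f"sample width {8 * width} bits, expected {8 * expected_width} bits")
    if channels != expected_channels:
        problems.append(f"{channels} channels, expected {expected_channels}")
    return problems


if __name__ == "__main__":
    # Usage: python check_wav.py file1.wav file2.wav ...
    for path in sys.argv[1:]:
        issues = check_wav_format(path)
        print(path, "OK" if not issues else "; ".join(issues))
```

Any file reported as not OK was recorded or converted with different settings from what the model expects, and should be resampled or re-recorded before it is used.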

     
