
Pocketsphinx_continuous cmd output does not give recognized words

Help
jam
2015-06-28
2015-06-30
  • jam

    jam - 2015-06-28

    Hi,

    I recently trained an acoustic model for a small subset of Arabic (approximately 1,300 words).

    Training completed successfully in both CONTINUOUS and PTM modes, and I am now testing manually with the pocketsphinx_continuous command to check whether the results are OK.

    The problem is that when I first tried the CONTINUOUS model, the output contained no recognized words, only INFO messages that I cannot clearly understand.

    I searched the forum for similar problems, and they were mostly caused by a wrong audio file format. However, I checked my audio file and it is in the right format (16 kHz, 16-bit signed integer PCM).
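    One way to confirm the format from the file header itself, rather than from the tool that produced it, is a short script built on Python's standard `wave` module. This is a minimal sketch: the function name is hypothetical, and the single-channel (mono) check is an added assumption based on pocketsphinx decoding mono audio.

```python
import sys
import wave


def check_wav_format(path, expected_rate=16000):
    """Return a list of header problems for a WAV file; an empty list means it looks OK.

    Checks for 16 kHz, 16-bit, single-channel PCM. The single-channel
    requirement is an added assumption (pocketsphinx decodes mono audio).
    """
    try:
        wav = wave.open(path, "rb")
    except (wave.Error, EOFError) as exc:
        # The wave module only parses integer-PCM WAV files, so float or
        # compressed encodings (and non-WAV files) are rejected here.
        return [f"not a readable PCM WAV file: {exc}"]
    with wav:
        problems = []
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != expected_rate:
            problems.append(f"expected {expected_rate} Hz, got {wav.getframerate()} Hz")
    return problems


if __name__ == "__main__":
    # Usage: python check_wav.py 079013.wav [more.wav ...]
    for path in sys.argv[1:]:
        issues = check_wav_format(path)
        print(f"{path}: {'OK' if not issues else '; '.join(issues)}")
```

    16-bit PCM samples in a WAV file are signed by definition and stored little-endian, so the sample-width check also covers the "signed integer" part and matches the `-input_endian little` setting shown in the log.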

    Could you please have a look and tell me if there is a problem?

    Thanks a lot for your help,

    Here is the command I ran:

    pocketsphinx_continuous -hmm arabic.cd_cont_3000/ -lm arabic.lm.DMP -dict arabic.dic -infile 079013.wav -samprate 16000

    Here is the result:

    INFO: cmd_ln.c(697): Parsing command line:
    pocketsphinx_continuous \
    -hmm arabic.cd_cont_3000/ \
    -lm arabic.lm.DMP \
    -dict arabic.dic \
    -infile 079013.wav \
    -samprate 16000
    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -adcdev
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -allphone
    -allphone_ci no no
    -alpha 0.97 9.700000e-01
    -argfile
    -ascale 20.0 2.000000e+01
    -aw 1 1
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -debug 0
    -dict arabic.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm arabic.cd_cont_3000/
    -infile 079013.wav
    -inmic no no
    -input_endian little little
    -jsgf
    -keyphrase
    -kws
    -kws_plp 1e-1 1.000000e-01
    -kws_threshold 1 1.000000e+00
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lifter 0 0
    -lm arabic.lm.DMP
    -lmctl
    -lmname
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf 30000 30000
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -min_endfr 0 0
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mmap yes yes
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-10 1.000000e-10
    -pl_pip 1.0 1.000000e+00
    -pl_weight 3.0 3.000000e+00
    -pl_window 5 5
    -rawlogdir
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senlogdir
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -time no no
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -uw 1.0 1.000000e+00
    -vad_postspeech 50 50
    -vad_prespeech 10 10
    -vad_threshold 2.0 2.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02
    INFO: cmd_ln.c(697): Parsing command line:
    \
    -lowerf 130 \
    -upperf 6800 \
    -nfilt 25 \
    -transform dct \
    -lifter 22 \
    -feat 1s_c_d_dd \
    -agc none \
    -cmn current \
    -varnorm no
    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -dither no no
    -doublebw no no
    -feat 1s_c_d_dd 1s_c_d_dd
    -frate 100 100
    -input_endian little little
    -lda
    -ldadim 0 0
    -lifter 0 22
    -logspec no no
    -lowerf 133.33334 1.300000e+02
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 25
    -remove_dc no no
    -remove_noise yes yes
    -remove_silence yes yes
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -smoothspec no no
    -svspec
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 6.800000e+03
    -vad_postspeech 50 50
    -vad_prespeech 10 10
    -vad_threshold 2.0 2.000000e+00
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wlen 0.025625 2.562500e-02
    INFO: acmod.c(252): Parsed model-specific feature parameters from arabic.cd_cont_3000//feat.params
    INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
    INFO: mdef.c(518): Reading model definition: arabic.cd_cont_3000//mdef
    INFO: bin_mdef.c(181): Allocating 11854 * 8 bytes (92 KiB) for CD tree
    INFO: tmat.c(206): Reading HMM transition probability matrices: arabic.cd_cont_3000//transition_matrices
    INFO: acmod.c(124): Attempting to use PTM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: arabic.cd_cont_3000//means
    INFO: ms_gauden.c(292): 3120 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 8x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: arabic.cd_cont_3000//variances
    INFO: ms_gauden.c(292): 3120 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 8x39
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: ptm_mgau.c(801): Number of codebooks exceeds 256: 3120
    INFO: acmod.c(126): Attempting to use semi-continuous computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: arabic.cd_cont_3000//means
    INFO: ms_gauden.c(292): 3120 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 8x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: arabic.cd_cont_3000//variances
    INFO: ms_gauden.c(292): 3120 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 8x39
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: acmod.c(128): Falling back to general multi-stream GMM computation
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: arabic.cd_cont_3000//means
    INFO: ms_gauden.c(292): 3120 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 8x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: arabic.cd_cont_3000//variances
    INFO: ms_gauden.c(292): 3120 codebook, 1 feature, size:
    INFO: ms_gauden.c(294): 8x39
    INFO: ms_gauden.c(354): 0 variance values floored
    INFO: ms_senone.c(149): Reading senone mixture weights: arabic.cd_cont_3000//mixture_weights
    INFO: ms_senone.c(200): Truncating senone logs3(pdf) values by 10 bits
    INFO: ms_senone.c(207): Not transposing mixture weights in memory
    INFO: ms_senone.c(268): Read mixture weights for 3120 senones: 1 features x 8 codewords
    INFO: ms_senone.c(320): Mapping senones to individual codebooks
    INFO: ms_mgau.c(141): The value of topn: 4
    INFO: phone_loop_search.c(115): State beam -225 Phone exit beam -225 Insertion penalty 0
    INFO: dict.c(320): Allocating 5483 * 32 bytes (171 KiB) for word entries
    INFO: dict.c(333): Reading main dictionary: arabic.dic
    INFO: dict.c(213): Allocated 23 KiB for strings, 20 KiB for phones
    INFO: dict.c(336): 1384 words read
    INFO: dict.c(342): Reading filler dictionary: arabic.cd_cont_3000//noisedict
    INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(345): 3 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(406): Allocating 40^3 * 2 bytes (125 KiB) for word-initial triphones
    INFO: dict2pid.c(132): Allocated 38720 bytes (37 KiB) for word-final triphones
    INFO: dict2pid.c(196): Allocated 38720 bytes (37 KiB) for single-phone word triphones
    INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
    INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
    INFO: ngram_model_dmp.c(196): ngrams 1=1386, 2=2375, 3=2970
    INFO: ngram_model_dmp.c(242): 1386 = LM.unigrams(+trailer) read
    INFO: ngram_model_dmp.c(288): 2375 = LM.bigrams(+trailer) read
    INFO: ngram_model_dmp.c(314): 2970 = LM.trigrams read
    INFO: ngram_model_dmp.c(339): 75 = LM.prob2 entries read
    INFO: ngram_model_dmp.c(359): 115 = LM.bo_wt2 entries read
    INFO: ngram_model_dmp.c(379): 55 = LM.prob3 entries read
    INFO: ngram_model_dmp.c(407): 5 = LM.tseg_base entries read
    INFO: ngram_model_dmp.c(463): 1386 = ascii word strings read
    INFO: ngram_search_fwdtree.c(99): 138 unique initial diphones
    INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 4 single-phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 4 single-phone words
    INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 6257
    INFO: ngram_search_fwdtree.c(339): after: 138 root, 6129 non-root channels, 3 single-phone words
    INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: continuous.c(299): pocketsphinx_continuous COMPILED ON: Jun 1 2015, AT: 15:48:38
    INFO: cmn_prior.c(131): cmn_prior_update: from < 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
    INFO: cmn_prior.c(149): cmn_prior_update: to < 73.73 4.09 -21.00 23.73 -18.06 -14.50 -2.88 -21.45 1.83 -14.21 5.62 -2.52 -5.64 >
    INFO: ngram_search_fwdtree.c(1553): 1216 words recognized (3/fr)
    INFO: ngram_search_fwdtree.c(1555): 61689 senones evaluated (142/fr)
    INFO: ngram_search_fwdtree.c(1559): 31398 channels searched (72/fr), 27131 1st, 1223 last
    INFO: ngram_search_fwdtree.c(1562): 1223 words for which last channels evaluated (2/fr)
    INFO: ngram_search_fwdtree.c(1564): 0 candidate words for entering last phone (0/fr)
    INFO: ngram_search_fwdtree.c(1567): fwdtree 0.10 CPU 0.023 xRT
    INFO: ngram_search_fwdtree.c(1570): fwdtree 0.10 wall 0.023 xRT
    INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 2 words
    INFO: ngram_search_fwdflat.c(945): 1216 words recognized (3/fr)
    INFO: ngram_search_fwdflat.c(947): 1296 senones evaluated (3/fr)
    INFO: ngram_search_fwdflat.c(949): 1290 channels searched (2/fr)
    INFO: ngram_search_fwdflat.c(951): 1290 words searched (2/fr)
    INFO: ngram_search_fwdflat.c(954): 76 word transitions (0/fr)
    INFO: ngram_search_fwdflat.c(957): fwdflat 0.00 CPU 0.000 xRT
    INFO: ngram_search_fwdflat.c(960): fwdflat 0.00 wall 0.001 xRT
    INFO: ngram_search.c(1252): lattice start node .0 end node .374
    INFO: ngram_search.c(1278): Eliminated 0 nodes before end node
    INFO: ngram_search.c(1383): Lattice has 12 nodes, 14 links
    INFO: ps_lattice.c(1380): Bestpath score: -2554
    INFO: ps_lattice.c(1384): Normalizer P(O) = alpha(:374:431) = -153539
    INFO: ps_lattice.c(1441): Joint P(O,S) = -166369 P(S|O) = -12830
    INFO: ngram_search.c(874): bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(877): bestpath 0.00 wall 0.000 xRT
    INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 0.10 CPU 0.023 xRT
    INFO: ngram_search_fwdtree.c(435): TOTAL fwdtree 0.10 wall 0.023 xRT
    INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.00 CPU 0.000 xRT
    INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.00 wall 0.001 xRT
    INFO: ngram_search.c(303): TOTAL bestpath 0.00 CPU 0.000 xRT
    INFO: ngram_search.c(306): TOTAL bestpath 0.00 wall 0.000 xRT

     
    • Nickolay V. Shmyrev

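      It looks OK. Recognition should be accurate after the first utterance. You can add the line

           -cmninit 73.73,4.09,-21.00
      

      to model/feat.params (here arabic.cd_cont_3000/feat.params) to get more accurate recognition of the first utterance. The cmn_prior_update lines at the end of your log show why: cepstral mean normalization starts from the initial estimate (8.00, the -cmninit default) and is only updated to the actual mean of your audio (73.73 ...) after the first utterance has been decoded.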
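      The values above come from the "cmn_prior_update: to" line near the end of the log. To repeat this for another model or recording, a minimal Python sketch can extract that vector from a saved decoder log (for example one written with the -logfn option listed in the configuration dump) and set it in feat.params. The script, the function names, and the choice of the last update in the log are assumptions for illustration; the script rewrites feat.params in place, so keep a copy.

```python
import re
import sys

# Matches the line logged after the running cepstral mean is updated, e.g.
# "INFO: cmn_prior.c(149): cmn_prior_update: to < 73.73 4.09 -21.00 ... >"
_CMN_TO = re.compile(r"cmn_prior_update: to <([^>]*)>")


def cmninit_from_log(log_text, n_values=3):
    """Return a comma-separated -cmninit value from the last CMN update in a log.

    n_values=3 mirrors the three values suggested in this thread.
    """
    matches = _CMN_TO.findall(log_text)
    if not matches:
        raise ValueError("no 'cmn_prior_update: to' line found in the log")
    return ",".join(matches[-1].split()[:n_values])


def set_feat_param(path, option, value):
    """Rewrite feat.params so that `option` appears exactly once, with `value`."""
    with open(path, encoding="utf-8") as f:
        # Keep every line except an existing setting for the same option.
        lines = [line for line in f.read().splitlines()
                 if line.split()[:1] != [option]]
    lines.append(f"{option} {value}")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Hypothetical usage: python set_cmninit.py decode.log arabic.cd_cont_3000/feat.params
    log_path, params_path = sys.argv[1], sys.argv[2]
    with open(log_path, encoding="utf-8") as f:
        value = cmninit_from_log(f.read())
    set_feat_param(params_path, "-cmninit", value)
    print(f"-cmninit {value}")
```

      Run against the log in this thread, it would produce the same line as suggested above; the other options already in feat.params (-lowerf 130, -upperf 6800, -nfilt 25 and so on) are left unchanged.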

       
  • jam

    jam - 2015-06-30

    I did not know that the first utterance is not recognized accurately.

    I will try to add the line and get back to you with the result.

    Thanks a lot for your help,

     
  • jam

    jam - 2015-06-30

    After adding the line, it gives me a result for the first utterance.

    The first utterance is recognized correctly with the continuous model; with the PTM model, only the first word is wrong.

    That's very good. I will test some more utterances before moving to Android.
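    When comparing the CONTINUOUS and PTM models over more utterances, a single number is easier to compare than reading hypotheses by eye. A common measure is word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the hypothesis, divided by the number of reference words. This is a generic Python sketch with hypothetical function names, not a SphinxTrain or pocketsphinx tool; it assumes transcripts are whitespace-separated words.

```python
def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of a reference word
                         cur[j - 1] + 1,          # insertion of a hypothesis word
                         prev[j - 1] + (r != h))  # substitution, or 0 on a match
        prev = cur
    return prev[-1], len(ref)


def wer(pairs):
    """Corpus WER over (reference, hypothesis) pairs: total errors / total reference words."""
    errors = total = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        total += n
    return errors / total if total else 0.0
```

    Summing errors and reference words across all utterances before dividing, instead of averaging per-utterance rates, keeps short utterances from dominating the result.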

    Thanks for your help and quick response.

     
