
Accuracy Problem with PocketSphinx (Cont. HMM)

2010-08-19
2012-09-22
  • Mike

    Mike - 2010-08-19

    o) 16K conversational speech (subset of ICSI meeting corpus)
    o) continuous HMM (3-state no skip, ~3000 senones, 16-mixture)
    o) trigram LM
    o) major parameters have been tuned for both Sphinx3 and PocketSphinx

    Problem:
    PocketSphinx gives a 3-4% accuracy drop compared with Sphinx3
    (no matter how we tune the parameters).

    Sphinx3:
    59.3% Acc (1 - WER)
    lw: 11
    beam: 1e-55
    pbeam: 1e-55
    wbeam: 1e-35
    wip: 0.2
    Playing around with the parameters, the maximum I can achieve is 60.1% Acc!

    PocketSphinx
    55.9% Acc
    lw: 7
    beam: 1e-53
    pbeam: 1e-53
    wbeam: 1e-35
    wip: 0.2
    Playing around with the parameters, the maximum I can achieve is 56.5% Acc!

    My Questions:
    1) Am I missing something for PocketSphinx?
    2) Will a semi-continuous 5-state HMM give better accuracy?
    (I am re-training a semi-continuous HMM now, but logically I don't think
    semi-continuous will give better accuracy...)

    Thanks!
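    For reference, the accuracy figures in this thread are word accuracy, 1 - WER. A
    minimal sketch of how such a number is computed from a reference/hypothesis pair
    (plain word-level Levenshtein alignment; the function name is mine, not part of
    the Sphinx tools, which use their own scoring scripts):

```python
# Word accuracy (1 - WER) via Levenshtein alignment over words.
# A minimal sketch; real evaluations use sclite or word_align.pl.

def word_error_rate(ref, hyp):
    """Edit distance between word sequences, divided by reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edits to turn the first i ref words into the first j hyp words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

ref = "the cat sat on the mat"
hyp = "the cat sat on mat"  # one deletion -> WER 1/6, Acc ~83.3%
print(100 * (1 - word_error_rate(ref, hyp)))
```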

     
  • Nickolay V. Shmyrev

    It's not clear which decoding mode you are using. I suggest you provide the
    heads of the decoding logs where all parameters are listed.

     
  • Mike

    Mike - 2010-08-20

    Hi nshmyrev,

    The head of the decoding log is attached.

    I noticed an error "ERROR: "ptm_mgau.c", line 801: Number of codebooks exceeds
    256: 2783"

    Thanks!


    INFO: cmd_ln.c(512): Parsing command line:
    /home/tao/SphinxEval/pocketsphinx/bin/pocketsphinx_batch \
    -hmm /home/tao/icsi_meeting/model/hmm/16mix_6 \
    -lw 5 \
    -feat 1s_c_d_dd \
    -beam 1e-55 \
    -pbeam 1e-55 \
    -wbeam 1e-35 \
    -dict /home/tao/icsi_meeting/etc/cmu_nosp_new.dict \
    -fdict /home/tao/icsi_meeting/etc/cmu_nosp.filler \
    -lm /home/tao/icsi_meeting/model/lm/lm_csr_6k_nvp_3gram.DMP \
    -wip 0.2 \
    -ctl /home/tao/icsi_meeting/etc/test_mini_no_ext.scp \
    -cepdir /home/tao/icsi_meeting \
    -cepext .mfc \
    -hyp /home/tao/icsi_meeting/test_ps.match_0 \
    -agc none \
    -varnorm no \
    -cmn current \
    -ctlcount 615 \
    -ctloffset 0

    Current configuration:

    -adchdr 0 0
    -adcin no no
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -argfile
    -ascale 20.0 2.000000e+01
    -backtrace no no
    -beam 1e-48 1.000000e-55
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -bghist no no
    -cepdir /home/tao/icsi_meeting
    -cepext .mfc .mfc
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -ctl /home/tao/icsi_meeting/etc/test_mini_no_ext.scp
    -ctlcount -1 615
    -ctlincr 1 1
    -ctloffset 0 0
    -ctm
    -debug 0
    -dict /home/tao/icsi_meeting/etc/cmu_nosp_new.dict
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict /home/tao/icsi_meeting/etc/cmu_nosp.filler
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm /home/tao/icsi_meeting/model/hmm/16mix_6
    -hyp /home/tao/icsi_meeting/test_ps.match_0
    -hypseg
    -input_endian little little
    -jsgf
    -kdmaxbbi -1 -1
    -kdmaxdepth 0 0
    -kdtree
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lextreedump 0 0
    -lifter 0 0
    -lm /home/tao/icsi_meeting/model/lm/lm_csr_6k_nvp_3gram.DMP
    -lmctl
    -lmname default default
    -lmnamectl
    -logbase 1.0001 1.000100e+00
    -logfn
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 5.000000e+00
    -maxhmmpf -1 -1
    -maxnewoov 20 20
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mllrctl
    -mllrdir
    -mmap yes yes
    -nbest 0 0
    -nbestdir
    -nbestext .hyp .hyp
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -outlatdir
    -pbeam 1e-48 1.000000e-55
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-5 1.000000e-05
    -pl_window 0 0
    -rawlogdir
    -remove_dc no no
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -usewdphones no no
    -uw 1.0 1.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 1.000000e-35
    -wip 0.65 2.000000e-01
    -wlen 0.025625 2.562500e-02

    INFO: feat.c(979): Initializing feature stream to type: '1s_c_d_dd',
    ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean= 12.00, mean= 0.0
    INFO: mdef.c(520): Reading model definition:
    /home/tao/icsi_meeting/model/hmm/16mix_6/mdef
    INFO: bin_mdef.c(173): Allocating 86389 * 8 bytes (674 KiB) for CD tree
    INFO: tmat.c(205): Reading HMM transition probability matrices:
    /home/tao/icsi_meeting/model/hmm/16mix_6/transition_matrices
    INFO: acmod.c(117): Attempting to use SCHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /home/tao/icsi_meeting/model/hmm/16mix_6/means
    INFO: ms_gauden.c(292): 2783 codebook, 1 feature, size
    16x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /home/tao/icsi_meeting/model/hmm/16mix_6/variances
    INFO: ms_gauden.c(292): 2783 codebook, 1 feature, size
    16x39
    INFO: ms_gauden.c(356): 0 variance values floored
    INFO: acmod.c(119): Attempting to use PTHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /home/tao/icsi_meeting/model/hmm/16mix_6/means
    INFO: ms_gauden.c(292): 2783 codebook, 1 feature, size
    16x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /home/tao/icsi_meeting/model/hmm/16mix_6/variances
    INFO: ms_gauden.c(292): 2783 codebook, 1 feature, size
    16x39
    INFO: ms_gauden.c(356): 0 variance values floored
    ERROR: "ptm_mgau.c", line 801: Number of codebooks exceeds 256: 2783
    INFO: acmod.c(121): Falling back to general multi-stream GMM computation
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /home/tao/icsi_meeting/model/hmm/16mix_6/means
    INFO: ms_gauden.c(292): 2783 codebook, 1 feature, size
    16x39
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    /home/tao/icsi_meeting/model/hmm/16mix_6/variances
    INFO: ms_gauden.c(292): 2783 codebook, 1 feature, size
    16x39
    INFO: ms_gauden.c(356): 0 variance values floored
    INFO: ms_senone.c(160): Reading senone mixture weights:
    /home/tao/icsi_meeting/model/hmm/16mix_6/mixture_weights
    INFO: ms_senone.c(211): Truncating senone logs3(pdf) values by 10 bits
    INFO: ms_senone.c(218): Not transposing mixture weights in memory
    INFO: ms_senone.c(277): Read mixture weights for 2783 senones: 1 features x 16
    codewords
    INFO: ms_senone.c(331): Mapping senones to individual codebooks
    INFO: ms_mgau.c(123): The value of topn: 4
    INFO: dict.c(294): Allocating 11617 * 20 bytes (226 KiB) for word entries
    INFO: dict.c(306): Reading main dictionary:
    /home/tao/icsi_meeting/etc/cmu_nosp_new.dict
    INFO: dict.c(206): Allocated 55 KiB for strings, 88 KiB for phones
    INFO: dict.c(309): 7518 words read
    INFO: dict.c(314): Reading filler dictionary:
    /home/tao/icsi_meeting/etc/cmu_nosp.filler
    INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(317): 3 words read
    INFO: dict2pid.c(402): Building PID tables for dictionary
    INFO: dict2pid.c(409): Allocating 7521 * 4 bytes (29 KiB) for word-internal
    arrays
    INFO: dict2pid.c(414): Allocating 41^3 * 2 bytes (134 KiB) for word-initial
    triphones
    INFO: dict2pid.c(453): Allocating 30332 entries of 2 bytes (59 KiB) for
    internal ssids
    INFO: dict2pid.c(130): Allocated 20336 bytes (19 KiB) for word-final triphones
    INFO: dict2pid.c(193): Allocated 20336 bytes (19 KiB) for single-phone word
    triphones
    ERROR: "ngram_model_arpa.c", line 76: No \data\ mark in LM file
    INFO: ngram_model_dmp.c(141): Will use memory-mapped I/O for LM file
    INFO: ngram_model_dmp.c(195): ngrams 1=6197, 2=67406, 3=11661
    INFO: ngram_model_dmp.c(241): 6197 = LM.unigrams(+trailer) read
    INFO: ngram_model_dmp.c(289): 67406 = LM.bigrams(+trailer) read
    INFO: ngram_model_dmp.c(314): 11661 = LM.trigrams read
    INFO: ngram_model_dmp.c(338): 21740 = LM.prob2 entries read
    INFO: ngram_model_dmp.c(357): 2422 = LM.bo_wt2 entries read
    INFO: ngram_model_dmp.c(377): 8881 = LM.prob3 entries read
    INFO: ngram_model_dmp.c(405): 132 = LM.tseg_base entries read
    INFO: ngram_model_dmp.c(461): 6197 = ascii word strings read
    INFO: ngram_search_fwdtree.c(99): 454 unique initial diphones
    INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 45 single-
    phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 45
    single-phone words
    INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 16431
    INFO: ngram_search_fwdtree.c(333): after: 454 root, 16303 non-root channels,
    44 single-phone words
    INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: ngram_search.c(407): Resized backpointer table to 10000 entries
    INFO: ngram_search_fwdtree.c(1502): 5150 words recognized (21/fr)
    INFO: ngram_search_fwdtree.c(1504): 409046 senones evaluated (1676/fr)
    INFO: ngram_search_fwdtree.c(1506): 815160 channels searched (3340/fr), 106830
    1st, 172547 last
    INFO: ngram_search_fwdtree.c(1510): 16256 words for which last channels
    evaluated (66/fr)
    INFO: ngram_search_fwdtree.c(1513): 41910 candidate words for entering last
    phone (171/fr)
    INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 275 words
    INFO: ngram_search_fwdflat.c(912): 1268 words recognized (5/fr)
    INFO: ngram_search_fwdflat.c(914): 128933 senones evaluated (528/fr)
    INFO: ngram_search_fwdflat.c(916): 199047 channels searched (815/fr)
    INFO: ngram_search_fwdflat.c(918): 16803 words searched (68/fr)
    INFO: ngram_search_fwdflat.c(920): 16066 word transitions (65/fr)
    INFO: ngram_search.c(1132): lattice start node .0 end node .225
    INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(:225:242) = -947361
    INFO: ps_lattice.c(1266): Joint P(O,S) = -956367 P(S|O) = -9006
    INFO: batch.c(659): mfcc_clean_mini/testData/Bmr031/ct-
    chan0_fe008_1507.031-1509.476_Bmr031: 2.43 seconds speech, 1.98 seconds CPU,
    1.98 seconds wall
    INFO: batch.c(661): mfcc_clean_mini/testData/Bmr031/ct-
    chan0_fe008_1507.031-1509.476_Bmr031: 0.81 xRT (CPU), 0.81 xRT (elapsed)
    INFO: ngram_search_fwdtree.c(1502): 4992 words recognized (26/fr)
    INFO: ngram_search_fwdtree.c(1504): 371653 senones evaluated (1906/fr)
    INFO: ngram_search_fwdtree.c(1506): 911974 channels searched (4676/fr), 81871
    1st, 94145 last
    INFO: ngram_search_fwdtree.c(1510): 11509 words for which last channels
    evaluated (59/fr)
    INFO: ngram_search_fwdtree.c(1513): 71144 candidate words for entering last
    phone (364/fr)
    INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 131 words
    INFO: ngram_search_fwdflat.c(912): 1241 words recognized (6/fr)
    INFO: ngram_search_fwdflat.c(914): 91853 senones evaluated (471/fr)
    INFO: ngram_search_fwdflat.c(916): 106460 channels searched (545/fr)
    INFO: ngram_search_fwdflat.c(918): 10184 words searched (52/fr)
    INFO: ngram_search_fwdflat.c(920): 8682 word transitions (44/fr)
    INFO: ngram_search.c(1132): lattice start node .0 end node .176
    INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(:176:193) = -518841
    INFO: ps_lattice.c(1266): Joint P(O,S) = -524764 P(S|O) = -5923

     
  • Nickolay V. Shmyrev

    Hi Tao!

    Strange, this is the first time I have seen such a result. A few thoughts on that:

    1) The ERRORs in the log are not critical; they have actually been changed to
    INFO in trunk.

    2) Did you train the model with SphinxTrain?

    3) What is the accuracy in fwdtree mode in pocketsphinx (-fwdflat no)? It
    should be more or less the same as sphinx3, which I suppose you also run in
    fwdtree mode.

    4) Not sure if it makes sense to check pocketsphinx-0.5 to see if there are
    any regressions
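    Suggestion 3 amounts to rerunning the batch decoder with the tree pass only. A
    sketch that builds such a command line (paths copied from the log above; the
    hyp filename is my own placeholder, and the actual run is commented out since
    it needs a pocketsphinx install):

```python
# Build a pocketsphinx_batch command that disables the flat-lexicon
# second pass, so only the fwdtree pass runs.  Paths are the ones
# from the log above; the -hyp output name is a placeholder.
import subprocess

cmd = [
    "pocketsphinx_batch",
    "-hmm", "/home/tao/icsi_meeting/model/hmm/16mix_6",
    "-lm", "/home/tao/icsi_meeting/model/lm/lm_csr_6k_nvp_3gram.DMP",
    "-dict", "/home/tao/icsi_meeting/etc/cmu_nosp_new.dict",
    "-ctl", "/home/tao/icsi_meeting/etc/test_mini_no_ext.scp",
    "-cepdir", "/home/tao/icsi_meeting",
    "-cepext", ".mfc",
    "-hyp", "/home/tao/icsi_meeting/test_ps_fwdtree.match",
    "-fwdflat", "no",   # skip the flat-lexicon rescoring pass
    "-bestpath", "no",  # optionally skip lattice rescoring as well
]
# subprocess.run(cmd, check=True)  # uncomment on a machine with pocketsphinx
print(" ".join(cmd))
```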

     
  • Mike

    Mike - 2010-08-23

    2) Did you train the model with SphinxTrain?

    No. We trained using HTK and converted to Sphinx format. Originally, the HTK
    decoder achieves 62.*% accuracy.

    3) What is the accuracy in fwdtree mode in pocketsphinx (-fwdflat no)? It
    should be more or less the same as sphinx3, which I suppose you also run in
    fwdtree mode.

    Will try that and post update.

    4) Not sure if it makes sense to check pocketsphinx-0.5 to see if there are
    any regressions

    Will give this a try as well.

    Thanks very much for all the suggestions!

     
  • Nickolay V. Shmyrev

    Originally, the HTK decoder achieves 62.*% accuracy.

    That's a serious threat for us! We should definitely solve it!

    Next set of questions:

    1) Which converter did you use? htk2s3.py from our trunk or something home-
    made?

    2) What is the s3 accuracy with very, very wide beams (like 1e-200)? I
    remember from David's comparison that HDecode beams are actually very wide
    compared to sphinx3's.

     
  • Mike

    Mike - 2010-08-24

    1) I am using Wout's converter: it is written in Python with help from David.
    I guess it's similar to htk2s3.py in your trunk (I will take a look at
    htk2s3.py).
    http://home.student.utwente.nl/w.j.maaskant/htk2s3conv/
    Model conversion is a complicated problem; we spent weeks on this and there is
    still a 2% gap.

    2) Using a very wide beam by itself didn't help (I tried beam=1e-300 and
    1e-500 before). Partial results are attached (pbeam always equal to beam).
    For regular beam widths, I found that wip=0.001 is a well-optimized value for
    this corpus. For a very wide beam, I haven't been able to find a magic number
    yet...

    3) Question: did David or anyone else ever achieve the same accuracy after
    converting an HTK model to S3?

    4) Nick, I will send personal emails to you for better communication.

    lw     beam    wbeam  wip    dur   acc
    11.25  1e-70   1e-35  0.001   857  60.2
    11.25  1e-70   1e-40  0.001   927  59.8
    11.25  1e-70   1e-50  0.001  1172  59.6
    11.25  1e-70   1e-60  0.001  1644  59.5
    11.25  1e-70   1e-70  0.001  2340  59.5
    11.25  1e-70   1e-80  0.001  2810  59.5
    11.25  1e-80   1e-35  0.001  1043  60.3
    11.25  1e-80   1e-40  0.001  1167  59.8
    11.25  1e-80   1e-50  0.001  1363  59.7
    11.25  1e-80   1e-60  0.001  1839  59.5
    11.25  1e-80   1e-70  0.001  2650  59.5
    11.25  1e-80   1e-80  0.001  3547  59.5
    11.25  1e-90   1e-35  0.001  1212  60.4
    11.25  1e-90   1e-40  0.001  1278  59.9
    11.25  1e-90   1e-50  0.001  1528  59.8
    11.25  1e-90   1e-60  0.001  2029  59.7
    11.25  1e-90   1e-70  0.001  2860  59.7
    11.25  1e-90   1e-80  0.001  3802  59.6
    11.25  1e-100  1e-35  0.001  1347  60.4
    11.25  1e-100  1e-40  0.001  1423  59.9
    11.25  1e-100  1e-50  0.001  1678  59.8
    11.25  1e-100  1e-60  0.001  2184  59.7
    11.25  1e-100  1e-70  0.001  2991  59.7
    11.25  1e-100  1e-80  0.001  3977  59.6
    11.25  1e-120  1e-35  0.001  1633  60.4
    11.25  1e-120  1e-40  0.001  1703  59.9
    11.25  1e-120  1e-50  0.001  1974  59.8
    11.25  1e-120  1e-60  0.001  2394  59.7
    11.25  1e-120  1e-70  0.001  3206  59.7
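    The sweep above can be summarized programmatically. A small sketch over the
    (beam, wbeam, dur, acc) rows (data transcribed from the table, lw and wip held
    fixed) that reports the accuracy ceiling and the cheapest setting reaching it:

```python
# Summarize the sphinx3 beam sweep above (lw=11.25, wip=0.001).
# Each row: (beam exponent, wbeam exponent, duration in s, accuracy in %).
rows = [
    (-70, -35, 857, 60.2), (-70, -40, 927, 59.8), (-70, -50, 1172, 59.6),
    (-70, -60, 1644, 59.5), (-70, -70, 2340, 59.5), (-70, -80, 2810, 59.5),
    (-80, -35, 1043, 60.3), (-80, -40, 1167, 59.8), (-80, -50, 1363, 59.7),
    (-80, -60, 1839, 59.5), (-80, -70, 2650, 59.5), (-80, -80, 3547, 59.5),
    (-90, -35, 1212, 60.4), (-90, -40, 1278, 59.9), (-90, -50, 1528, 59.8),
    (-90, -60, 2029, 59.7), (-90, -70, 2860, 59.7), (-90, -80, 3802, 59.6),
    (-100, -35, 1347, 60.4), (-100, -40, 1423, 59.9), (-100, -50, 1678, 59.8),
    (-100, -60, 2184, 59.7), (-100, -70, 2991, 59.7), (-100, -80, 3977, 59.6),
    (-120, -35, 1633, 60.4), (-120, -40, 1703, 59.9), (-120, -50, 1974, 59.8),
    (-120, -60, 2394, 59.7), (-120, -70, 3206, 59.7),
]
best_acc = max(acc for _, _, _, acc in rows)
# Among settings hitting the ceiling, pick the one with the lowest runtime.
cheapest = min((r for r in rows if r[3] == best_acc), key=lambda r: r[2])
print(best_acc, cheapest)  # accuracy plateaus at 60.4; beam 1e-90 is cheapest
```

    The takeaway matches the observation in the thread: accuracy saturates around
    60.4%, and widening the beam past 1e-90 only adds runtime.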

     
  • Nickolay V. Shmyrev

    the model conversion is a complicated problem, we spent weeks on this, still
    2% gap

    Yes, I also suspect something is wrong with the converted model. Probably the
    transition probs or something like that. I have found that pocketsphinx is
    very sensitive to transition probs. We need to take a closer look at it.
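    One cheap sanity check on a converted model is to verify that every transition
    row is still a probability distribution, since HTK and Sphinx matrices differ
    in layout and an off-by-one in the conversion breaks normalization. A sketch,
    assuming the transition matrices have already been loaded into a numpy array
    (the binary s3 loading step is not shown; the function name is mine):

```python
# Sanity-check converted transition matrices: every non-empty row of
# each per-HMM matrix should sum to 1 (within floating-point tolerance).
# Assumes `tmats` is an array of per-HMM matrices already loaded from
# the converted model; loading code not shown.
import numpy as np

def check_row_stochastic(tmats, tol=1e-3):
    """Return (hmm, state, sum) for rows whose probs don't sum to 1."""
    bad = []
    for h, mat in enumerate(tmats):
        for s, row in enumerate(mat):
            total = row.sum()
            if total > 0 and abs(total - 1.0) > tol:  # skip all-zero rows
                bad.append((h, s, float(total)))
    return bad

# Hypothetical toy example: one 3-state no-skip HMM with a broken row.
toy = np.array([[[0.6, 0.4, 0.00, 0.0],
                 [0.0, 0.5, 0.75, 0.0],   # sums to 1.25: flagged
                 [0.0, 0.0, 0.30, 0.7],
                 [0.0, 0.0, 0.00, 0.0]]])
print(check_row_stochastic(toy))  # -> [(0, 1, 1.25)]
```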

    We can also compare the scores of the models on data which is recognized
    incorrectly. If you provide such an utterance and the models, I can look
    myself.

    Question: did David or anyone else ever achieve the same accuracy after
    converting an HTK model to S3?

    Unfortunately that text is not available and it's better to ask David
    directly, but I remember he trained with SphinxTrain on WSJ and just compared
    the results to Keith's WSJ results on HTK.

    Nick, I will send personal emails to you for better communications.

    Yes, please do. Or use cmusphinx-devel. I'm sure you'll get more feedback
    there.

     
