English broadcast news demo pocketsphinx

2010-10-26
2012-09-22
  • vijayabharadwaj gsr

    Dear Sir,

    I am trying to decode some 16 kHz, 16-bit linear PCM files with
    pocketsphinx and the US English Broadcast News acoustic model. I am
    running this on a desktop machine, using the command

    $SPHINXDIR/bin/pocketsphinx_batch -hmm models/hmm/ -lm models/lm/lm_giga.DMP
    -dict models/lm/lm_giga.dic -fdict models/lm/lm_giga.filler -hyp out.txt
    -logfn log.txt -ctl feats.ctl -cepdir feats/

    This produces no output:

    ./decode.sh
    INFO: cmd_ln.c(512): Parsing command line:
    /home/bharadwaj/pocketsphinx/pocketsphinx/bin/pocketsphinx_batch \
    -hmm models/hmm/ \
    -lm models/lm/lm_giga.DMP \
    -dict models/lm/lm_giga.dic \
    -fdict models/lm/lm_giga.filler \
    -hyp out.txt \
    -logfn log.txt \
    -ctl feats.ctl \
    -cepdir feats/

    Current configuration:

    -adchdr 0 0
    -adcin no no
    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -argfile
    -ascale 20.0 2.000000e+01
    -backtrace no no
    -beam 1e-48 1.000000e-48
    -bestpath yes yes
    -bestpathlw 9.5 9.500000e+00
    -bghist no no
    -build_outdirs yes yes
    -cepdir feats/
    -cepext .mfc .mfc
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -compallsen no no
    -ctl feats.ctl
    -ctlcount -1 -1
    -ctlincr 1 1
    -ctloffset 0 0
    -ctm
    -debug 0
    -dict models/lm/lm_giga.dic
    -dictcase no no
    -dither no no
    -doublebw no no
    -ds 1 1
    -fdict models/lm/lm_giga.filler
    -feat 1s_c_d_dd 1s_c_d_dd
    -featparams
    -fillprob 1e-8 1.000000e-08
    -frate 100 100
    -fsg
    -fsgctl
    -fsgdir
    -fsgext
    -fsgusealtpron yes yes
    -fsgusefiller yes yes
    -fwdflat yes yes
    -fwdflatbeam 1e-64 1.000000e-64
    -fwdflatefwid 4 4
    -fwdflatlw 8.5 8.500000e+00
    -fwdflatsfwin 25 25
    -fwdflatwbeam 7e-29 7.000000e-29
    -fwdtree yes yes
    -hmm models/hmm/
    -hyp out.txt
    -hypseg
    -input_endian little little
    -jsgf
    -kdmaxbbi -1 -1
    -kdmaxdepth 0 0
    -kdtree
    -latsize 5000 5000
    -lda
    -ldadim 0 0
    -lextreedump 0 0
    -lifter 0 0
    -lm models/lm/lm_giga.DMP
    -lmctl
    -lmname default default
    -lmnamectl
    -logbase 1.0001 1.000100e+00
    -logfn log.txt
    -logspec no no
    -lowerf 133.33334 1.333333e+02
    -lpbeam 1e-40 1.000000e-40
    -lponlybeam 7e-29 7.000000e-29
    -lw 6.5 6.500000e+00
    -maxhmmpf -1 -1
    -maxnewoov 20 20
    -maxwpf -1 -1
    -mdef
    -mean
    -mfclogdir
    -mixw
    -mixwfloor 0.0000001 1.000000e-07
    -mllr
    -mllrctl
    -mllrdir
    -mllrext
    -mmap yes yes
    -nbest 0 0
    -nbestdir
    -nbestext .hyp .hyp
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 40
    -nwpen 1.0 1.000000e+00
    -outlatdir
    -pbeam 1e-48 1.000000e-48
    -pip 1.0 1.000000e+00
    -pl_beam 1e-10 1.000000e-10
    -pl_pbeam 1e-5 1.000000e-05
    -pl_window 0 0
    -rawlogdir
    -remove_dc no no
    -round_filters yes yes
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -sendump
    -senmgau
    -silprob 0.005 5.000000e-03
    -smoothspec no no
    -svspec
    -tmat
    -tmatfloor 0.0001 1.000000e-04
    -topn 4 4
    -topn_beam 0 0
    -toprule
    -transform legacy legacy
    -unit_area yes yes
    -upperf 6855.4976 6.855498e+03
    -usewdphones no no
    -uw 1.0 1.000000e+00
    -var
    -varfloor 0.0001 1.000000e-04
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wbeam 7e-29 7.000000e-29
    -wip 0.65 6.500000e-01
    -wlen 0.025625 2.562500e-02

    256x13 256x13 256x13
    256x13 256x13 256x13
    256x13 256x13 256x13
    256x13 256x13 256x13

    But the output file is empty.

    Can you please tell what went wrong?

     
  • vijayabharadwaj gsr

    I forgot to mention I am using the latest version of pocketsphinx and
    sphinxbase 0.6.1.

     
  • Nickolay V. Shmyrev

    Can you please tell what went wrong?

    No, we can't tell you that. You didn't provide sufficient information: you
    neither described how you extracted the features nor provided the decoding
    log.

     
  • vijayabharadwaj gsr

    I have extracted MFCC features using the command

    $SPHINXBASE/bin/sphinx_fe -c feats.ctl -mswav yes -di wav -do feats -ei wav
    -eo mfc

    then decoding

    $SPHINXDIR/bin/pocketsphinx_batch -hmm models/hmm/ -lm models/lm/lm_giga.DMP
    -dict models/lm/lm_giga.dic -fdict models/lm/lm_giga.filler -hyp out.txt
    -logfn log.txt -ctl feats.ctl -cepdir feats/

    The log file contains the following:

    INFO: cmd_ln.c(512): Parsing command line:
    \
    -nfilt 26 \
    -lowerf 1 \
    -upperf 8000 \
    -wlen 0.025 \
    -transform dct \
    -round_filters no \
    -remove_dc yes \
    -feat 1s_c_d_dd \
    -svspec 0-12/13-25/26-38 \
    -agc none \
    -cmn current \
    -varnorm no

    Current configuration:

    -agc none none
    -agcthresh 2.0 2.000000e+00
    -alpha 0.97 9.700000e-01
    -ceplen 13 13
    -cmn current current
    -cmninit 8.0 8.0
    -dither no no
    -doublebw no no
    -feat 1s_c_d_dd 1s_c_d_dd
    -frate 100 100
    -input_endian little little
    -lda
    -ldadim 0 0
    -lifter 0 0
    -logspec no no
    -lowerf 133.33334 1.000000e+00
    -ncep 13 13
    -nfft 512 512
    -nfilt 40 26
    -remove_dc no yes
    -round_filters yes no
    -samprate 16000 1.600000e+04
    -seed -1 -1
    -smoothspec no no
    -svspec 0-12/13-25/26-38
    -transform legacy dct
    -unit_area yes yes
    -upperf 6855.4976 8.000000e+03
    -varnorm no no
    -verbose no no
    -warp_params
    -warp_type inverse_linear inverse_linear
    -wlen 0.025625 2.500000e-02

    INFO: acmod.c(238): Parsed model-specific feature parameters from
    models/hmm//feat.params
    INFO: feat.c(848): Initializing feature stream to type: '1s_c_d_dd',
    ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean= 12.00, mean= 0.0
    INFO: acmod.c(163): Using subvector specification 0-12/13-25/26-38
    INFO: mdef.c(520): Reading model definition: models/hmm//mdef
    INFO: bin_mdef.c(173): Allocating 148085 * 8 bytes (1156 KiB) for CD tree
    INFO: tmat.c(205): Reading HMM transition probability matrices:
    models/hmm//transition_matrices
    INFO: acmod.c(117): Attempting to use SCHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: models/hmm//means
    INFO: ms_gauden.c(292): 50 codebook, 3 feature, size
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    models/hmm//variances
    INFO: ms_gauden.c(292): 50 codebook, 3 feature, size
    INFO: ms_gauden.c(356): 9400 variance values floored
    INFO: acmod.c(119): Attempting to use PTHMM computation module
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter: models/hmm//means
    INFO: ms_gauden.c(292): 50 codebook, 3 feature, size
    INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
    models/hmm//variances
    INFO: ms_gauden.c(292): 50 codebook, 3 feature, size
    INFO: ms_gauden.c(356): 9400 variance values floored
    INFO: ptm_mgau.c(473): Loading senones from dump file models/hmm//sendump
    INFO: ptm_mgau.c(497): BEGIN FILE FORMAT DESCRIPTION
    INFO: ptm_mgau.c(560): Rows: 256, Columns: 5150
    INFO: ptm_mgau.c(592): Using memory-mapped I/O for senones
    INFO: ptm_mgau.c(831): Maximum top-N: 4
    INFO: dict.c(294): Allocating 57097 * 20 bytes (1115 KiB) for word entries
    INFO: dict.c(306): Reading main dictionary: models/lm/lm_giga.dic
    ERROR: "dict.c", line 190: Line 48083: Bad ciphone: PH; word top(2) ignored
    INFO: dict.c(206): Allocated 412 KiB for strings, 681 KiB for phones
    INFO: dict.c(309): 52987 words read
    INFO: dict.c(314): Reading filler dictionary: models/lm/lm_giga.filler
    INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
    INFO: dict.c(317): 13 words read
    INFO: dict2pid.c(396): Building PID tables for dictionary
    INFO: dict2pid.c(405): Allocating 50^3 * 2 bytes (244 KiB) for word-initial
    triphones
    INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
    INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word
    triphones
    ERROR: "ngram_model_arpa.c", line 76: No \data\ mark in LM file
    INFO: ngram_model_dmp.c(141): Will use memory-mapped I/O for LM file
    INFO: ngram_model_dmp.c(195): ngrams 1=10092, 2=108557, 3=217998
    INFO: ngram_model_dmp.c(241): 10092 = LM.unigrams(+trailer) read
    INFO: ngram_model_dmp.c(289): 108557 = LM.bigrams(+trailer) read
    INFO: ngram_model_dmp.c(314): 217998 = LM.trigrams read
    INFO: ngram_model_dmp.c(338): 4832 = LM.prob2 entries read
    INFO: ngram_model_dmp.c(357): 5142 = LM.bo_wt2 entries read
    INFO: ngram_model_dmp.c(377): 2298 = LM.prob3 entries read
    INFO: ngram_model_dmp.c(405): 213 = LM.tseg_base entries read
    INFO: ngram_model_dmp.c(461): 10092 = ascii word strings read
    INFO: ngram_search_fwdtree.c(99): 684 unique initial diphones
    INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 58 single-
    phone words
    INFO: ngram_search_fwdtree.c(186): Creating search tree
    INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 58
    single-phone words
    INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 24175
    INFO: ngram_search_fwdtree.c(333): after: 540 root, 24047 non-root channels,
    40 single-phone words
    INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: cmn.c(175): CMN: 11.18 0.54 -0.51 0.13 -0.36 -0.21 -0.27 -0.20 -0.08
    -0.15 -0.05 -0.08 -0.20
    INFO: ngram_search.c(407): Resized backpointer table to 10000 entries
    INFO: ngram_search.c(407): Resized backpointer table to 20000 entries
    INFO: ngram_search.c(407): Resized backpointer table to 40000 entries
    INFO: ngram_search.c(407): Resized backpointer table to 80000 entries
    INFO: ngram_search.c(407): Resized backpointer table to 160000 entries
    INFO: ngram_search.c(415): Resized score stack to 200000 entries
    INFO: ngram_search_fwdtree.c(1513): 159309 words recognized (11/fr)
    INFO: ngram_search_fwdtree.c(1515): 55361695 senones evaluated (3732/fr)
    INFO: ngram_search_fwdtree.c(1517): 77153274 channels searched (5200/fr),
    8008740 1st, 980414 last
    INFO: ngram_search_fwdtree.c(1521): 606915 words for which last channels
    evaluated (40/fr)
    INFO: ngram_search_fwdtree.c(1524): 6308936 candidate words for entering last
    phone (425/fr)
    INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 6 words
    INFO: ngram_search_fwdflat.c(912): 113247 words recognized (8/fr)
    INFO: ngram_search_fwdflat.c(914): 634609 senones evaluated (43/fr)
    INFO: ngram_search_fwdflat.c(916): 227700 channels searched (15/fr)
    INFO: ngram_search_fwdflat.c(918): 166040 words searched (11/fr)
    INFO: ngram_search_fwdflat.c(920): 1988 word transitions (0/fr)
    WARNING: "ngram_search.c", line 1087: </s> not found in last frame, using
    ++NOISE++ instead
    INFO: ngram_search.c(1137): lattice start node <s>.0 end node ++NOISE++.3
    INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(++NOISE++:3:14833) =
    -2804706
    INFO: ps_lattice.c(1266): Joint P(O,S) = -2804706 P(S|O) = 0
    INFO: batch.c(661): cnet_hitachi: 148.34 seconds speech, 56.74 seconds CPU,
    56.79 seconds wall
    INFO: batch.c(663): cnet_hitachi: 0.38 xRT (CPU), 0.38 xRT (elapsed)
    INFO: cmn.c(175): CMN: 11.47 0.31 -0.56 0.25 -0.48 -0.09 -0.31 -0.17 -0.09
    -0.15 -0.03 -0.10 -0.21
    INFO: ngram_search_fwdtree.c(1513): 151255 words recognized (11/fr)
    INFO: ngram_search_fwdtree.c(1515): 51398672 senones evaluated (3749/fr)
    INFO: ngram_search_fwdtree.c(1517): 73053908 channels searched (5328/fr),
    7400700 1st, 954554 last
    INFO: ngram_search_fwdtree.c(1521): 562362 words for which last channels
    evaluated (41/fr)
    INFO: ngram_search_fwdtree.c(1524): 6030826 candidate words for entering last
    phone (439/fr)
    INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 6 words
    INFO: ngram_search_fwdflat.c(912): 107356 words recognized (8/fr)
    INFO: ngram_search_fwdflat.c(914): 543507 senones evaluated (40/fr)
    INFO: ngram_search_fwdflat.c(916): 192388 channels searched (14/fr)
    INFO: ngram_search_fwdflat.c(918): 152742 words searched (11/fr)
    INFO: ngram_search_fwdflat.c(920): 1336 word transitions (0/fr)
    INFO: ngram_search.c(1137): lattice start node <s>.0 end node </s>.13700

    It stops here for a long time. I waited for more than 20 minutes, but there
    is still no output in the output file.

     
  • Nickolay V. Shmyrev

    I have extracted MFCC features using the command $SPHINXBASE/bin/sphinx_fe
    -c feats.ctl -mswav yes -di wav -do feats -ei wav -eo mfc

    You made a mistake here. The broadcast news model uses non-default feature
    parameters. To extract the features properly, you need to add the option

    -argfile model/feat.params
    

    so that sphinx_fe uses the same feature extraction parameters that were
    used when training the model.
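
The mismatch can be read off the decoding log: the model's feat.params asks for 26 filters, a DCT transform and a 1-8000 Hz band, while the features were extracted with the defaults. The following is a minimal Python sketch of that comparison, not part of the Sphinx tools. It assumes that sphinx_fe falls back to the same defaults that appear in the first column of the decoder's configuration table, and that the model directory is models/hmm/ as in the decoding command. The helper names parse_params, mismatches and fe_command are hypothetical.

```python
# Front-end defaults, copied from the first column of the
# "Current configuration" table in the decoding log.
# Assumption: sphinx_fe uses the same defaults when no -argfile is given.
DEFAULTS = {
    "-nfilt": "40",
    "-lowerf": "133.33334",
    "-upperf": "6855.4976",
    "-wlen": "0.025625",
    "-transform": "legacy",
    "-round_filters": "yes",
    "-remove_dc": "no",
    "-feat": "1s_c_d_dd",
    "-agc": "none",
    "-cmn": "current",
    "-varnorm": "no",
}

# Model-specific parameters as echoed in the log after
# "Parsing command line:" (parsed from models/hmm//feat.params).
FEAT_PARAMS = """\
-nfilt 26
-lowerf 1
-upperf 8000
-wlen 0.025
-transform dct
-round_filters no
-remove_dc yes
-feat 1s_c_d_dd
-svspec 0-12/13-25/26-38
-agc none
-cmn current
-varnorm no
"""


def parse_params(text):
    """Turn '-name value' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            params[parts[0]] = parts[1].strip()
    return params


def mismatches(defaults, model):
    """Return {name: (default, model_value)} where the two disagree.

    Parameters without a listed default (here -svspec) are skipped.
    """
    return {
        name: (defaults[name], value)
        for name, value in model.items()
        if name in defaults and defaults[name] != value
    }


def fe_command(model_dir="models/hmm"):
    """Build the original sphinx_fe argument list plus -argfile."""
    argfile = model_dir.rstrip("/") + "/feat.params"
    return ["sphinx_fe", "-argfile", argfile,
            "-c", "feats.ctl", "-mswav", "yes",
            "-di", "wav", "-do", "feats", "-ei", "wav", "-eo", "mfc"]


if __name__ == "__main__":
    model = parse_params(FEAT_PARAMS)
    for name, (default, wanted) in sorted(mismatches(DEFAULTS, model).items()):
        print(f"{name}: default {default}, model expects {wanted}")
    print(" ".join(fe_command()))
```

Every parameter the script reports is one where the extracted .mfc files disagree with what the acoustic model was trained on. The last line it prints is the original extraction command with -argfile added.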

     
