
PocketSphinx speech recognizer not working

Help
Anonymous
2011-04-07
2012-09-22
  • Anonymous

    Anonymous - 2011-04-07

    I have PocketSphinx installed on Ubuntu. I wrote a simple Python program
    that does the same thing David Huggins-Daines did in his PyCon 2010 talk
    (I saw the video). The file I want recognized is a WAV file that I recorded
    in Audacity at a 44100 Hz sample rate. When I run the Python program, I see
    no decoded text output. Here is my program
    ===================
    import pocketsphinx as ps
    import sphinxbase

    if __name__=='__main__':
        decoder = ps.Decoder(hmm="/usr/share/pocketsphinx/model/hmm/wsj1",
                             lm="/usr/share/pocketsphinx/model/lm/wsj/wlist5o.3e-7.vp.tg.lm.DMP",
                             dict="/usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic")
        #fh = file('welcome.wav', 'rb')
        fh = file('/home/sganguly/work/bataahoo_pocketsphinx/welcome.wav', 'rb')
        fh.seek(44)
        decoder.decode_raw(fh)
        decoder.get_hyp()
    ===========================
    Below is the output when I run my Python program
    ===========================
    INFO: cmd_ln.c(506): Parsing command line: \
        -hmm /usr/share/pocketsphinx/model/hmm/wsj1 \
        -lm /usr/share/pocketsphinx/model/lm/wsj/wlist5o.3e-7.vp.tg.lm.DMP \
        -dict /usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic
    Current configuration:
    -agc            none            none
    -agcthresh      2.0             2.000000e+00
    -alpha          0.97            9.700000e-01
    -ascale         20.0            2.000000e+01
    -backtrace      no              no
    -beam           1e-48           1.000000e-48
    -bestpath       yes             yes
    -bestpathlw     9.5             9.500000e+00
    -cep2spec       no              no
    -ceplen         13              13
    -cmn            current         current
    -cmninit        8.0             8.0
    -compallsen     no              no
    -dict                           /usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic
    -dictcase       no              no
    -dither         no              no
    -doublebw       no              no
    -ds             1               1
    -fdict
    -feat           1s_c_d_dd       1s_c_d_dd
    -featparams
    -fillprob       1e-8            1.000000e-08
    -frate          100             100
    -fsg
    -fsgusealtpron  yes             yes
    -fsgusefiller   yes             yes
    -fwdflat        yes             yes
    -fwdflatbeam    1e-64           1.000000e-64
    -fwdflatefwid   4               4
    -fwdflatlw      8.5             8.500000e+00
    -fwdflatsfwin   25              25
    -fwdflatwbeam   7e-29           7.000000e-29
    -fwdtree        yes             yes
    -hmm                            /usr/share/pocketsphinx/model/hmm/wsj1
    -input_endian   little          little
    -jsgf
    -kdmaxbbi       -1              -1
    -kdmaxdepth     0               0
    -kdtree
    -latsize        5000            5000
    -lda
    -ldadim         0               0
    -lifter         0               0
    -lm                             /usr/share/pocketsphinx/model/lm/wsj/wlist5o.3e-7.vp.tg.lm.DMP
    -lmctl
    -lmname         default         default
    -logbase        1.0001          1.000100e+00
    -logfn
    -logspec        no              no
    -lowerf         133.33334       1.333333e+02
    -lpbeam         1e-40           1.000000e-40
    -lponlybeam     7e-29           7.000000e-29
    -lw             6.5             6.500000e+00
    -maxhistpf      100             100
    -maxhmmpf       -1              -1
    -maxnewoov      20              20
    -maxwpf         -1              -1
    -mdef
    -mean
    -mfclogdir
    -mixw
    -mixwfloor      0.0000001       1.000000e-07
    -mmap           yes             yes
    -ncep           13              13
    -nfft           512             512
    -nfilt          40              40
    -nwpen          1.0             1.000000e+00
    -pbeam          1e-48           1.000000e-48
    -pip            1.0             1.000000e+00
    -rawlogdir
    -remove_dc      no              no
    -round_filters  yes             yes
    -samprate       16000           1.600000e+04
    -sdmap
    -seed           -1              -1
    -sendump
    -silprob        0.005           5.000000e-03
    -smoothspec     no              no
    -spec2cep       no              no
    -svspec
    -tmat
    -tmatfloor      0.0001          1.000000e-04
    -topn           4               4
    -toprule
    -transform      legacy          legacy
    -unit_area      yes             yes
    -upperf         6855.4976       6.855498e+03
    -usewdphones    no              no
    -uw             1.0             1.000000e+00
    -var
    -varfloor       0.0001          1.000000e-04
    -varnorm        no              no
    -verbose        no              no
    -warp_params
    -warp_type      inverse_linear  inverse_linear
    -wbeam          7e-29           7.000000e-29
    -wip            0.65            6.500000e-01
    -wlen           0.025625        2.562500e-02
    INFO: cmd_ln.c(506): Parsing command line: \
        -lowerf 1 \
        -upperf 4000 \
        -nfilt 20 \
        -transform dct \
        -round_filters no \
        -remove_dc yes \
        -feat s2_4x
    Current configuration:
    -agc            none            none
    -agcthresh      2.0             2.000000e+00
    -alpha          0.97            9.700000e-01
    -cep2spec       no              no
    -ceplen         13              13
    -cmn            current         current
    -cmninit        8.0             8.0
    -dither         no              no
    -doublebw       no              no
    -feat           1s_c_d_dd       s2_4x
    -frate          100             100
    -input_endian   little          little
    -lda
    -ldadim         0               0
    -lifter         0               0
    -logfn
    -logspec        no              no
    -lowerf         133.33334       1.000000e+00
    -mfclogdir
    -ncep           13              13
    -nfft           512             512
    -nfilt          40              20
    -rawlogdir
    -remove_dc      no              yes
    -round_filters  yes             no
    -samprate       16000           1.600000e+04
    -seed           -1              -1
    -smoothspec     no              no
    -spec2cep       no              no
    -svspec
    -transform      legacy          dct
    -unit_area      yes             yes
    -upperf         6855.4976       4.000000e+03
    -varnorm        no              no
    -verbose        no              no
    -warp_params
    -warp_type      inverse_linear  inverse_linear
    -wlen           0.025625        2.562500e-02
    INFO: acmod.c(82): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/hmm/wsj1/feat.params
    INFO: mdef.c(520): Reading model definition: /usr/share/pocketsphinx/model/hmm/wsj1/mdef
    INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
    INFO: bin_mdef.c(301): Reading binary model definition: /usr/share/pocketsphinx/model/hmm/wsj1/mdef
    INFO: bin_mdef.c(480): 44 CI-phone, 66516 CD-phone, 5 emitstate/phone, 220 CI-sen, 5220 Sen, 18660 Sen-Seq
    INFO: tmat.c(204): Reading HMM transition probability matrices: /usr/share/pocketsphinx/model/hmm/wsj1/transition_matrices
    INFO: acmod.c(114): Attempting to use SCGMM computation module
    INFO: s2_semi_mgau.c(981): Reading S3 mixture gaussian file '/usr/share/pocketsphinx/model/hmm/wsj1/means'
    INFO: s2_semi_mgau.c(1080): 1 mixture Gaussians, 256 components, 4 feature streams, veclen 51
    INFO: s2_semi_mgau.c(981): Reading S3 mixture gaussian file '/usr/share/pocketsphinx/model/hmm/wsj1/variances'
    INFO: s2_semi_mgau.c(1080): 1 mixture Gaussians, 256 components, 4 feature streams, veclen 51
    INFO: s2_semi_mgau.c(748): Loading senones from dump file /usr/share/pocketsphinx/model/hmm/wsj1/sendump
    INFO: s2_semi_mgau.c(764): BEGIN FILE FORMAT DESCRIPTION
    INFO: s2_semi_mgau.c(793): Rows: 256, Columns: 5220
    INFO: s2_semi_mgau.c(801): Using memory-mapped I/O for senones
    INFO: kdtree.c(231): Reading tree for feature 0
    INFO: kdtree.c(249): n_density 256 n_comp 12 n_level 8 threshold 0.200000
    INFO: kdtree.c(186): Read 255 nodes
    INFO: kdtree.c(231): Reading tree for feature 1
    INFO: kdtree.c(249): n_density 256 n_comp 24 n_level 8 threshold 0.200000
    INFO: kdtree.c(186): Read 255 nodes
    INFO: kdtree.c(231): Reading tree for feature 2
    INFO: kdtree.c(249): n_density 256 n_comp 3 n_level 8 threshold 0.200000
    INFO: kdtree.c(186): Read 255 nodes
    INFO: kdtree.c(231): Reading tree for feature 3
    INFO: kdtree.c(249): n_density 256 n_comp 12 n_level 8 threshold 0.200000
    INFO: kdtree.c(186): Read 255 nodes
    INFO: feat.c(849): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='none'
    INFO: cmn.c(142): mean= 12.00, mean= 0.0
    INFO: dict.c(232): Allocating 20 placeholders for new OOVs
    INFO: dict.c(494): 6270 = words in file
    WARNING: "dict.c", line 435: Skipping duplicate definition of <s>
    WARNING: "dict.c", line 435: Skipping duplicate definition of </s>
    WARNING: "dict.c", line 435: Skipping duplicate definition of <sil>
    INFO: dict.c(494): 3 = words in file
    INFO: dict.c(349): LEFT CONTEXT TABLES
    INFO: dict.c(1013): Entry Context table contains 450 entries
    INFO: dict.c(1014): 19800 possible cross word triphones.
    INFO: dict.c(1052): 17920 triphones 1792 pseudo diphones 88 uniphones
    INFO: dict.c(1099): Exit Context table contains 450 entries
    INFO: dict.c(1100): 19800 possible cross word triphones.
    INFO: dict.c(1166): 17920 triphones 1792 pseudo diphones 88 uniphones
    INFO: dict.c(1168): 7653 right context entries
    INFO: dict.c(1169): 17 ave entries per exit context
    INFO: dict.c(355): RIGHT CONTEXT TABLES
    INFO: dict.c(1013): Entry Context table contains 416 entries
    INFO: dict.c(1014): 18304 possible cross word triphones.
    INFO: dict.c(1052): 17388 triphones 828 pseudo diphones 88 uniphones
    INFO: dict.c(1099): Exit Context table contains 416 entries
    INFO: dict.c(1100): 18304 possible cross word triphones.
    INFO: dict.c(1166): 17388 triphones 828 pseudo diphones 88 uniphones
    INFO: dict.c(1168): 8753 right context entries
    INFO: dict.c(1169): 21 ave entries per exit context
    ERROR: "ngram_model_arpa.c", line 155: No \data\ mark in LM file
    INFO: ngram_model_dmp.c(141): Will use memory-mapped I/O for LM file
    INFO: ngram_model_dmp.c(190): ngrams 1=5002, 2=338656, 3=291318
    INFO: ngram_model_dmp.c(236): 5002 = LM.unigrams(+trailer) read
    INFO: ngram_model_dmp.c(286): 338656 = LM.bigrams(+trailer) read
    INFO: ngram_model_dmp.c(313): 291318 = LM.trigrams read
    INFO: ngram_model_dmp.c(338): 32470 = LM.prob2 entries read
    INFO: ngram_model_dmp.c(358): 13795 = LM.bo_wt2 entries read
    INFO: ngram_model_dmp.c(379): 31136 = LM.prob3 entries read
    INFO: ngram_model_dmp.c(408): 662 = LM.tseg_base entries read
    INFO: ngram_model_dmp.c(467): 5002 = ascii word strings read
    INFO: ngram_search_fwdtree.c(156): 0 root, 0 non-root channels, 37 single-phone words
    INFO: ngram_search_fwdtree.c(195): Creating search tree
    INFO: ngram_search_fwdtree.c(203): 0 root, 0 non-root channels, 37 single-phone words
    INFO: ngram_search_fwdtree.c(325): max nonroot chan increased to 13871
    INFO: ngram_search_fwdtree.c(334): 443 root, 13743 non-root channels, 17 single-phone words
    INFO: ngram_search_fwdflat.c(95): fwdflat: min_ef_width = 4, max_sf_win = 25
    INFO: cmn.c(175): CMN: 13.39 6.68 0.27 1.21 1.36 -0.24 0.70 0.95 1.02 0.27 -0.13 -0.14 -0.19
    INFO: ngram_search.c(368): Resized backpointer table to 10000 entries
    INFO: ngram_search.c(376): Resized score stack to 200000 entries
    INFO: ngram_search.c(368): Resized backpointer table to 20000 entries
    INFO: ngram_search_fwdtree.c(1450): 11828 words recognized (9/fr)
    INFO: ngram_search_fwdtree.c(1452): 3533274 senones evaluated (2784/fr)
    INFO: ngram_search_fwdtree.c(1454): 2782066 channels searched (2192/fr), 481350 1st, 229567 last
    INFO: ngram_search_fwdtree.c(1458): 26590 words for which last channels evaluated (20/fr)
    INFO: ngram_search_fwdtree.c(1461): 186157 candidate words for entering last phone (146/fr)
    INFO: ngram_search_fwdflat.c(840): 9996 words recognized (8/fr)
    INFO: ngram_search_fwdflat.c(842): 483525 senones evaluated (381/fr)
    INFO: ngram_search_fwdflat.c(844): 377014 channels searched (297/fr)
    INFO: ngram_search_fwdflat.c(846): 27906 words searched (21/fr)
    INFO: ngram_search_fwdflat.c(848): 12010 word transitions (9/fr)
    WARNING: "ngram_search.c", line 1000: </s> not found in last frame, using ++NOISE++ instead
    INFO: ngram_search.c(1046): lattice start node <s>.0 end node ++NOISE++.1204
    INFO: ps_lattice.c(1225): Normalizer P(O) = alpha(++NOISE++:1204:1267) = -11055995
    INFO: ps_lattice.c(1263): Joint P(O,S) = -11147882 P(S|O) = -91887
    ============================================================
    I was expecting the get_hyp() method to return the recognized text, but I
    only see the ++NOISE++ warning and then no output. Does anybody have any
    idea what I am doing wrong? Please let me know.

    Thanks
    sganguly@yahoo.com

     
  • Nickolay V. Shmyrev

    You should record at 16 kHz, not at 44.1 kHz. See how dhd records in his video.
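
For anyone who wants to check a recording before decoding it: the log in the question shows the decoder configured with -samprate 16000, while the file was recorded at 44100 Hz. Below is a minimal sketch in Python 3, using only the standard-library wave module, that reports when a WAV file does not match that rate. The function name check_wav is hypothetical, and the mono and 16-bit checks are an assumption about what decode_raw is normally fed; none of this is part of PocketSphinx itself.

```python
import wave

# Matches "-samprate 16000" in the decoder log from the question.
EXPECTED_RATE = 16000


def check_wav(path, expected_rate=EXPECTED_RATE):
    """Return a list of mismatches against 16-bit mono PCM at expected_rate.

    An empty list means the header matches. The mono and 16-bit
    requirements are assumptions made for this illustration.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()

    if rate != expected_rate:
        # Audio read at the wrong rate is effectively stretched in time
        # by this factor, so the acoustic model no longer fits it.
        factor = rate / expected_rate
        problems.append(
            "sample rate is %d Hz, expected %d Hz (stretch factor %.2f)"
            % (rate, expected_rate, factor)
        )
    if channels != 1:
        problems.append("%d channels, expected mono" % channels)
    if width != 2:
        problems.append("%d-byte samples, expected 2 (16-bit)" % width)
    return problems
```

Calling check_wav('welcome.wav') on a 44100 Hz recording would return a sample-rate message; an empty list means the header already matches. If the check fails, re-record or export the file at 16000 Hz, as suggested above.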

     
