CMU Sphinx / Forums / Help: PocketSphinx speech recognizer not working

I have Ubuntu with PocketSphinx installed. I wrote a simple Python program
that does the same thing David Huggins-Daines did in his PyCon 2010 talk (I
saw the video). As input I use a WAV file that I recorded in Audacity at a
44100 Hz sample rate. When I run the Python program, I see no decoded text
output. Here is my program:
===================
import pocketsphinx as ps
import sphinxbase

if __name__ == '__main__':
    decoder = ps.Decoder(hmm="/usr/share/pocketsphinx/model/hmm/wsj1",
                         lm="/usr/share/pocketsphinx/model/lm/wsj/wlist5o.3e-7.vp.tg.lm.DMP",
                         dict="/usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic")
    #fh = file('welcome.wav', 'rb')
    fh = file('/home/sganguly/work/bataahoo_pocketsphinx/welcome.wav', 'rb')
    fh.seek(44)
    decoder.decode_raw(fh)
    decoder.get_hyp()
===========================
Below is the output when I run my Python program:
===========================
INFO: cmd_ln.c(506): Parsing command line: \
  -hmm /usr/share/pocketsphinx/model/hmm/wsj1 \
  -lm /usr/share/pocketsphinx/model/lm/wsj/wlist5o.3e-7.vp.tg.lm.DMP \
  -dict /usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic

Current configuration:
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-cep2spec no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-dict /usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /usr/share/pocketsphinx/model/hmm/wsj1
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 0
-lm /usr/share/pocketsphinx/model/lm/wsj/wlist5o.3e-7.vp.tg.lm.DMP
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhistpf 100 100
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-sdmap
-seed -1 -1
-sendump
-silprob 0.005 5.000000e-03
-smoothspec no no
-spec2cep no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02

INFO: cmd_ln.c(506): Parsing command line: \
  -lowerf 1 \
  -upperf 4000 \
  -nfilt 20 \
  -transform dct \
  -round_filters no \
  -remove_dc yes \
  -feat s2_4x

Current configuration:
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-cep2spec no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-dither no no
-doublebw no no
-feat 1s_c_d_dd s2_4x
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logfn
-logspec no no
-lowerf 133.33334 1.000000e+00
-mfclogdir
-ncep 13 13
-nfft 512 512
-nfilt 40 20
-rawlogdir
-remove_dc no yes
-round_filters yes no
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-spec2cep no no
-svspec
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 4.000000e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.562500e-02

INFO: acmod.c(82): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/hmm/wsj1/feat.params
INFO: mdef.c(520): Reading model definition: /usr/share/pocketsphinx/model/hmm/wsj1/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(301): Reading binary model definition: /usr/share/pocketsphinx/model/hmm/wsj1/mdef
INFO: bin_mdef.c(480): 44 CI-phone, 66516 CD-phone, 5 emitstate/phone, 220 CI-sen, 5220 Sen, 18660 Sen-Seq
INFO: tmat.c(204): Reading HMM transition probability matrices: /usr/share/pocketsphinx/model/hmm/wsj1/transition_matrices
INFO: acmod.c(114): Attempting to use SCGMM computation module
INFO: s2_semi_mgau.c(981): Reading S3 mixture gaussian file '/usr/share/pocketsphinx/model/hmm/wsj1/means'
INFO: s2_semi_mgau.c(1080): 1 mixture Gaussians, 256 components, 4 feature streams, veclen 51
INFO: s2_semi_mgau.c(981): Reading S3 mixture gaussian file '/usr/share/pocketsphinx/model/hmm/wsj1/variances'
INFO: s2_semi_mgau.c(1080): 1 mixture Gaussians, 256 components, 4 feature streams, veclen 51
INFO: s2_semi_mgau.c(748): Loading senones from dump file /usr/share/pocketsphinx/model/hmm/wsj1/sendump
INFO: s2_semi_mgau.c(764): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(793): Rows: 256, Columns: 5220
INFO: s2_semi_mgau.c(801): Using memory-mapped I/O for senones
INFO: kdtree.c(231): Reading tree for feature 0
INFO: kdtree.c(249): n_density 256 n_comp 12 n_level 8 threshold 0.200000
INFO: kdtree.c(186): Read 255 nodes
INFO: kdtree.c(231): Reading tree for feature 1
INFO: kdtree.c(249): n_density 256 n_comp 24 n_level 8 threshold 0.200000
INFO: kdtree.c(186): Read 255 nodes
INFO: kdtree.c(231): Reading tree for feature 2
INFO: kdtree.c(249): n_density 256 n_comp 3 n_level 8 threshold 0.200000
INFO: kdtree.c(186): Read 255 nodes
INFO: kdtree.c(231): Reading tree for feature 3
INFO: kdtree.c(249): n_density 256 n_comp 12 n_level 8 threshold 0.200000
INFO: kdtree.c(186): Read 255 nodes
INFO: feat.c(849): Initializing feature stream to type: 's2_4x', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean= 12.00, mean= 0.0
INFO: dict.c(232): Allocating 20 placeholders for new OOVs
INFO: dict.c(494): 6270 = words in file
WARNING: "dict.c", line 435: Skipping duplicate definition of <s>
WARNING: "dict.c", line 435: Skipping duplicate definition of </s>
WARNING: "dict.c", line 435: Skipping duplicate definition of <sil>
INFO: dict.c(494): 3 = words in file
INFO: dict.c(349): LEFT CONTEXT TABLES
INFO: dict.c(1013): Entry Context table contains 450 entries
INFO: dict.c(1014): 19800 possible cross word triphones.
INFO: dict.c(1052): 17920 triphones 1792 pseudo diphones 88 uniphones
INFO: dict.c(1099): Exit Context table contains 450 entries
INFO: dict.c(1100): 19800 possible cross word triphones.
INFO: dict.c(1166): 17920 triphones 1792 pseudo diphones 88 uniphones
INFO: dict.c(1168): 7653 right context entries
INFO: dict.c(1169): 17 ave entries per exit context
INFO: dict.c(355): RIGHT CONTEXT TABLES
INFO: dict.c(1013): Entry Context table contains 416 entries
INFO: dict.c(1014): 18304 possible cross word triphones.
INFO: dict.c(1052): 17388 triphones 828 pseudo diphones 88 uniphones
INFO: dict.c(1099): Exit Context table contains 416 entries
INFO: dict.c(1100): 18304 possible cross word triphones.
INFO: dict.c(1166): 17388 triphones 828 pseudo diphones 88 uniphones
INFO: dict.c(1168): 8753 right context entries
INFO: dict.c(1169): 21 ave entries per exit context
ERROR: "ngram_model_arpa.c", line 155: No \data\ mark in LM file
INFO: ngram_model_dmp.c(141): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(190): ngrams 1=5002, 2=338656, 3=291318
INFO: ngram_model_dmp.c(236): 5002 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(286): 338656 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(313): 291318 = LM.trigrams read
INFO: ngram_model_dmp.c(338): 32470 = LM.prob2 entries read
INFO: ngram_model_dmp.c(358): 13795 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(379): 31136 = LM.prob3 entries read
INFO: ngram_model_dmp.c(408): 662 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(467): 5002 = ascii word strings read
INFO: ngram_search_fwdtree.c(156): 0 root, 0 non-root channels, 37 single-phone words
INFO: ngram_search_fwdtree.c(195): Creating search tree
INFO: ngram_search_fwdtree.c(203): 0 root, 0 non-root channels, 37 single-phone words
INFO: ngram_search_fwdtree.c(325): max nonroot chan increased to 13871
INFO: ngram_search_fwdtree.c(334): 443 root, 13743 non-root channels, 17 single-phone words
INFO: ngram_search_fwdflat.c(95): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: cmn.c(175): CMN: 13.39 6.68 0.27 1.21 1.36 -0.24 0.70 0.95 1.02 0.27 -0.13 -0.14 -0.19
INFO: ngram_search.c(368): Resized backpointer table to 10000 entries
INFO: ngram_search.c(376): Resized score stack to 200000 entries
INFO: ngram_search.c(368): Resized backpointer table to 20000 entries
INFO: ngram_search_fwdtree.c(1450): 11828 words recognized (9/fr)
INFO: ngram_search_fwdtree.c(1452): 3533274 senones evaluated (2784/fr)
INFO: ngram_search_fwdtree.c(1454): 2782066 channels searched (2192/fr), 481350 1st, 229567 last
INFO: ngram_search_fwdtree.c(1458): 26590 words for which last channels evaluated (20/fr)
INFO: ngram_search_fwdtree.c(1461): 186157 candidate words for entering last phone (146/fr)
INFO: ngram_search_fwdflat.c(840): 9996 words recognized (8/fr)
INFO: ngram_search_fwdflat.c(842): 483525 senones evaluated (381/fr)
INFO: ngram_search_fwdflat.c(844): 377014 channels searched (297/fr)
INFO: ngram_search_fwdflat.c(846): 27906 words searched (21/fr)
INFO: ngram_search_fwdflat.c(848): 12010 word transitions (9/fr)
WARNING: "ngram_search.c", line 1000: </s> not found in last frame, using ++NOISE++ instead
INFO: ngram_search.c(1046): lattice start node <s>.0 end node ++NOISE++.1204
INFO: ps_lattice.c(1225): Normalizer P(O) = alpha(++NOISE++:1204:1267) = -11055995
INFO: ps_lattice.c(1263): Joint P(O,S) = -11147882 P(S|O) = -91887
============================================================
I was expecting the get_hyp() method to print the recognized text, but I only
see a ++NOISE++ message and then no output. Does anybody have any idea what I
am doing wrong? Please let me know.

Thanks,
sganguly@yahoo.com
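
One thing that may be worth checking in a setup like this: the log reports -samprate 16000, while the WAV file was recorded at 44100 Hz, and the script skips a fixed 44 bytes instead of reading the header. The sketch below is not part of the original program and uses only Python's standard wave module (no PocketSphinx calls) to print what a WAV file actually contains. The function names describe_wav and format_mismatches are made up for illustration. The 16000 Hz default is taken from the log above; mono, 16-bit samples are an assumption about what decode_raw expects.

```python
import wave


def describe_wav(path):
    """Return the basic PCM parameters stored in a WAV file's header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "frames": wav.getnframes(),
        }


def format_mismatches(info, expected_rate=16000, expected_channels=1,
                      expected_width=2):
    """List the ways a file differs from the expected input format.

    expected_rate comes from the -samprate value in the log; the channel
    count and sample width are assumptions, not values from the log.
    """
    problems = []
    if info["sample_rate_hz"] != expected_rate:
        problems.append("sample rate is %d Hz, expected %d Hz"
                        % (info["sample_rate_hz"], expected_rate))
    if info["channels"] != expected_channels:
        problems.append("file has %d channel(s), expected %d"
                        % (info["channels"], expected_channels))
    if info["sample_width_bytes"] != expected_width:
        problems.append("samples are %d byte(s) wide, expected %d"
                        % (info["sample_width_bytes"], expected_width))
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_wav.py welcome.wav
    for wav_path in sys.argv[1:]:
        wav_info = describe_wav(wav_path)
        print(wav_path, wav_info)
        for problem in format_mismatches(wav_info):
            print("  mismatch:", problem)
```

If the check reports a mismatch, re-exporting the recording from Audacity at the rate the model expects would be one way to rule that out as a cause.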
