I have installed pocketsphinx 0.6.1 and the same version of sphinxbase, and now
have a simple Python app that reads a WAV file and tries to retrieve the text
from the speech, but I get segmentation faults. I have attached the dump from the
run with the segmentation fault.
Can somebody help me understand what I am doing wrong here? I see the fault
happen on the decoder.get_hyp() call in my Python app.
Any help would be seriously appreciated.
==========================================================
sganguly@ubuntu:~/work/sphinx_speech/JaNaOo$ python jana.py /home/sganguly/work/sphinx_speech/JaNaOo/sgr1.wav
INFO: cmd_ln.c(512): Parsing command line:
\
-hmm /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k \
-lm /usr/local/share/pocketsphinx/model/lm/en_US/hub4.5000.DMP \
-dict /usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic
Current configuration:
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict /usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm /usr/local/share/pocketsphinx/model/lm/en_US/hub4.5000.DMP
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: cmd_ln.c(512): Parsing command line:
\
-nfilt 20 \
-lowerf 1 \
-upperf 4000 \
-wlen 0.025 \
-transform dct \
-round_filters no \
-remove_dc yes \
-svspec 0-12/13-25/26-38 \
-feat 1s_c_d_dd \
-agc none \
-cmn current \
-cmninit 56,-3,1 \
-varnorm no
Current configuration:
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ceplen 13 13
-cmn current current
-cmninit 8.0 56,-3,1
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 0
-logspec no no
-lowerf 133.33334 1.000000e+00
-ncep 13 13
-nfft 512 512
-nfilt 40 20
-remove_dc no yes
-round_filters yes no
-samprate 16000 1.600000e+04
-seed -1 -1
-smoothspec no no
-svspec 0-12/13-25/26-38
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 4.000000e+03
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.500000e-02
INFO: acmod.c(238): Parsed model-specific feature parameters from
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
INFO: feat.c(848): Initializing feature stream to type: '1s_c_d_dd',
ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean= 12.00, mean= 0.0
INFO: acmod.c(163): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(520): Reading model definition:
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef
file
INFO: bin_mdef.c(330): Reading binary model definition:
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: bin_mdef.c(508): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150
CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices:
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/transition_matrices
INFO: acmod.c(117): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size
256x13 256x13 256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter:
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size
256x13 256x13 256x13
INFO: ms_gauden.c(356): 0 variance values floored
INFO: s2_semi_mgau.c(897): Loading senones from dump file
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/sendump
INFO: s2_semi_mgau.c(921): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1016): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1293): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(294): Allocating 137542 * 20 bytes (2686 KiB) for word entries
INFO: dict.c(306): Reading main dictionary:
/usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic
INFO: dict.c(206): Allocated 1010 KiB for strings, 1664 KiB for phones
INFO: dict.c(309): 133436 words read
INFO: dict.c(314): Reading filler dictionary:
/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
INFO: dict.c(206): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(317): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(405): Allocating 50^3 * 2 bytes (244 KiB) for word-initial
triphones
INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word
triphones
ERROR: "ngram_model_arpa.c", line 76: No \data\ mark in LM file
INFO: ngram_model_dmp.c(141): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(195): ngrams 1=5001, 2=436879, 3=418286
INFO: ngram_model_dmp.c(241): 5001 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(289): 436879 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(314): 418286 = LM.trigrams read
INFO: ngram_model_dmp.c(338): 37293 = LM.prob2 entries read
INFO: ngram_model_dmp.c(357): 14370 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(377): 36094 = LM.prob3 entries read
INFO: ngram_model_dmp.c(405): 854 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(461): 5001 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 788 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 60 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 60
single-phone words
INFO: ngram_search_fwdtree.c(324): after: max nonroot chan increased to 13428
INFO: ngram_search_fwdtree.c(333): after: 457 root, 13300 non-root channels,
26 single-phone words
INFO: ngram_search_fwdflat.c(153): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: cmn.c(175): CMN: 81.06 -5.70 -2.05 -0.78 0.03 -0.26 -0.31 -0.34 0.35
0.38 0.54 0.37 0.26
INFO: ngram_search_fwdtree.c(1513): 1 words recognized (0/fr)
INFO: ngram_search_fwdtree.c(1515): 6 senones evaluated (1/fr)
INFO: ngram_search_fwdtree.c(1517): 3 channels searched (0/fr), 0 1st, 3 last
INFO: ngram_search_fwdtree.c(1521): 3 words for which last channels evaluated (0/fr)
INFO: ngram_search_fwdtree.c(1524): 0 candidate words for entering last phone (0/fr)
INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 0 words
INFO: ngram_search_fwdflat.c(912): 3 words recognized (1/fr)
INFO: ngram_search_fwdflat.c(914): 9 senones evaluated (2/fr)
INFO: ngram_search_fwdflat.c(916): 3 channels searched (0/fr)
INFO: ngram_search_fwdflat.c(918): 3 words searched (0/fr)
INFO: ngram_search_fwdflat.c(920): 0 word transitions (0/fr)
WARNING: "ngram_search.c", line 1087: not found in last frame, using instead
INFO: ngram_search.c(1137): lattice start node .0 end node .0
INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(:0:2) = -536897689
Segmentation fault
==========================================================
Hi,
Does anybody have experience getting pocketsphinx 0.6.1 to work with WAV files?
Thx
SG
Obviously it works; otherwise it would not have been released.
Your source file has the wrong format. It must be a 16 kHz, 16-bit mono WAV file.
The crash is caused by the incorrect input and has already been fixed in SVN trunk.
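The format requirement above can be checked before decoding with Python's standard wave module. This is a minimal sketch: check_wav_format is a hypothetical helper name, and the expected values simply restate the 16 kHz, 16-bit mono requirement from the reply.

```python
import wave

# Format the reply says the input must have:
# 16 kHz sample rate, 16-bit samples, a single (mono) channel.
EXPECTED_RATE = 16000
EXPECTED_SAMPWIDTH = 2  # bytes per sample, i.e. 16-bit
EXPECTED_CHANNELS = 1


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    w = wave.open(path, 'rb')  # raises wave.Error for non-PCM WAV files
    try:
        if w.getframerate() != EXPECTED_RATE:
            problems.append('sample rate is %d Hz, expected %d Hz'
                            % (w.getframerate(), EXPECTED_RATE))
        if w.getsampwidth() != EXPECTED_SAMPWIDTH:
            problems.append('sample width is %d bits, expected 16 bits'
                            % (8 * w.getsampwidth()))
        if w.getnchannels() != EXPECTED_CHANNELS:
            problems.append('%d channels, expected mono' % w.getnchannels())
    finally:
        w.close()
    return problems
```

Running it on a file before handing it to the decoder turns a crash or a silent bad decode into a readable message such as "sample rate is 8000 Hz, expected 16000 Hz".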
I tried another file, which is not a 16-bit mono WAV file, and see the same
crash. Also, this time I checked out the latest SVN source a few minutes ago
and rebuilt the system. Clearly I have something set up incorrectly.
I'd appreciate your help in guiding me through the process.
Thx
SG
I see the same segmentation fault when I use the
'./pocketsphinx/test/data/wsj/n800_440c0202.wav' and
'./pocketsphinx/swig/edu/cmu/pocketsphinx/goforward.wav' files. This is with the
latest code I checked out from the SVN tree for pocketsphinx and sphinxbase.
What else do I need to use? I checked out sphinxbase and pocketsphinx from the SVN
tree, compiled sphinxbase (make, make check and make install) and did the same
for pocketsphinx. There were no complaints from the compilation or build process,
but I still see a segmentation fault at the same place in 'ngram_search.c':
===============================
INFO: ngram_search_fwdtree.c(1513): 1 words recognized (0/fr)
INFO: ngram_search_fwdtree.c(1515): 6 senones evaluated (1/fr)
INFO: ngram_search_fwdtree.c(1517): 3 channels searched (0/fr), 0 1st, 3 last
INFO: ngram_search_fwdtree.c(1521): 3 words for which last channels evaluated (0/fr)
INFO: ngram_search_fwdtree.c(1524): 0 candidate words for entering last phone (0/fr)
INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 0 words
INFO: ngram_search_fwdflat.c(912): 3 words recognized (1/fr)
INFO: ngram_search_fwdflat.c(914): 9 senones evaluated (2/fr)
INFO: ngram_search_fwdflat.c(916): 3 channels searched (0/fr)
INFO: ngram_search_fwdflat.c(918): 3 words searched (0/fr)
INFO: ngram_search_fwdflat.c(920): 0 word transitions (0/fr)
WARNING: "ngram_search.c", line 1087: not found in last frame, using instead
INFO: ngram_search.c(1137): lattice start node .0 end node .0
INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(:0:2) = -536897689
Segmentation fault
===============================
There seems to be something wrong with my Ubuntu machine, or something that I am
missing from the setup of the software.
Has anybody seen this problem?
Thanks
SG
sukantag@gmail.com
You can run ps_test.py from the pocketsphinx/python/ folder to check
the installation. The problem might be old Python modules previously installed
on your system.
By the way, n800_440c0202.wav has the wrong format (8 kHz).
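To check the stale-module theory directly, you can ask Python which file it actually imports each module from. A minimal sketch; module_location is a hypothetical helper, and the module names pocketsphinx and sphinxbase are the ones used in this thread:

```python
import importlib


def module_location(name):
    """Return the file a module is imported from, or None if it is missing.

    For 'pocketsphinx' and 'sphinxbase' this shows whether Python picks up
    the freshly built modules or an older copy left elsewhere on sys.path.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Built-in modules (compiled into the interpreter) have no __file__
    return getattr(module, '__file__', '<built-in>')


if __name__ == '__main__':
    for name in ('pocketsphinx', 'sphinxbase'):
        print('%s -> %s' % (name, module_location(name)))
```

If the printed paths point somewhere other than the directory where the new build was installed, the old modules are still being loaded first.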
I tried cleaning up all the old Python sphinxbase and pocketsphinx modules and
rebuilding the whole thing. I have also tried running ps_test.py, which
works, but I still get the same segmentation fault.
I would sincerely appreciate help in getting this to work on my Ubuntu Linux
machine. I have been struggling with this for the last week.
Or is there an image (a compiled Linux binary that I can use) that would
work for me?
Thanks
sukantag@gmail.com
If ps_test.py works, your installation is fine. As usual, there
is nothing wrong with Ubuntu or Linux.
You need to provide more information about your jana.py and the file you are
trying to decode, so that your problem can be reproduced.
Nikolay,
Thanks for your note. I had to clean up some of the old sphinxbase on my
Ubuntu machine, and that resolved the crash to some extent, but I am still
struggling with very poor results. WAV file decoding is less than 10%
accurate, and on some of my local recordings none of the words match.
BTW, here is my code:
=====================
import sys
import pocketsphinx as ps

class WavAnalyze(object):
    def __init__(self, wav_filename):
        self.l_wav_filename = wav_filename
        self.decoder = None

    def wav_decode(self):
        # Initialize the speech decoder
        self.decoder = ps.Decoder(
            hmm='/home/sg/work/sphinx_speech/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k',
            lm='/home/sg/work/sphinx_speech/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP',
            dict='/home/sg/work/sphinx_speech/pocketsphinx/model/lm/en_US/cmu07a.dic')
        f = open(self.l_wav_filename, 'rb')
        # Skip the 44-byte header of a canonical PCM WAV file so that
        # only the raw 16-bit samples are passed to the decoder
        f.seek(44)
        self.decoder.decode_raw(f)
        f.close()
        fly_str = self.decoder.get_hyp()
        print " Text from the wav file -> ", fly_str

if __name__ == "__main__":
    # sys.argv[0] is the script itself, so only decode the arguments after it
    for arg_in in sys.argv[1:]:
        print ' File -> ' + arg_in
        wa = WavAnalyze(arg_in)
        wa.wav_decode()
        print "Analyzed " + arg_in + " wav file \n"
=====================
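jana.py skips the header with f.seek(44), which assumes the canonical 44-byte PCM header. Some tools write extra chunks (for example LIST metadata) before the samples, in which case byte 44 is still inside the header. A minimal sketch, assuming a little-endian RIFF/WAVE file, that walks the chunks to find where the samples of the data chunk begin (wav_data_offset is a hypothetical helper name):

```python
import struct


def wav_data_offset(path):
    """Return the byte offset at which the PCM samples of the 'data' chunk begin."""
    with open(path, 'rb') as f:
        # RIFF header: 'RIFF', total size, 'WAVE'
        riff, _size, wave_id = struct.unpack('<4sI4s', f.read(12))
        if riff != b'RIFF' or wave_id != b'WAVE':
            raise ValueError('not a RIFF/WAVE file: %s' % path)
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError('no data chunk found in %s' % path)
            chunk_id, chunk_size = struct.unpack('<4sI', header)
            if chunk_id == b'data':
                return f.tell()
            # Skip this chunk; chunk bodies are padded to an even length
            f.seek(chunk_size + (chunk_size & 1), 1)
```

In jana.py the call could then be f.seek(wav_data_offset(self.l_wav_filename)) instead of f.seek(44); this has not been tried against the 0.6.1 bindings.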
I even tried the goforward.wav file, and that too gave a bad decode.
The output of 'python jana.py goforward.wav' was:
===============================
Text from the wav file -> ('go forward and users', '000000000', -30226595)
===============================
The output should have been ('go forward ten meters')
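To put a number on a mismatch like this, the usual measure is word error rate (WER): the word-level edit distance (substitutions, insertions and deletions) between the reference and the hypothesis, divided by the number of reference words. A minimal sketch; word_error_rate is a hypothetical helper, not part of pocketsphinx:

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError('reference must contain at least one word')
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return float(d[len(ref)][len(hyp)]) / len(ref)
```

For the goforward.wav result, word_error_rate('go forward ten meters', 'go forward and users') gives 0.5: two of the four reference words are substituted.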
I feel the acoustic model, language model and dictionary are not correct,
or I may have to do some training or adaptation, but the documentation
says that normal, simple English recordings can be decoded without any
training or adaptation.
Please let me know where I am going wrong.
Thanks
SG
It works correctly. The output will never be "go forward ten meters" because the
word "meters" is not in the vocabulary of the language model you are using.
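Whether a word is in a language model's vocabulary can be checked from the unigram section of an ARPA-format (text) model. The .DMP files used in this thread are binary and would first have to be converted to ARPA (sphinxbase ships a sphinx_lm_convert tool for this, assuming your build includes it). A minimal sketch; arpa_unigrams and out_of_vocabulary are hypothetical helper names, and the comparison is case-sensitive:

```python
def arpa_unigrams(lines):
    """Collect the words listed in the \\1-grams: section of an ARPA LM."""
    vocab = set()
    in_unigrams = False
    for line in lines:
        line = line.strip()
        if line.startswith('\\'):
            # Section markers such as \data\, \1-grams:, \2-grams:, \end\
            in_unigrams = (line == '\\1-grams:')
            continue
        if in_unigrams and line:
            fields = line.split()  # log10 prob, word, [backoff weight]
            if len(fields) >= 2:
                vocab.add(fields[1])
    return vocab


def out_of_vocabulary(sentence, vocab):
    """Return the words of the sentence that the LM cannot produce."""
    return [w for w in sentence.split() if w not in vocab]
```

Usage would be along the lines of reading the converted model with open(...) and passing the file object to arpa_unigrams, then calling out_of_vocabulary('go forward ten meters', vocab); any word it returns can never appear in the decoder output with that model.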
Your language model still needs to describe the speech you are trying to
decode. If you are interested in a large language model, you can find one here:
http://www.keithv.com/software/csr/