Hello! First I want to thank you all for any inputs and tips! I attached my code in the end and I have a couple questions and issues here:
I am trying to write a python program that will turn on LED upon my words on Raspberry Pi 3.. I want to add specific activation words and then execute one of the TWO commands from my super tiny language model. After much research, I realized that I have to use "keyword" mode on decoder to activate listening on my Rpi (eg. "hello pi"). THEN I switch decoder to 'lm' mode to decode for (1 out of the 2 possible) command I have for it. (turn on led /turn off led) Is this understanding correct? What is difference between set_keyphrase and set_kws? Can set_kws work on a phrase("ok pi")?
I have been able to successfully record and decode voice from terminal with my USB microphone with the following command:
arecord -f cd -c 1 -D plughw:0,0 -r 16k test.wav
I know my USB microphone is default to be recording at 44100Hz samprate but this way I manually changed it to 16k. However, when I configure pyaudio.open (rate = 16000), it gives me an error says sample rate error. When i switch rate = 44100 (usb default rate) then no error occurs.. Can I somehow make my USB microphone sample at lower than default rate? If not, is there any USB microphone you guys have used that has default sample rate at 16k Hz?
Even using the default sound card on Pi (rate = 16000), within 2 seconds of running program, it gives me an overflowing error:
I have already been using a 3 sentence language model generated online anyone has any inputs on this? Running my USB microphone at 44100Hz sample rate is also making the overflowed issue worse...
My code:
import sys, os, time
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

#initialize decoder configuration
config = Decoder.default_config()
config.set_string('-hmm', '/home/pi/pocketsphinx-5prealpha/model/en-us/en-us')
config.set_string('-dict', '/home/pi/pocketsphinx-5prealpha/8000.dic')

#set up decoder search mode with defined language model
decoder = Decoder(config)
#lm = NGramModel(config, decoder.get_logmath(), path.join(modeldir, '8000.lm'))
#decoder.set_lm('8000', lm)
decoder.set_lm_file('lm', '/home/pi/pocketsphinx-5prealpha/8000.lm')
decoder.set_kws('keyword', 'keyword.list')
decoder.set_search('keyword')

#PyAudio set up
import pyaudio
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000,  #rate = 16000
                input=True, frames_per_buffer=1024, input_device_index=3)
stream.start_stream()

#silence to speech/speech to silence indicator
utterance_started = False

#decode input speech
decoder.start_utt()
while True:
    #print 'running'
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
        if decoder.get_in_speech() != utterance_started:
            utterance_started = decoder.get_in_speech()
            if not utterance_started:
                decoder.end_utt()
                print 'status decoder.get_search():', decoder.get_search()
                try:
                    if decoder.hyp().hypstr != None:
                        print 'hypothesis:', decoder.hyp().hypstr
                except AttributeError:
                    pass
                time.sleep(1)
                if decoder.get_search() == 'keyword':
                    decoder.set_search('lm')
                else:
                    decoder.set_search('keyword')
                decoder.start_utt()
    else:
        break
decoder.end_utt()
Thank you so much for such fast reply! I will look into running prerecorded file. One question: I understand that since my phrases are simple I can just directly use keyword spotting.. but I really need to have an activation first and THEN commands so that i can prevent any false positive commands.. Is there any way i can do that with pocket sphinx? Thank you again for all the time, help, and consideration!!
Last edit: Jing Yu 2017-03-29
So I have 2 findings after running the example codes in swig:
I was running kws_test.py - successful
1. But as soon as I change the ".dic" from the US English dictionary to my own generated .dic (which still contains "go", "forward" and "meters"), the program does not recognize anything any more.
2. I then restored the example code to its original state. After making sure it runs, I commented out the stream = open(os.path.join(datadir, "goforward.raw"), "rb") line and uncommented what was originally in the example file with ONE change - I added input_device_index=5 to the p.open line:
When I run the program, it errored with the following:
Traceback: return pa.read_stream(self._stream, num_frames)
IOError: [Errno Input overflowed] -9981
Please advise why these two issues are happening... I feel like I have exhausted all the online resources so I am kind of desperate right now... I really really appreciate your selfless time and consideration!!!!
Last edit: Jing Yu 2017-03-29
1) I tested on this file: https://github.com/cmusphinx/pocketsphinx/blob/master/swig/python/test/kws_test.py, using standard files (en-us) also from GitHub.
2) I use an RPi 3 on Raspbian Jessie.
3) I am not exactly sure how to get the log file after some search on internet so I just run "python kws_test.py" in cmd line and saw the error output. I had a lot of trouble configuring my usb sound card before so I uninstalled pulse audio and made changes to ~/.asoundrc to have my usb card on 0 and default pcm on 1.
ALSA lib pcm_dsnoop.c:618:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pulse.c:243:(pulse_connect) PulseAudio: Unable to connect: Connection refused
ALSA lib pulse.c:243:(pulse_connect) PulseAudio: Unable to connect: Connection refused
ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
Segmentation fault
4) I was running this example file from github: file:https://github.com/cmusphinx/pocketsphinx/blob/master/swig/python/test/kws_test.py
Ps: many thanks for all the wonderful examples you wrote!!!!
Please let me know if there is anything I can provide. I really appreciate your help!!
I ran the codes with pre-recorded audio and had no problem.. its the Pyaudio that is my pain point.. Even just a simple record-playback python code using pyaudio gives me "overflowed" error code. Below is my code:
import sys, os, time
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

#initialize decoder configuration
config = Decoder.default_config()
config.set_string('-hmm', '/home/pi/pocketsphinx-5prealpha/model/en-us/en-us')
config.set_string('-dict', '/home/pi/pocketsphinx-5prealpha/8000.dic')
config.set_float('-samprate', 44100.0)
config.set_int('-nfft', 2048)

#set up decoder search mode with defined language model
decoder = Decoder(config)
#lm = NGramModel(config, decoder.get_logmath(), path.join(modeldir, '8000.lm'))
#decoder.set_lm('8000', lm)
decoder.set_lm_file('lm', '/home/pi/pocketsphinx-5prealpha/8000.lm')
decoder.set_kws('keyword', 'keyword.list')
decoder.set_search('keyword')

#PyAudio set up
import pyaudio
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=44100,  #rate = 16000
                input=True, input_device_index=5, frames_per_buffer=8192)
stream.start_stream()

#silence to speech/speech to silence indicator
utterance_started = False

#decode input speech
decoder.start_utt()
while True:
    #print 'running'
    buf = stream.read(8198)
    if buf:
        decoder.process_raw(buf, False, False)
        if decoder.get_in_speech() != utterance_started:
            utterance_started = decoder.get_in_speech()
            if not utterance_started:
                decoder.end_utt()
                print 'status decoder.get_search():', decoder.get_search()
                try:
                    if decoder.hyp().hypstr != None:
                        print 'hypothesis:', decoder.hyp().hypstr
                except AttributeError:
                    pass
                if decoder.get_search() == 'keyword':
                    decoder.set_search('lm')
                    print '-----set to lm mode now'
                else:
                    decoder.set_search('keyword')
                    print '-----set to keyword mode now'
                decoder.start_utt()
    else:
        break
decoder.end_utt()
The error output looks like this:
Starting program: /usr/bin/python kw_to_grammar.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
INFO: pocketsphinx.c(152): Parsed model-specific feature parameters from /home/pi/pocketsphinx-5prealpha/model/en-us/en-us/feat.params
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-ceplen 13 13
-cmn live batch
-cmninit 40,3,-1 41.00,-5.29,-0.12,5.09,2.48,-4.07,-1.37,-1.78,-5.08,-2.05,-6.45,-1.42,1.17
-compallsen no no
-debug 0
-dict /home/pi/pocketsphinx-5prealpha/8000.dic
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm /home/pi/pocketsphinx-5prealpha/model/en-us/en-us
-input_endian little little
-jsgf
-keyphrase
-kws
-kws_delay 10 10
-kws_plp 1e-1 1.000000e-01
-kws_threshold 1 1.000000e+00
-latsize 5000 5000
-lda
-ldadim 0 0
-lifter 0 22
-lm
-lmctl
-lmname
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.300000e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf 30000 30000
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 2048
-nfilt 40 25
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-10 1.000000e-10
-pl_pip 1.0 1.000000e+00
-pl_weight 3.0 3.000000e+00
-pl_window 5 5
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 4.410000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec 0-12/13-25/26-38
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 6.800000e+03
-uw 1.0 1.000000e+00
-vad_postspeech 50 50
-vad_prespeech 20 20
-vad_startspeech 10 10
-vad_threshold 2.0 2.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: acmod.c(162): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: /home/pi/pocketsphinx-5prealpha/model/en-us/en-us/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /home/pi/pocketsphinx-5prealpha/model/en-us/en-us/mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(149): Reading HMM transition probability matrices: /home/pi/pocketsphinx-5prealpha/model/en-us/en-us/transition_matrices
INFO: acmod.c(113): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/pi/pocketsphinx-5prealpha/model/en-us/en-us/means
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: /home/pi/pocketsphinx-5prealpha/model/en-us/en-us/variances
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(244): 128x13
INFO: ms_gauden.c(304): 222 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file /home/pi/pocketsphinx-5prealpha/model/en-us/en-us/sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(838): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 4107 * 20 bytes (80 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /home/pi/pocketsphinx-5prealpha/8000.dic
INFO: dict.c(213): Dictionary size 6, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(336): 6 words read
INFO: dict.c(358): Reading filler dictionary: /home/pi/pocketsphinx-5prealpha/model/en-us/en-us/noisedict
INFO: dict.c(213): Dictionary size 11, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 21336 bytes (20 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 21336 bytes (20 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(365): Header doesn't match
INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
INFO: ngram_model_trie.c(193): LM of order 3
INFO: ngram_model_trie.c(195): #1-grams: 7
INFO: ngram_model_trie.c(195): #2-grams: 8
INFO: ngram_model_trie.c(195): #3-grams: 6
INFO: lm_trie.c(474): Training quantizer
INFO: lm_trie.c(482): Building LM trie
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 5 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 143
INFO: ngram_search_fwdtree.c(333): Created 5 root, 15 non-root channels, 5 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: kws_search.c(406): KWS(beam: -1080, plp: -23, default threshold 0, delay 10)
ALSA lib pcm_dsnoop.c:618:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
[New Thread 0x7188d460 (LWP 1195)]
ALSA lib pulse.c:243:(pulse_connect) PulseAudio: Unable to connect: Connection refused
[Thread 0x7188d460 (LWP 1195) exited]
[New Thread 0x7188d460 (LWP 1196)]
ALSA lib pulse.c:243:(pulse_connect) PulseAudio: Unable to connect: Connection refused
[Thread 0x7188d460 (LWP 1196) exited]
ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
[New Thread 0x7588e460 (LWP 1197)]
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
[Thread 0x7588e460 (LWP 1197) exited]
Program received signal SIGSEGV, Segmentation fault.
0x76fae11c in ?? () from /usr/lib/python2.7/dist-packages/_portaudio.so
I read through the error and it seems that PyAudio cannot open my USB microphone.. I have already set frames_per_buffer to be a lot bigger but it still overflows... should I set it higher?
If this is helpful, in python shell the error returns:
Traceback (most recent call last):
File "/home/pi/kw_to_grammar.py", line 46, in <module>
decoder.end_utt()
File "/usr/local/lib/python2.7/dist-packages/pocketsphinx/pocketsphinx.py", line 321, in end_utt
return _pocketsphinx.Decoder_end_utt(self)
RuntimeError: Decoder_end_utt returned -1
I am so sorry about the miscommunication. I tried to use gdb backtrace but it returned "no stack.". I searched online to see if I can resolve this issue but nobody seemed to be talking about no stack.. I then proceeded to try the python debugger pdb and went through my program line by line but the output seemed to be the same as the one I posted. I apologize that I could not offer more detailed info as you asked... I understand this is very difficult so I am very thankful for your time! If you have any comments I am more than willing to learn and try. Thank you!
Thank you for all the help and patience. I have resolved all the PyAudio-related issues and now have no errors and can record things smoothly. The only problem I have encountered is that when I try to configure the decoder per your earlier recommendation:
config.set_float('-samprate', 44100.0)
it gives me a RuntimeError saying new_decoder returned -1.
Once I commented out this line, everything works perfectly. However, the recognition is really bad, which is why I want to match the sample rates... Any thoughts? Maybe this line does not work for the pocketsphinx-5prealpha version?
BIG THANKS FOR HELPING ME GET THIS FAR!!!!
Last edit: Jing Yu 2017-04-03
Hey guys. I'm trying to replicate the above code with the keyphrase 'Hello Computer'. I changed the dictionary paths appropriately, but I only have one keyphrase, which is in my dictionary (1489.dic). Just to be sure: in the line decoder.set_kws('keyword', 'keyword.list'), keyword.list is a path to a file containing keywords (found in my home directory), and 'keyword' is a predefined argument for set_kws, not where I should put 'hello computer'?
Secondly, here is my error output:
ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
INFO: cmn_live.c(120): Update from < 41.00 -5.29 -0.12 5.09 2.48 -4.07 -1.37 -1.78 -5.08 -2.05 -6.45 -1.42 1.17 >
INFO: cmn_live.c(138): Update to < 27.74 14.31 12.36 -1.54 1.00 -0.99 -2.98 -1.79 0.55 -0.95 -1.67 1.60 1.33 >
INFO: kws_search.c(656): kws 0.30 CPU 0.357 xRT
INFO: kws_search.c(658): kws 0.98 wall 1.170 xRT
Result:
Traceback (most recent call last):
File "key-gram-switch.py", line 41, in <module>
print 'Result:', decoder.hyp().hypstr
AttributeError: 'NoneType' object has no attribute 'hypstr'
INFO: kws_search.c(448): TOTAL kws 0.00 CPU nan xRT
INFO: kws_search.c(451): TOTAL kws 0.00 wall nan xRT
INFO: ngram_search_fwdtree.c(429): TOTAL fwdtree 0.00 CPU nan xRT
INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 0.00 wall nan xRT
INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.00 CPU nan xRT
INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.00 wall nan xRT
INFO: ngram_search.c(303): TOTAL bestpath 0.00 CPU nan xRT
INFO: ngram_search.c(306): TOTAL bestpath 0.00 wall nan xRT
INFO: kws_search.c(448): TOTAL kws 0.30 CPU 0.361 xRT
INFO: kws_search.c(451): TOTAL kws 0.98 wall 1.184 xRT
my microphone is fully functional as well.
Last edit: Bari Tala 2017-04-09
#!/usr/bin/env
import sys, os
import pyaudio
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

MODELDIR = "/home/pi/pocketsphinx-5prealpha/model"
datadir = "/home/pi/"

#Init decoder
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(MODELDIR, 'en-us/en-us'))
config.set_string('-dict', os.path.join(MODELDIR, '1489.dic'))
config.set_float('-kws_threshold', 1e-20)
decoder = Decoder(config)

# Add searches
#decoder.set_kws('keyword', '/home/pi/keyword.list')
decoder.set_keyphrase("keyword", "HELLO COMPUTER")
decoder.set_lm_file('lm', '/home/pi/pocketsphinx-5prealpha/model/1489.lm')
decoder.set_search('keyword')

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
stream.start_stream()

in_speech_bf = False
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
        if decoder.get_in_speech() != in_speech_bf:
            in_speech_bf = decoder.get_in_speech()
            if not in_speech_bf:
                decoder.end_utt()
                # Print hypothesis and switch search to another mode
                print 'Result:', decoder.hyp().hypstr
                if decoder.get_search() == 'keyword':
                    decoder.set_search('lm')
                else:
                    decoder.set_search('keyword')
                decoder.start_utt()
    else:
        break
decoder.end_utt()
stream.end_stream()
I'm getting the error as above. But SOMETIMES when I run it and quickly say 'Hello Computer' it will print 'Hello Computer' but then it will give the same error.
Last edit: Bari Tala 2017-04-09
Hey Nickolay, thanks for the quick response. I'm trying to add an if statement as below. But still getting the same error. You're saying I need to make sure decoder.hyp() is not empty, right?
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
        if decoder.get_in_speech() != in_speech_bf:
            in_speech_bf = decoder.get_in_speech()
            if not in_speech_bf:
                decoder.end_utt()
                if type(decoder.hyp()) is not None:  #make sure not NoneType
                    # Print hypothesis and switch search to another mode
                    print 'Result:', decoder.hyp().hypstr
                    if decoder.get_search() == 'keyword':
                        decoder.set_search('lm')
                    else:
                        decoder.set_search('keyword')
                decoder.start_utt()
    else:
        break
decoder.end_utt()
stream.end_stream()
Hey figured it out. i changed it now to:
if decoder.hyp() is not None:
After I say 'Hello Computer' then pause and say 'Down' ( or any other word in my dictionary) it again gives me a decoder.hyp() of None. It then won't throw the error anymore, but seems to switch to a continuous recognition instead of switching back to keyword requiring.
Last edit: Bari Tala 2017-04-09
SUCCESS!
First, an explanation for those wondering what's going on, and for other Python newbies like myself. While listening, decoder.hyp() returns None if no keyword was recognized, and the script terminates with an AttributeError when you access .hypstr on that None. So just make sure you do the keyword-to-continuous switching only when decoder.hyp() isn't None (i.e. when it has heard a keyword).
Thanks for the help. Great product and will keep you updated!
Bari
Hello! First I want to thank you all for any inputs and tips! I attached my code in the end and I have a couple questions and issues here:
I am trying to write a python program that will turn on LED upon my words on Raspberry Pi 3.. I want to add specific activation words and then execute one of the TWO commands from my super tiny language model. After much research, I realized that I have to use "keyword" mode on decoder to activate listening on my Rpi (eg. "hello pi"). THEN I switch decoder to 'lm' mode to decode for (1 out of the 2 possible) command I have for it. (turn on led /turn off led) Is this understanding correct? What is difference between set_keyphrase and set_kws? Can set_kws work on a phrase("ok pi")?
I have been able to successfully record and decode voice from terminal with my USB microphone with the following command:
arecord -f cd -c 1 -D plughw:0,0 -r 16k test.wav
I know my USB microphone is default to be recording at 44100Hz samprate but this way I manually changed it to 16k. However, when I configure pyaudio.open (rate = 16000), it gives me an error says sample rate error. When i switch rate = 44100 (usb default rate) then no error occurs.. Can I somehow make my USB microphone sample at lower than default rate? If not, is there any USB microphone you guys have used that has default sample rate at 16k Hz?
buf = stream.read(1024)
ERROR: [Error Input overflowed] -9981
I have already been using a 3 sentence language model generated online anyone has any inputs on this? Running my USB microphone at 44100Hz sample rate is also making the overflowed issue worse...
My code:
No, you can use keyword spotting for the commands themselves, without an activation keyphrase, since your commands are simple.
set_keyphrase sets a single activation keyphrase from a string; set_kws configures multiple keyphrases from a file.
Yes, if you put the phrase in a file.
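For anyone unsure what the file passed to set_kws looks like, here is a minimal sketch. It assumes the usual pocketsphinx keyword-list layout of one phrase per line followed by a detection threshold between slashes. The phrases, the threshold values, and the helper name write_keyword_list are illustrative and not taken from this thread, and every word in a phrase still has to be present in the dictionary.

```python
import os
import tempfile

def write_keyword_list(path, phrases):
    """Write one 'phrase /threshold/' line per entry (assumed kws file layout)."""
    lines = ["%s /%s/" % (phrase, threshold) for phrase, threshold in phrases]
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines

# Hypothetical phrases and thresholds; tune the thresholds on your own recordings.
keyword_path = os.path.join(tempfile.gettempdir(), "keyword.list")
keyword_lines = write_keyword_list(
    keyword_path, [("ok pi", "1e-20"), ("hello pi", "1e-20")]
)

# With a decoder built as in the posts in this thread, either call registers a search:
#   decoder.set_kws('keyword', keyword_path)      # several phrases from a file
#   decoder.set_keyphrase('keyword', 'ok pi')     # one phrase from a string
```

Both calls register the search under the name given as the first argument, which is what decoder.set_search('keyword') later selects.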
You can configure ALSA to do the resampling, or you can use PulseAudio, which resamples automatically.
It depends on the sound card, not the microphone. It is fine to record at 44 kHz; you just need to resample properly.
This means the software is too slow. You can debug it by running the same recognition from a prerecorded audio file and collecting the application logs.
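One way to picture "too slow": each stream.read(1024) at 16 kHz covers only 64 ms of audio, so whenever one pass through the loop takes longer than that, input keeps arriving faster than it is drained. The time.sleep(1) in the first script is one such delay, although whether it is the actual cause here is only a guess. A small sketch of the arithmetic, with made-up function names:

```python
def chunk_seconds(frames, sample_rate):
    """Duration of audio covered by one read of `frames` samples."""
    return float(frames) / sample_rate

def falls_behind(frames, sample_rate, work_seconds):
    """True if one loop iteration takes longer than the audio it consumes."""
    return work_seconds > chunk_seconds(frames, sample_rate)

# Chunk sizes and rates taken from the scripts in this thread;
# the work times are illustrative.
print(chunk_seconds(1024, 16000))        # 0.064
print(falls_behind(1024, 16000, 1.0))    # True
print(falls_behind(8192, 44100, 0.05))   # False
```

Timing the body of the loop on the Pi and comparing it with the chunk duration is a cheap first check before changing buffer sizes.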
Thank you so much for such fast reply! I will look into running prerecorded file. One question: I understand that since my phrases are simple I can just directly use keyword spotting.. but I really need to have an activation first and THEN commands so that i can prevent any false positive commands.. Is there any way i can do that with pocket sphinx? Thank you again for all the time, help, and consideration!!
Last edit: Jing Yu 2017-03-29
It is implemented in the code above.
So I have 2 findings after running the example codes in swig:
I was running kws_test.py - successful
1. But as soon as I change the ".dic" from the US English dictionary to my own generated .dic (which still contains "go", "forward" and "meters"), the program does not recognize anything any more.
2. I then restored the example code to its original state. After making sure it runs, I commented out the
stream = open(os.path.join(datadir, "goforward.raw"), "rb")
line and uncommented what was originally in the example file with ONE change - I added input_device_index=5 to the p.open line.
When I run the program, it errored with the following:
Traceback: return pa.read_stream(self._stream, num_frames)
IOError: [Errno Input overflowed] -9981
Please advise why these two issues are happening... I feel like I have exhausted all the online resources so I am kind of desperate right now... I really really appreciate your selfless time and consideration!!!!
Last edit: Jing Yu 2017-03-29
To get help on this issue you need to provide:
1) All data files you are using
2) Model of Raspberry Pi you have
3) Complete pocketsphinx log output.
4) Complete code you are running
1) I tested on this file: https://github.com/cmusphinx/pocketsphinx/blob/master/swig/python/test/kws_test.py, using standard files (en-us) also from GitHub.
2) I use an RPi 3 on Raspbian Jessie.
3) I am not exactly sure how to get the log file after some search on internet so I just run "python kws_test.py" in cmd line and saw the error output. I had a lot of trouble configuring my usb sound card before so I uninstalled pulse audio and made changes to ~/.asoundrc to have my usb card on 0 and default pcm on 1.
4) I was running this example file from github: file:https://github.com/cmusphinx/pocketsphinx/blob/master/swig/python/test/kws_test.py
Ps: many thanks for all the wonderful examples you wrote!!!!
Please let me know if there is anything I can provide. I really appreciate your help!!
To debug a segmentation fault you need to run Python under gdb and collect a backtrace when it crashes:
then type
run
to run the command. When it crashes, type
bt
and save the output. I recommend testing decoding from a file instead of the microphone first.
i guess a more important question is...how do I resample after setting pyaudio at 44.1k? Thank you!
You do not need to resample. You can process 44.1 kHz audio; you need to add options:
This is awesome and super helpful!!
I ran the codes with pre-recorded audio and had no problem.. its the Pyaudio that is my pain point.. Even just a simple record-playback python code using pyaudio gives me "overflowed" error code. Below is my code:
The error output looks like this:
I read through the error and it seems that PyAudio cannot open my USB microphone.. I have already set frames_per_buffer to be a lot bigger but it still overflows... should I set it higher?
If this is helpful, in python shell the error returns:
I asked you to provide backtrace above. I will not ask third time.
I am so sorry about the miscommunication. I tried to use gdb backtrace but it returned "no stack.". I searched online to see if I can resolve this issue but nobody seemed to be talking about no stack.. I then proceeded to try the python debugger pdb and went through my program line by line but the output seemed to be the same as the one I posted. I apologize that I could not offer more detailed info as you asked... I understand this is very difficult so I am very thankful for your time! If you have any comments I am more than willing to learn and try. Thank you!
To debug a segmentation fault you need to run Python under gdb and collect a backtrace when it crashes:
then type
run
to run the command. When it crashes, type
bt
and save the output. See also https://wiki.mageia.org/en/Debugging_software_crashes
Hello Nickolay!
Thank you for all the help and patience. I have resolved all the PyAudio-related issues and now have no errors and can record things smoothly. The only problem I have encountered is that when I try to configure the decoder per your earlier recommendation:
config.set_float('-samprate', 44100.0)
it gives me a RuntimeError saying new_decoder returned -1.
Once I commented out this line, everything works perfectly. However, the recognition is really bad, which is why I want to match the sample rates... Any thoughts? Maybe this line does not work for the pocketsphinx-5prealpha version?
BIG THANKS FOR HELPING ME GET THIS FAR!!!!
Last edit: Jing Yu 2017-04-03
You can process 44.1 khz audio, you need to add TWO options TOGETHER:
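The two options are presumably the -samprate and -nfft settings that appear together in the 44.1 kHz script earlier in the thread: config.set_float('-samprate', 44100.0) and config.set_int('-nfft', 2048). A rough sketch of why they go together, assuming the FFT size has to be a power of two at least as large as one analysis window (-wlen is 0.025625 s in the posted configuration dump). min_nfft is a made-up helper, not a pocketsphinx function.

```python
def min_nfft(sample_rate, window_seconds=0.025625):
    """Smallest power of two that holds one analysis window of samples."""
    window_samples = int(round(sample_rate * window_seconds))
    nfft = 1
    while nfft < window_samples:
        nfft *= 2
    return nfft

print(min_nfft(16000))   # 512, matching the default -nfft
print(min_nfft(44100))   # 2048, so -nfft has to be raised along with -samprate
```

Under that assumption, setting -samprate to 44100 alone leaves -nfft at 512, which is too small for a 1130-sample window and would explain the decoder failing to initialize.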
Hey guys. I'm trying to replicate the above code with the keyphrase 'Hello Computer'. I changed the dictionary paths appropriately, but I only have one keyphrase, which is in my dictionary (1489.dic). Just to be sure: in the line
decoder.set_kws('keyword', 'keyword.list')
keyword.list is a path to a file containing keywords (found in my home directory), and 'keyword' is a predefined argument for set_kws, not where I should put 'hello computer'?
Secondly, here is my error output:
my microphone is fully functional as well.
Last edit: Bari Tala 2017-04-09
Hey sorry, here is my code.
I'm getting the error as above. But SOMETIMES when I run it and quickly say 'Hello Computer' it will print 'Hello Computer' but then it will give the same error.
Last edit: Bari Tala 2017-04-09
You need to check hyp for none before accessing hypstr.
Hey Nickolay, thanks for the quick response. I'm trying to add an if statement as below. But still getting the same error. You're saying I need to make sure decoder.hyp() is not empty, right?
Hey figured it out. i changed it now to:
if decoder.hyp() is not None:
After I say 'Hello Computer' then pause and say 'Down' ( or any other word in my dictionary) it again gives me a decoder.hyp() of None. It then won't throw the error anymore, but seems to switch to a continuous recognition instead of switching back to keyword requiring.
Last edit: Bari Tala 2017-04-09
http://stackoverflow.com/questions/23086383/how-to-test-nonetype-in-python
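To spell out what goes wrong with the first attempt: type(x) always returns a class and never None, so the test type(decoder.hyp()) is not None passes even when hyp() returned None. A tiny demonstration with a stand-in value instead of a real decoder:

```python
hyp = None  # stand-in for what decoder.hyp() returns when nothing was recognized

wrong_check = type(hyp) is not None   # compares the NoneType class with None
right_check = hyp is not None         # compares the value itself with None

print(wrong_check)   # True, so the guarded code still runs and fails on .hypstr
print(right_check)   # False, so the guarded code is skipped
```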
SUCCESS!
First, an explanation for those wondering what's going on, and for other Python newbies like myself. While listening, decoder.hyp() returns None if no keyword was recognized, and the script terminates with an AttributeError when you access .hypstr on that None. So just make sure you do the keyword-to-continuous switching only when decoder.hyp() isn't None (i.e. when it has heard a keyword).
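The switching rule described here can be sketched without any audio hardware. The FakeHyp class and the next_search function are hypothetical stand-ins: in the real loop the hypothesis would come from decoder.hyp() after decoder.end_utt(), and the returned name would be passed to decoder.set_search(). Falling back to 'keyword' after every command attempt is an assumption of this sketch, not something the thread confirms.

```python
class FakeHyp(object):
    """Stand-in for a pocketsphinx hypothesis; only hypstr is modelled."""
    def __init__(self, hypstr):
        self.hypstr = hypstr

def next_search(current, hyp):
    """Return the search name to use for the next utterance.

    Stay in 'keyword' until a keyphrase is actually heard; after one
    command attempt in 'lm', go back to 'keyword' whatever was decoded.
    """
    if current == 'keyword':
        return 'lm' if hyp is not None else 'keyword'
    return 'keyword'

mode = 'keyword'
mode = next_search(mode, None)                        # silence or noise
print(mode)                                           # keyword
mode = next_search(mode, FakeHyp('HELLO COMPUTER'))   # activation heard
print(mode)                                           # lm
mode = next_search(mode, FakeHyp('DOWN'))             # command decoded
print(mode)                                           # keyword
```

Keeping the decision in one small function makes it easy to test the mode changes separately from the microphone and decoder setup.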
Thanks for the help. Great product and will keep you updated!
Bari
Hey Bari Tala, if you still have the code, can you share it with me? I'm kind of desperate.
Thanks!
Last edit: ismail nur adli 2020-05-12