CMU Sphinx / Forums / Help: Streaming Pocketsphinx using Pyaudio

Folks,

I have been mucking around with my Raspberry Pi for a while now and have set up Pocketsphinx on it. I have been working with open source for some time and am well versed in Linux/Unix, with basic scripting ability. I can work through Python code (if it's not too complex) but am not much of a Python programmer (though I would love to pick up some Python skills over the coming years).

My Raspberry Pi is part of a robot I've built (GoPiGo) and I am trying to work out how to control it using some simple commands. This is where I am really stuck and I would really appreciate some direction. I've looked at the forums and various examples, but I can't seem to put together a simple Python program that picks up what I am saying and translates it to text. Eventually I want the Python program to run a command once a string is identified.

I am able to run Pocketsphinx by itself at the command line and it's able to pick up the words, so no issues with that. The challenge is getting it to work through Python as a streaming application: listening to my mic, picking up the commands and passing them along to the robot (GoPiGo) to do stuff.

The code below is quite amateurish...feel free to bag me for any inefficient coding practices you find in there. I would really appreciate some direction on sorting this out.

from os import environ, path
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *
import pyaudio
import wave
import socket

MODELDIR = "/usr/local/share/pocketsphinx/model"
DATADIR = "/home/perf/Downloads/Python/Dev/PocketSpinx_TTS/data"

config = Decoder.default_config()
config.set_string('-adcdev', 'sysdefault')
config.set_string('-hmm', '/usr/local/share/pocketsphinx/model/en-us/en-us')
config.set_string('-lm', '/home/perf/Downloads/Python/Dev/PocketSpinx_TTS/data/9735.lm')
config.set_string('-dict', '/home/perf/Downloads/Python/Dev/PocketSpinx_TTS/data/9735.dic')
config.set_string('-samprate', '8000')
config.set_string('-inmic', 'yes')
decoder = Decoder(config)

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=8000, input=True, frames_per_buffer=1024, input_device_index=0)
stream.start_stream()
in_speech_bf = True

decoder.start_utt()
print "Starting to listen"

while True:
 buf = stream.read(1024)
 decoder.process_raw(buf, False, False)
 if decoder.hyp() != None and decoder.hyp().hypstr == 'FORWARD':
  decoder.end_utt()
  print "Detected Move Forward, restarting search"
  decoder.start_utt()
print "Am not listening any more"

Last edit: VisualizeIT 2016-05-08

You forgot to say what the problem with your code is. You also forgot to format it properly.

You need to use keyword spotting mode to continuously look for commands; the sample code is here:

https://github.com/cmusphinx/pocketsphinx/blob/master/swig/python/test/kws_test.py

Keyword spotting is explained in the tutorial; I recommend you read it too:

http://cmusphinx.sourceforge.net/wiki/tutoriallm
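
For several commands rather than a single `-keyphrase`, the tutorial linked above describes a keyword list file: one phrase per line, each with its own threshold between slashes, loaded through the `-kws` option. A minimal sketch that writes such a file, assuming the five robot commands from this thread. The thresholds are placeholders to tune, not recommended values, and `write_keyword_list` is a hypothetical helper name:

```python
import os
import tempfile

# Phrases taken from this thread; every threshold is a placeholder guess.
COMMANDS = {
    "move forward": 1e-20,
    "move backward": 1e-20,
    "move left": 1e-20,
    "move right": 1e-20,
    "stop": 1e-10,
}


def write_keyword_list(path, commands):
    """Write one 'phrase /threshold/' line per command, sorted by phrase."""
    with open(path, "w") as handle:
        for phrase, threshold in sorted(commands.items()):
            handle.write("%s /%g/\n" % (phrase, threshold))


if __name__ == "__main__":
    target = os.path.join(tempfile.gettempdir(), "keywords.list")
    write_keyword_list(target, COMMANDS)
    with open(target) as handle:
        print(handle.read())
```

Each word in a phrase has to exist in the dictionary passed with `-dict`, spelled in the same case, so a custom dictionary with upper-case entries would need upper-case phrases here.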

Nickolay, thank you so much for responding. Appreciate the assistance.

OK, I figured out what the issue was with the code I submitted. I've resubmitted it formatted as code.
I've also attached the code as a text file.

#!/usr/bin/python

from os import environ, path
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *
import pyaudio
import wave
import socket
import time

#pocketsphinx_continuous -adcdev sysdefault -hmm /usr/local/share/pocketsphinx/model/en-us/en-us -lm 9735.lm -dict 9735.dic -samprate 16000 -inmic yes

MODELDIR = "/usr/local/share/pocketsphinx/model"
DATADIR = "/home/perf/Downloads/TrevorWarren/Python/Dev/PocketSpinx_TTS/data"

# Create a decoder with certain model
config = Decoder.default_config()
config.set_string('-adcdev', 'sysdefault')
#config.set_string('-adcdev', 'plughw:0,0')
config.set_string('-hmm', '/usr/local/share/pocketsphinx/model/en-us/en-us')
config.set_string('-lm', '/home/perf/Downloads/Python/Dev/PocketSpinx_TTS/data/9735.lm')
#config.set_string('-kws', '/home/perf/Downloads/Python/Dev/PocketSpinx_TTS/data/keywords')
config.set_string('-dict', '/home/perf/Downloads/Python/Dev/PocketSpinx_TTS/data/9735.dic')
config.set_string('-samprate', '8000')
config.set_string('-inmic', 'yes')
decoder = Decoder(config)

p = pyaudio.PyAudio()
#stream = p.open(format=pyaudio.paInt16, channels=2, rate=16000, input=True, frames_per_buffer=1024, input_device_index=2)
#stream = p.open(format=pyaudio.paInt16, channels=1, rate=8000, input=True, frames_per_buffer=1024, input_device_index=0)
stream = p.open(format=pyaudio.paInt16, channels=1, rate=8000, input=True, frames_per_buffer=1024, input_device_index=0)
stream.start_stream()
in_speech_bf = True

#decoder.set_lm_file("lm", '/usr/local/share/pocketsphinx/model/en-us/en-us.lm.bin')
#decoder.set_keyphrase("kws", "FORWARD")
#decoder.set_search("kws")

decoder.start_utt()
print "Starting to listen"

while True:
    buf = stream.read(1024)
    decoder.process_raw(buf, False, False)
    # Fetch the hypothesis once per buffer instead of once per comparison
    hyp = decoder.hyp()
    hypstr = hyp.hypstr if hyp is not None else None
    if hypstr == 'MOVE FORWARD':
        decoder.end_utt()  # was "decoder.end_utt", which never called the method
        print "Detected Move Forward, moving forward and restarting search"
        decoder.start_utt()
    elif hypstr == 'MOVE BACKWARD':
        decoder.end_utt()
        print "Detected Move Backward, moving backward and restarting search"
        decoder.start_utt()
    elif hypstr == 'MOVE LEFT':
        decoder.end_utt()
        print "Detected Move Left, moving left and restarting search"
        decoder.start_utt()
    elif hypstr == 'MOVE RIGHT':
        decoder.end_utt()
        print "Detected Move Right, moving right and restarting search"
        decoder.start_utt()
    elif hypstr == 'STOP':
        decoder.end_utt()
        print "Detected Stop. Stopping and restarting search"
        decoder.start_utt()
    else:
        # Reached after every buffer without an exact match, so the
        # utterance is restarted roughly every 1024 frames
        decoder.end_utt()
        print "Nothing Detected, Restarting search"
        decoder.start_utt()
print "Am not listening any more"

The issue is that nothing is ever recognized. The program jumps straight to the last block, i.e. "Nothing Detected, Restarting search".

Any pointers would be appreciated.
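
One reading of the loop above: the final `else` branch runs for every 1024-frame buffer that has not already produced an exact match, so the utterance is ended and restarted before the decoder has heard a whole phrase. A minimal sketch of an alternative structure, assuming the same `start_utt` / `process_raw` / `hyp` / `end_utt` calls used in this thread. `run_command_loop` and the `commands` mapping are hypothetical names, and the decoder and audio source are passed in so the logic can be tried without hardware:

```python
def run_command_loop(decoder, chunks, commands):
    """Feed audio chunks to the decoder and fire one callback per command.

    decoder  -- object with start_utt(), process_raw(), hyp(), end_utt()
    chunks   -- iterable of raw audio buffers (for example from stream.read)
    commands -- dict mapping a hypothesis string to a zero-argument callable
    Returns the list of command strings that fired, in order.
    """
    fired = []
    decoder.start_utt()
    for buf in chunks:
        decoder.process_raw(buf, False, False)
        hyp = decoder.hyp()
        if hyp is None or hyp.hypstr not in commands:
            # No complete command yet: keep the utterance open.
            continue
        # A command matched: close the utterance, act, then listen again.
        decoder.end_utt()
        commands[hyp.hypstr]()
        fired.append(hyp.hypstr)
        decoder.start_utt()
    decoder.end_utt()
    return fired
```

With a language model search the partial hypothesis can keep growing past a command (for instance when extra words are picked up), so an exact string match may still miss. That is one reason the keyword spotting mode suggested earlier in the thread may be a better fit.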

Last edit: VisualizeIT 2016-05-08

sample_program_to_read_stream_from_mic.txt

Nickolay,

Thank you so much for responding. Appreciate the assistance.

I've also tried the keyword spotting mode examples, and the program fails on the Raspberry Pi with an input overflow error. I've done a bit of digging around the forums; folks have suggested changing the buffer size, which I've done (256, 512, etc.), but I still get the same error.

The example I am using is -

#!/usr/bin/python

import sys, os
import pyaudio
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

modeldir = "/usr/local/share/pocketsphinx/model"
datadir = "../../../test/data"

# Create a decoder with certain model
config = Decoder.default_config()
config.set_string('-adcdev', 'sysdefault')
config.set_string('-samprate', '8000')
config.set_string('-inmic', 'yes')
config.set_string('-hmm', os.path.join(modeldir, 'en-us/en-us'))
config.set_string('-dict', os.path.join(modeldir, 'en-us/cmudict-en-us.dict'))
config.set_string('-keyphrase', 'forward')
config.set_float('-kws_threshold', 1e+20)

# Open file to read the data
#stream = open(os.path.join(datadir, "goforward.raw"), "rb")

# Alternatively you can read from microphone
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=8000, input=True, output=True, frames_per_buffer=1024)
stream.start_stream()

# Process audio chunk by chunk. On keyword detected perform action and restart search
decoder = Decoder(config)
decoder.start_utt()
while True:
 buf = stream.read(1024)
 if buf:
  decoder.process_raw(buf, False, False)
 else:
  break
 if decoder.hyp() != None:
  print ([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
  print ("Detected keyword, restarting search")
  decoder.end_utt()
  decoder.start_utt()

The error it fails with is as follows -

INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/variances
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(354): 222 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file /usr/local/share/pocketsphinx/model/en-us/en-us/sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(835): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 138623 * 20 bytes (2707 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict
INFO: dict.c(213): Allocated 1014 KiB for strings, 1677 KiB for phones
INFO: dict.c(336): 134522 words read
INFO: dict.c(358): Reading filler dictionary: /usr/local/share/pocketsphinx/model/en-us/en-us/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 21336 bytes (20 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 21336 bytes (20 KiB) for single-phone word triphones
INFO: kws_search.c(423): KWS(beam: -1080, plp: -23, default threshold 449, delay 10)
Traceback (most recent call last):
  File "./temp4.py", line 34, in <module>
    buf = stream.read(1024)
  File "/usr/lib/python2.7/dist-packages/pyaudio.py", line 605, in read
    return pa.read_stream(self._stream, num_frames)
IOError: [Errno Input overflowed] -9981
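
`-9981` is PortAudio's input-overflow code: the capture buffer filled up before the script read it, which can happen while the decoder is busy on a slow board. A hedged sketch of a tolerant read follows. It assumes a PyAudio release whose `Stream.read` accepts `exception_on_overflow` (the copy in the traceback above may be older, hence the fallback). `read_chunk` is a hypothetical helper, and the fallback returns silence, which throws away the overflowed audio:

```python
def read_chunk(stream, frames, sample_width=2, channels=1):
    """Read `frames` frames, substituting silence if the input overflowed."""
    silence = b"\x00" * (frames * sample_width * channels)
    try:
        # Newer PyAudio: ask read() not to raise on overflow.
        return stream.read(frames, exception_on_overflow=False)
    except TypeError:
        # Older PyAudio: read() has no such keyword argument.
        pass
    try:
        return stream.read(frames)
    except IOError:
        # Overflow on the older API: drop this chunk rather than crash.
        return silence
```

Swallowing overflows only hides the symptom. If they are frequent, the decoder is receiving audio with gaps, which will hurt recognition.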

Last edit: VisualizeIT 2016-05-08

It's a PyAudio audio recording issue. See the link below for a PyAudio record example.
https://people.csail.mit.edu/hubert/pyaudio/

The error indicates that your mic does not support a sample rate of 8000. You need to set a default plug plugin in ~/.asoundrc (Raspbian),

something like this:

pcm.!default {
    type plug
    slave {
        pcm "hw:0,0"
    }
}

ctl.!default {
    type hw
    card 0
}
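
Before editing `~/.asoundrc`, it can help to ask PortAudio which input devices accept the rate in question. A sketch using PyAudio's `get_device_count`, `get_device_info_by_index` and `is_format_supported`. `find_input_devices` is a hypothetical helper, and the PyAudio object and sample format are passed in as arguments so the function itself does not import the library:

```python
def find_input_devices(pa, rate, sample_format, channels=1):
    """Return (index, name) for each input device that accepts `rate`."""
    usable = []
    for index in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(index)
        if info.get("maxInputChannels", 0) < channels:
            continue  # output-only device, or too few input channels
        try:
            supported = pa.is_format_supported(
                rate,
                input_device=index,
                input_channels=channels,
                input_format=sample_format,
            )
        except ValueError:
            supported = False  # PyAudio raises when the format is rejected
        if supported:
            usable.append((index, info.get("name")))
    return usable
```

With the real library this would be called as `find_input_devices(pyaudio.PyAudio(), 8000, pyaudio.paInt16)`, and an index from the result can be passed as `input_device_index` when opening the stream.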

Thanks for that G10DRAS. I've tried the config you mentioned above but the error persists.

Here's the output from /proc/asound/card0/stream0

USB Device 0x46d:0x8d9 at usb-3f980000.usb-1.5, full speed : USB Audio

Capture:
  Status: Stop
  Interface 2
    Altset 1
    Format: S16_LE
    Channels: 1
    Endpoint: 3 IN (NONE)
    Rates: 8000
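
The `Rates:` line is the useful part of that file: it lists the sample rates the USB device offers natively, here only 8000. A small sketch that pulls those values out of the text so a script can compare them with the rate it opens the stream at. `parse_capture_rates` is a hypothetical helper and the parsing assumes the layout shown above:

```python
import re


def parse_capture_rates(text):
    """Collect integer rates from 'Rates:' lines under the 'Capture:' section."""
    rates = []
    in_capture = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in ("Capture:", "Playback:"):
            # Track which section we are in; only Capture matters for a mic.
            in_capture = stripped == "Capture:"
        elif in_capture and stripped.startswith("Rates:"):
            rates.extend(int(number) for number in re.findall(r"\d+", stripped))
    return sorted(set(rates))


SAMPLE = """\
Capture:
  Status: Stop
  Interface 2
    Altset 1
    Format: S16_LE
    Channels: 1
    Endpoint: 3 IN (NONE)
    Rates: 8000
"""

if __name__ == "__main__":
    print(parse_capture_rates(SAMPLE))
```

On the Pi the text would come from reading `/proc/asound/card0/stream0` instead of the embedded sample.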

Here's the error from the Python program (the keyword detection example I modified; code provided above).

perf@warrenraspberrypi:~/Downloads/TrevorWarren/Python/Dev/PocketSpinx_TTS $ sudo ./temp4.py
ERROR: "cmd_ln.c", line 938: Unknown argument: -adcdev
ERROR: "cmd_ln.c", line 990: Unknown argument: -adcdev
ERROR: "cmd_ln.c", line 938: Unknown argument: -inmic
ERROR: "cmd_ln.c", line 990: Unknown argument: -inmic
ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pulse.c:243:(pulse_connect) PulseAudio: Unable to connect: Connection refused

ALSA lib pulse.c:243:(pulse_connect) PulseAudio: Unable to connect: Connection refused

ALSA lib pcm_dmix.c:1022:(snd_pcm_dmix_open) unable to open slave
INFO: pocketsphinx.c(145): Parsed model-specific feature parameters from /usr/local/share/pocketsphinx/model/en-us/en-us/feat.params
Current configuration:
[NAME]                  [DEFLT]         [VALUE]
-agc                    none            none
-agcthresh              2.0             2.000000e+00
-allphone
-allphone_ci            no              no
-alpha                  0.97            9.700000e-01
-ascale                 20.0            2.000000e+01
-aw                     1               1
-backtrace              no              no
-beam                   1e-48           1.000000e-48
-bestpath               yes             yes
-bestpathlw             9.5             9.500000e+00
-ceplen                 13              13
-cmn                    current         current
-cmninit                8.0             40,3,-1
-compallsen             no              no
-debug                                  0
-dict                                   /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict
-dictcase               no              no
-dither                 no              no
-doublebw               no              no
-ds                     1               1
-fdict                                  /usr/local/share/pocketsphinx/model/en-us/en-us/noisedict
-feat                   1s_c_d_dd       1s_c_d_dd
-featparams                             /usr/local/share/pocketsphinx/model/en-us/en-us/feat.params
-fillprob               1e-8            1.000000e-08
-frate                  100             100
-fsg
-fsgusealtpron          yes             yes
-fsgusefiller           yes             yes
-fwdflat                yes             yes
-fwdflatbeam            1e-64           1.000000e-64
-fwdflatefwid           4               4
-fwdflatlw              8.5             8.500000e+00
-fwdflatsfwin           25              25
-fwdflatwbeam           7e-29           7.000000e-29
-fwdtree                yes             yes
-hmm                                    /usr/local/share/pocketsphinx/model/en-us/en-us
-input_endian           little          little
-jsgf
-keyphrase                              forward
-kws
-kws_delay              10              10
-kws_plp                1e-1            1.000000e-01
-kws_threshold          1               1.000000e+20
-latsize                5000            5000
-lda
-ldadim                 0               0
-lifter                 0               22
-lm
-lmctl
-lmname
-logbase                1.0001          1.000100e+00
-logfn
-logspec                no              no
-lowerf                 133.33334       1.300000e+02
-lpbeam                 1e-40           1.000000e-40
-lponlybeam             7e-29           7.000000e-29
-lw                     6.5             6.500000e+00
-maxhmmpf               30000           30000
-maxwpf                 -1              -1
-mdef                                   /usr/local/share/pocketsphinx/model/en-us/en-us/mdef
-mean                                   /usr/local/share/pocketsphinx/model/en-us/en-us/means
-mfclogdir
-min_endfr              0               0
-mixw
-mixwfloor              0.0000001       1.000000e-07
-mllr
-mmap                   yes             yes
-ncep                   13              13
-nfft                   512             512
-nfilt                  40              25
-nwpen                  1.0             1.000000e+00
-pbeam                  1e-48           1.000000e-48
-pip                    1.0             1.000000e+00
-pl_beam                1e-10           1.000000e-10
-pl_pbeam               1e-10           1.000000e-10
-pl_pip                 1.0             1.000000e+00
-pl_weight              3.0             3.000000e+00
-pl_window              5               5
-rawlogdir
-remove_dc              no              no
-remove_noise           yes             yes
-remove_silence         yes             yes
-round_filters          yes             yes
-samprate               16000           1.600000e+04
-seed                   -1              -1
-sendump                                /usr/local/share/pocketsphinx/model/en-us/en-us/sendump
-senlogdir
-senmgau
-silprob                0.005           5.000000e-03
-smoothspec             no              no
-svspec                                 0-12/13-25/26-38
-tmat                                   /usr/local/share/pocketsphinx/model/en-us/en-us/transition_matrices
-tmatfloor              0.0001          1.000000e-04
-topn                   4               4
-topn_beam              0               0
-toprule
-transform              legacy          dct
-unit_area              yes             yes
-upperf                 6855.4976       6.800000e+03
-uw                     1.0             1.000000e+00
-vad_postspeech         50              50
-vad_prespeech          20              20
-vad_startspeech        10              10
-vad_threshold          2.0             2.000000e+00
-var                                    /usr/local/share/pocketsphinx/model/en-us/en-us/variances
-varfloor               0.0001          1.000000e-04
-varnorm                no              no
-verbose                no              no
-warp_params
-warp_type              inverse_linear  inverse_linear
-wbeam                  7e-29           7.000000e-29
-wip                    0.65            6.500000e-01
-wlen                   0.025625        2.562500e-02

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(164): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: /usr/local/share/pocketsphinx/model/en-us/en-us/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/local/share/pocketsphinx/model/en-us/en-us/mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: /usr/local/share/pocketsphinx/model/en-us/en-us/transition_matrices
INFO: acmod.c(117): Attempting to use PTM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/means
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/local/share/pocketsphinx/model/en-us/en-us/variances
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(354): 222 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file /usr/local/share/pocketsphinx/model/en-us/en-us/sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(835): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 138623 * 20 bytes (2707 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict
INFO: dict.c(213): Allocated 1014 KiB for strings, 1677 KiB for phones
INFO: dict.c(336): 134522 words read
INFO: dict.c(358): Reading filler dictionary: /usr/local/share/pocketsphinx/model/en-us/en-us/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 21336 bytes (20 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 21336 bytes (20 KiB) for single-phone word triphones
INFO: kws_search.c(423): KWS(beam: -1080, plp: -23, default threshold 449, delay 10)
Traceback (most recent call last):
  File "./temp4.py", line 34, in <module>
    buf = stream.read(1024)
  File "/usr/lib/python2.7/dist-packages/pyaudio.py", line 605, in read
    return pa.read_stream(self._stream, num_frames)
IOError: [Errno Input overflowed] -9981
perf@warrenraspberrypi:~/Downloads/TrevorWarren/Python/Dev/PocketSpinx_TTS $ cat /proc/asound/
ALSA/        card1/       devices      modules      pcm          timers       version
card0/       cards        hwdep        oss/         seq/         U0x46d0x8d9/
perf@warrenraspberrypi:~/Downloads/TrevorWarren/Python/Dev/PocketSpinx_TTS $ cat /proc/asound/card0/
cat: /proc/asound/card0/: Is a directory
perf@warrenraspberrypi:~/Downloads/TrevorWarren/Python/Dev/PocketSpinx_TTS $ cat /proc/asound/card0/
id        pcm0c/    stream0   usbbus    usbid     usbmixer

VisualizeIT - 2016-05-09

Thanks for that G10DRAS. You were spot on. I spent a lot of time digging around and realized that I needed to play around with the buffer size.

stream = p.open(format=pyaudio.paInt16, channels=1, rate=8000, input=True, output=True, frames_per_buffer=8192)
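
For scale, `frames_per_buffer` divided by the sample rate gives the amount of audio one buffer holds, so this change goes from about an eighth of a second to about a second of slack. The arithmetic, as a quick check (`buffer_seconds` is just an illustrative name):

```python
def buffer_seconds(frames_per_buffer, rate):
    """Duration of audio held by one buffer, in seconds."""
    return frames_per_buffer / float(rate)


if __name__ == "__main__":
    # Buffer sizes mentioned in this thread, all at the 8000 Hz capture rate.
    for frames in (256, 512, 1024, 8192):
        print("%5d frames at 8000 Hz = %.3f s" % (frames, buffer_seconds(frames, 8000)))
```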

I can now get the program to run, but even with the keyword search nothing ever shows up. The Python program never manages to find any keywords!

Last edit: VisualizeIT 2016-05-09

Nickolay V. Shmyrev - 2016-05-09

You need to configure the threshold as described in the tutorial.
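
The tutorial asks for the threshold to be tuned per keyphrase against real audio, balancing missed detections against false alarms. A rough sketch of such a sweep. `pick_threshold` and both callbacks are hypothetical: each callback would wrap a decoder run over recorded audio at the given threshold and return a count, and the candidate values are placeholders rather than recommendations:

```python
def pick_threshold(candidates, count_hits, count_false_alarms, expected_hits):
    """Return the candidate with the fewest misses plus false alarms.

    count_hits(threshold)         -- detections on recordings that contain
                                     the keyphrase `expected_hits` times
    count_false_alarms(threshold) -- detections on recordings without it
    Ties go to the earliest candidate in the list.
    """
    best_errors = None
    best_threshold = None
    for threshold in candidates:
        misses = max(expected_hits - count_hits(threshold), 0)
        errors = misses + count_false_alarms(threshold)
        if best_errors is None or errors < best_errors:
            best_errors = errors
            best_threshold = threshold
    return best_threshold


# Placeholder candidates spanning many orders of magnitude.
CANDIDATES = [1e-40, 1e-30, 1e-20, 1e-10, 1.0, 1e+10, 1e+20]
```

The chosen value would then go into `-kws_threshold` for a single keyphrase, in place of the `1e+20` used in the script above.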
  
VisualizeIT - 2016-05-09

Thanks Nickolay. I'll take a look and come back with additional questions if required.

I appreciate the support.
    