CMU Sphinx / Forums / Help: Keyword detection from wav file python.

Hi all,

I am having some issues with keyword detection on the Raspberry Pi. I recorded a wav file, saved it as mono 16-bit, and then ran the following code on it.

import sys, os
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *
import pyaudio
import wave
import time
#
# constants
SAMPLE_RATE = 16000
CHUNK_SIZE = 1024
print("Setting directories")
#
# Set directories
modeldir = "/usr/local/lib/python3.4/dist-packages/pocketsphinx/model/" 
dictdir = "/home/pi/Documents/models/"
#
# Create a decoder configuration
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(modeldir, 'en-us'))
config.set_string('-lm', os.path.join(modeldir, 'en-us.lm.bin'))
config.set_string('-dict', os.path.join(dictdir, 'testdict.dict'))
config.set_string('-keyphrase', 'talkin to')
config.set_float('-kws_threshold', 1e-5)
#
# create decoder
decoder = Decoder(config)
#
# open the wav file
print("Opening Wav file")
wf = wave.open('/home/pi/Downloads/TestRecordingProjectMono16.wav', 'rb')
#
# extract data from wav file
data = wf.readframes(wf.getnframes())
# size in bytes (not frames), because index below is a byte offset into data
filesize = len(data)
#
# start decoder utterance
decoder.start_utt()
#
# tracking variables
index = 0
detected_times = 0
#
# loop for all chunks until the end of the file
while index < filesize:
    #
    # set number of bytes to CHUNK_SIZE unless that would overrun file size
    if (index + CHUNK_SIZE) > filesize:
        count = (filesize - index)
    else:
        count = CHUNK_SIZE
    #
    # take sub set of the data    
    temp_data = data[index:(index+count)]
    #
    # process this subset
    decoder.process_raw(temp_data, False, False)
    #
    if decoder.hyp() is not None:
        #
        #increment the keywords detected count 
        detected_times = detected_times + 1
        #
        # print out some debug (index is a byte offset; 16-bit mono is 2 bytes per sample)
        print("Keyword detected @ ", (index / (2 * SAMPLE_RATE)), "Seconds: String = ", decoder.hyp().hypstr, ": Best score = ", decoder.hyp().best_score, ": confidence = ", decoder.get_logmath().exp(decoder.hyp().prob))
        decoder.end_utt()
        decoder.start_utt()
    #
    # update index to take the next subset
    index = index + count
# print the number of keywords detected
print("detected times = ", detected_times)
decoder.end_utt()
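
Because `data` is a bytes object while `getnframes()` counts frames, byte offsets and frame counts differ by the sample width (2 bytes per frame for 16-bit mono). The following is a minimal sketch, under that mono 16-bit assumption, of a chunking helper that keeps byte offsets and timestamps consistent. `iter_chunks` and its parameter names are hypothetical and not part of pocketsphinx.

```python
def iter_chunks(data, chunk_frames=1024, sample_width=2, sample_rate=16000):
    """Yield (start_seconds, chunk) pairs from raw mono PCM bytes.

    Hypothetical helper: sample_width is bytes per frame (2 for 16-bit mono).
    """
    chunk_bytes = chunk_frames * sample_width
    for offset in range(0, len(data), chunk_bytes):
        # offset is in bytes, so convert to frames before dividing by the rate
        start_seconds = (offset // sample_width) / sample_rate
        yield start_seconds, data[offset:offset + chunk_bytes]


# One second of 16-bit silence at 16 kHz is 32000 bytes, not 16000.
one_second = bytes(16000 * 2)
chunks = list(iter_chunks(one_second))
total_bytes = sum(len(chunk) for _, chunk in chunks)
print(len(chunks), total_bytes, chunks[-1][0])
```

In the loop above, each `chunk` would be passed to `decoder.process_raw(chunk, False, False)`, and `start_seconds` could be used in the debug print.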

The dictionary is a subset of the dictionary provided in the model directory, and it looks like this:

talkin T AO K IH NG
to T IH
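
Since the keyphrase has to be built from words that the dictionary knows, a quick cross-check of the two can rule out a mismatch. This is a rough sketch that assumes the usual CMU-style layout (a word followed by whitespace-separated phones, with alternate pronunciations written as `word(2)`). `missing_words` is a hypothetical helper name.

```python
def missing_words(keyphrase, dict_text):
    """Return keyphrase words that have no entry in a CMU-style dictionary."""
    known = set()
    for line in dict_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Alternate pronunciations are written as word(2), word(3), ...
        known.add(fields[0].split("(")[0].lower())
    return [w for w in keyphrase.lower().split() if w not in known]


dict_text = "talkin T AO K IH NG\nto T IH\n"
print(missing_words("talkin to", dict_text))
print(missing_words("talking to", dict_text))
```

With the two-line dictionary above, the first call reports nothing missing, while the second flags `talking` because only `talkin` is defined.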

I have tried different variations on the two words with varying results, but in every case keywords are detected multiple times and/or in the wrong place, even though the phrase is only present at approximately 8 seconds into the wav file.

I have tried different threshold values from 1e-5 to 1e-50 with no real change to the results.
I have tried modifying my dictionary with different variants of "talking" and "to"; originally I had all variants in the dict, but that made no difference to the output.
I have also tried reversing the bit order in case my wav file was the wrong endianness.
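
Before experimenting with byte order, it may be simpler to read the wav header and confirm the format directly. Python's `wave` module reads RIFF files, whose PCM samples are little-endian, so a sample-rate or channel mismatch seems at least as likely a culprit as endianness. The sketch below assumes the target format is 16 kHz, mono, 16-bit (matching `SAMPLE_RATE` in the code). `describe_wav` is a hypothetical helper, and the demo file is generated only to keep the example self-contained.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (channels, sample_width_bytes, frame_rate, n_frames) for a WAV file."""
    with wave.open(path, "rb") as wf:
        return (wf.getnchannels(), wf.getsampwidth(),
                wf.getframerate(), wf.getnframes())


# Self-contained demo: write half a second of 16 kHz mono 16-bit silence.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(bytes(8000 * 2))

info = describe_wav(demo_path)
print(info)
```

Running `describe_wav` on the real recording and comparing the result against `(1, 2, 16000, ...)` would show whether the file needs resampling or downmixing before decoding.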

Firstly, does my code look sensible? The while loop is supposed to emulate getting chunks of data from a microphone.
Secondly, if it does look OK, does anyone have a wav file and dictionary which are known to work, so I can rule that in or out?
Thirdly, if none of the above helps, are there any other options, such as noise reduction, which are recommended?

Regards,
Graham
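
One thing that may be worth ruling out on the first question: the configuration sets both `-lm` and `-keyphrase`. As far as can be told, the keyword-spotting examples for pocketsphinx set `-hmm`, `-dict`, `-keyphrase` and `-kws_threshold` without `-lm`. If the language model takes precedence, the decoder would be doing ordinary recognition, which would be consistent with the threshold having no visible effect. That is an assumption about the cause, not a confirmed diagnosis. The sketch below builds such an option list without importing pocketsphinx; `kws_options` and `apply_options` are hypothetical helpers.

```python
import os


def kws_options(modeldir, dictdir, keyphrase, threshold):
    """Build keyword-spotting options as (setter_name, flag, value) tuples.

    Hypothetical helper: note that '-lm' is deliberately absent.
    """
    return [
        ("set_string", "-hmm", os.path.join(modeldir, "en-us")),
        ("set_string", "-dict", os.path.join(dictdir, "testdict.dict")),
        ("set_string", "-keyphrase", keyphrase),
        ("set_float", "-kws_threshold", threshold),
    ]


def apply_options(config, options):
    """Call config.set_string / config.set_float for each option."""
    for setter, flag, value in options:
        getattr(config, setter)(flag, value)
    return config


options = kws_options("/usr/local/lib/python3.4/dist-packages/pocketsphinx/model/",
                      "/home/pi/Documents/models/", "talkin to", 1e-20)
flags = [flag for _, flag, _ in options]
print(flags)
```

With pocketsphinx installed, this would be used as `config = apply_options(Decoder.default_config(), options)` followed by `decoder = Decoder(config)`. The threshold value here is only a placeholder to be tuned.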
