Hi all,

I am having some issues with keyword detection on the Raspberry Pi. I recorded a WAV file, saved it as mono 16-bit, and then ran the following code on it:
import sys, os
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *
import pyaudio
import wave
import time

## constants
SAMPLE_RATE = 16000
CHUNK_SIZE = 1024

print("Setting directories")
## Set directories
modeldir = "/usr/local/lib/python3.4/dist-packages/pocketsphinx/model/"
dictdir = "/home/pi/Documents/models/"

## Create a decoder configeration
config = Decoder.default_config()
config.set_string('-hmm', os.path.join(modeldir, 'en-us'))
config.set_string('-lm', os.path.join(modeldir, 'en-us.lm.bin'))
config.set_string('-dict', os.path.join(dictdir, 'testdict.dict'))
config.set_string('-keyphrase', 'talkin to')
config.set_float('-kws_threshold', 1e-5)

## create decoder
decoder = Decoder(config)

## open the wav file
print("Opening Wav file")
wf = wave.open('/home/pi/Downloads/TestRecordingProjectMono16.wav', 'rb');

## extract data from wav file
filesize = wf.getnframes()
data = wf.readframes(filesize)

## start decoder utterance
decoder.start_utt()

## tracking variables
index = 0
detected_times = 0

## loop for all chunks until the end of the file
while index < filesize:
    ## set number of bytes to CHUNK_SIZE unless that would overrun file size
    if (index + 1024) > filesize:
        count = (filesize - index)
    else:
        count = CHUNK_SIZE

    ## take sub set of the data
    temp_data = data[index:(index + count)]

    ## process this subset
    decoder.process_raw(temp_data, False, False)
    #
    if decoder.hyp() != None:
        ##increment the keywords detected count
        detected_times = detected_times + 1

        ## print out some debug
        print("Keyword detected @ ", (index / SAMPLE_RATE), "Seconds: String = ", decoder.hyp().hypstr, ": Best score = ", decoder.hyp().best_score, ": confidence = ", decoder.get_logmath().exp(decoder.hyp().prob))
        decoder.end_utt()
        decoder.start_utt()

    ## update index to take the next subset
    index = index + count

# print the number of keywords detected
print("detected times = ", detected_times)
decoder.end_utt()
The dictionary is a subset of the dictionary provided in the model directory, and it looks like this:
talkin T AO K IH NG
to T IH
I have tried different variations on the two words with varying results, but in every case the keyword is detected multiple times and/or in the wrong place, even though the phrase only occurs approximately 8 seconds into the WAV file.
I have tried different threshold values from 1e-5 to 1e-50 with no real change to the results.
I have tried modifying my dictionary with different variants of "talking" and "to" (originally I had all variants in the dictionary), but with no difference to the output.
I have also tried reversing the bit order in case my WAV file had the wrong endianness.
Firstly, does my code look sensible? The while loop is supposed to emulate getting chunks of data from a microphone.
Secondly, if it does look OK, does anyone have a WAV file and dictionary that are known to work, so I can rule that in or out?
Thirdly, if none of the above helps, are there any other recommended options, such as noise reduction?
Regards,
Graham
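Your code is wrong. When you calculate filesize = wf.getnframes() you consider only half of the file: getnframes() returns the number of frames, while your loop indexes data in bytes. The correct code would count bytes instead, which is twice the frame count, since a frame is 2 bytes.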
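To make the frame/byte distinction concrete, here is a minimal sketch that uses only Python's standard wave module. The helper name byte_length and the synthetic one-second silent file are illustrative assumptions and are not part of the original post; the point is only that the length of the raw data is frames × sample width × channels.

```python
import os
import tempfile
import wave


def byte_length(wf):
    """Return the number of bytes of audio data in an open wave reader."""
    # frames * bytes per sample * channels; for mono 16-bit this is frames * 2
    return wf.getnframes() * wf.getsampwidth() * wf.getnchannels()


# Build a one-second silent mono 16-bit file so the sketch is self-contained
# (a stand-in for the real recording, which is not available here).
path = os.path.join(tempfile.mkdtemp(), "example.wav")
with wave.open(path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

with wave.open(path, "rb") as wf:
    n_frames = wf.getnframes()
    data = wf.readframes(n_frames)
    n_bytes = byte_length(wf)

# The frame count is half the byte count for mono 16-bit audio.
print(n_frames, len(data), n_bytes)
```

A loop that slices data by byte index therefore needs len(data) (or the value computed above) as its upper bound, not the frame count.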
There is also no need to read the whole file at once; you can read the data chunk by chunk.
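Reading chunk by chunk could look roughly like the sketch below, which again uses only the standard wave module. The generator name iter_wav_chunks, the CHUNK_FRAMES constant and the synthetic test file are assumptions made for illustration. Because readframes() takes a frame count, the frame/byte mix-up does not arise, and the start time of each chunk can be derived from the frames read so far.

```python
import os
import tempfile
import wave

CHUNK_FRAMES = 1024  # frames per read, not bytes


def iter_wav_chunks(path, chunk_frames=CHUNK_FRAMES):
    """Yield (start_time_in_seconds, raw_bytes) for successive chunks."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        bytes_per_frame = wf.getsampwidth() * wf.getnchannels()
        frames_read = 0
        while True:
            chunk = wf.readframes(chunk_frames)
            if not chunk:
                break  # end of file reached
            yield frames_read / rate, chunk
            frames_read += len(chunk) // bytes_per_frame


# Synthetic one-second silent mono 16-bit file, standing in for a recording.
path = os.path.join(tempfile.mkdtemp(), "example.wav")
with wave.open(path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

chunks = list(iter_wav_chunks(path))
total_bytes = sum(len(raw) for _, raw in chunks)
print(len(chunks), total_bytes)
```

In the loop from the original post, each raw chunk would be passed to decoder.process_raw(chunk, False, False) in place of the manual slicing; that call is taken from the poster's code and is not exercised in this sketch.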
Pocketsphinx has Python examples; you can read them to learn how to use the decoder properly:
https://github.com/cmusphinx/pocketsphinx/tree/master/swig/python/test