
Keyword detection from wav file python.

Graham
2017-03-08
  • Graham

    Graham - 2017-03-08

    Hi all,

    I am having some issues with keyword detection on the Raspberry Pi. I recorded a WAV file, saved it as mono 16-bit, and then ran the following code on it.

    import sys, os
    from pocketsphinx.pocketsphinx import *
    from sphinxbase.sphinxbase import *
    import pyaudio
    import wave
    import time
    #
    # constants
    SAMPLE_RATE = 16000
    CHUNK_SIZE = 1024
    print("Setting directories")
    #
    # Set directories
    modeldir = "/usr/local/lib/python3.4/dist-packages/pocketsphinx/model/" 
    dictdir = "/home/pi/Documents/models/"
    #
    # Create a decoder configuration
    config = Decoder.default_config()
    config.set_string('-hmm', os.path.join(modeldir, 'en-us'))
    config.set_string('-lm', os.path.join(modeldir, 'en-us.lm.bin'))
    config.set_string('-dict', os.path.join(dictdir, 'testdict.dict'))
    config.set_string('-keyphrase', 'talkin to')
    config.set_float('-kws_threshold', 1e-5)
    #
    # create decoder
    decoder = Decoder(config)
    #
    # open the wav file
    print("Opening Wav file")
    wf = wave.open('/home/pi/Downloads/TestRecordingProjectMono16.wav', 'rb');
    #
    # extract data from wav file
    filesize = wf.getnframes()
    data = wf.readframes(filesize)
    #
    # start decoder utterance
    decoder.start_utt()
    #
    # tracking variables
    index = 0
    detected_times = 0
    #
    # loop for all chunks until the end of the file
    while index < filesize:
        #
        # set number of bytes to CHUNK_SIZE unless that would overrun file size
        if (index + 1024) > filesize:
            count = (filesize - index)
        else:
            count = CHUNK_SIZE
        #
        # take sub set of the data    
        temp_data = data[index:(index+count)]
        #
        # process this subset
        decoder.process_raw(temp_data, False, False)
        #
        if decoder.hyp() != None:
            #
            #increment the keywords detected count 
            detected_times = detected_times + 1
            #
            # print out some debug
            print("Keyword detected @ ", (index / SAMPLE_RATE), "Seconds: String = ", decoder.hyp().hypstr, ": Best score = ", decoder.hyp().best_score, ": confidence = ", decoder.get_logmath().exp(decoder.hyp().prob))
            decoder.end_utt()
            decoder.start_utt()
        #
        # update index to take the next subset
        index = index + count
    # print the number of keywords detected
    print("detected times = ", detected_times)
    decoder.end_utt()
    

    The dictionary is a subset of the dictionary provided in the model directory, and it looks like this:

    talkin T AO K IH NG
    to T IH
    

    I have tried different variations on the two words with varying results, but in every case keywords are detected multiple times and/or in the wrong place, even though the phrase occurs only once, approximately 8 seconds into the WAV file.

    I have tried different threshold values from 1e-5 to 1e-50 with no real change to the results.
    I have tried modifying my dictionary with different variants of "talking" and "to" (originally I had all variants in the dict), but with no difference to the output.
    I have also tried reversing the byte order in case my WAV file was the wrong endianness.

    Firstly, does my code look sensible? The while loop is supposed to emulate getting chunks of data from a microphone.
    Secondly, if it does look OK, does anyone have a WAV file and dictionary that are known to work, so I can rule those in or out?
    Thirdly, if neither of the above is the problem, are there any other recommended options, such as noise reduction?

    Regards,
    Graham

     
    • Nickolay V. Shmyrev

      Firstly, does my code look sensible? The while loop is supposed to emulate getting chunks of data from a microphone.

      Your code is wrong. When you calculate

      filesize = wf.getnframes()
      

      you consider only half of the file. The correct code would be

      filesize = wf.getnframes() * 2
      

      since each frame is 2 bytes (one 16-bit mono sample).
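To make the frame-versus-byte distinction concrete, here is a minimal sketch using only the standard `wave` module. The helper names `pcm_byte_count` and `byte_offset_to_seconds` are made up for illustration and are not part of pocketsphinx. Instead of hard-coding `* 2`, the sketch derives the frame size from the file header, on the assumption that this is safer if the recording ever turns out not to be mono 16-bit.

```python
import wave


def pcm_byte_count(wf):
    """Return the number of audio bytes in an open wave.Wave_read object.

    getnframes() counts frames, not bytes. One frame holds one sample per
    channel, so a mono 16-bit file has 2 bytes per frame.
    """
    return wf.getnframes() * wf.getsampwidth() * wf.getnchannels()


def byte_offset_to_seconds(wf, byte_offset):
    """Convert a byte offset into the raw audio data to a time in seconds."""
    bytes_per_frame = wf.getsampwidth() * wf.getnchannels()
    return byte_offset / (bytes_per_frame * wf.getframerate())
```

Note that once `index` in the loop above is treated as a byte offset, a timestamp computed as `index / SAMPLE_RATE` would come out twice as large as the real position for mono 16-bit audio, so the division also needs the bytes-per-frame factor.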

      There is also no need to read the whole file at once; you can read the data chunk by chunk.
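Reading chunk by chunk could look roughly like the sketch below. It is not taken from the pocketsphinx examples. `iter_wav_chunks` and `count_keyword_hits` are hypothetical helper names, and the decoder is assumed to be configured as in the original post. The only decoder calls used are the ones that already appear in that code (`start_utt`, `process_raw`, `hyp`, `end_utt`).

```python
import wave


def iter_wav_chunks(source, frames_per_chunk=1024):
    """Yield (start_time_in_seconds, raw_bytes) for successive chunks.

    `source` is a file name or a file-like object. readframes() takes a
    frame count and returns bytes, so no manual byte arithmetic is needed.
    """
    with wave.open(source, "rb") as wf:
        bytes_per_frame = wf.getsampwidth() * wf.getnchannels()
        rate = wf.getframerate()
        frames_read = 0
        while True:
            buf = wf.readframes(frames_per_chunk)
            if not buf:  # empty bytes means end of file
                break
            yield frames_read / rate, buf
            frames_read += len(buf) // bytes_per_frame


def count_keyword_hits(decoder, source, frames_per_chunk=1024):
    """Feed a WAV file to a keyword-spotting decoder chunk by chunk.

    Returns a list of (chunk_start_seconds, hypothesis_string) tuples.
    `decoder` is assumed to be a configured pocketsphinx Decoder.
    """
    hits = []
    decoder.start_utt()
    for start_seconds, buf in iter_wav_chunks(source, frames_per_chunk):
        decoder.process_raw(buf, False, False)
        hyp = decoder.hyp()
        if hyp is not None:
            hits.append((start_seconds, hyp.hypstr))
            # Restart the utterance so the same hit is not reported again.
            decoder.end_utt()
            decoder.start_utt()
    decoder.end_utt()
    return hits
```

With the decoder from the original post, this would be called as `count_keyword_hits(decoder, '/home/pi/Downloads/TestRecordingProjectMono16.wav')`. The reported time is the start of the chunk in which the hypothesis first appeared, not the exact position of the phrase.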

      Pocketsphinx has Python examples; you can read them to learn how to use the decoder properly.

      https://github.com/cmusphinx/pocketsphinx/tree/master/swig/python/test

       
