
Setting up Pocketsphinx with Small Vocabulary

2014-02-26
2014-03-01
  • Trevor Beckerman

    I'm very new to Pocketsphinx and have been having some trouble with recognition. I have three basic, low-level questions that I hope someone can answer. My first question is as follows:

    To get started, I've been using the "Turtle" dictionary and LM provided with the original package (along with the HMM in the source). When I run the following code (which is supposed to recognize "go forward ten meters"), it works fine:

    import pocketsphinx as ps

    decoder = ps.Decoder(hmm='language/hmm', dict='language/turtle.dic', lm='language/turtle.DMP')
    fh = open('language/goforward.raw', 'rb')
    nsamp = decoder.decode_raw(fh)
    hyp, uttid, score = decoder.get_hyp()
    print "Got result %s %d" % (hyp, score)
    

    Note that "goforward.raw" is provided with the original package. Not sure what type of file this is, but I've been trying to reproduce the results by recording the same phrase and saving to a wav file. I made sure the recording was a 16kHz, 16-bit mono wav file, but the result was totally off. Not inaccurate, just completely wrong. Same thing happens when I run pocketsphinx_continuous -infile from the command line. My question is whether I'm encoding the audio file improperly or missing some other aspect of this setup. I'm going to upload the file I've been using ASAP and will post the link here once I do.

    My second question is this: ultimately, I want to set up ps to respond to a very short list of commands, i.e., a very small dictionary. What's the quickest way to optimize ps for this type of setup? What are the drawbacks to generating a language model from my small corpus using CMU's online generator? Will ps listen to input that is in no way related to any of the specified commands but think that a command has been called, due to the very limited scope of the model? How would I go about training? Apologies if these questions reflect a gross misunderstanding of ps/voice recognition in general, but that's kind of where I am at the moment.

    Finally: I'm hoping to ultimately integrate this system into a Raspberry Pi that is constantly listening for commands. I've seen a number of posts on here about people doing this with gstreamer, but I'm wondering if there are any drawbacks to writing code that listens for utterances, saves each one to a wav file, and runs ps on the file (i.e., avoids the use of gstreamer). I'm guessing recognition would be a little slower since gstreamer is designed to act as a pipeline, but would the accuracy be in any way compromised?

    Again, apologies for the elementary level questions. Just getting started with ps.

    Thanks in advance for the help and thanks for all the hard work you guys put into this software.

     
  • Nickolay V. Shmyrev

    Dear Trevor

    Not inaccurate, just completely wrong.

    There is definitely something wrong with the format. Share the file you have: you can attach it here, or upload it to Dropbox and post a link.
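    A quick header check will usually reveal this kind of mismatch, since pocketsphinx expects 16 kHz, 16-bit, mono input by default. A minimal sketch using only Python's stdlib `wave` module (the file name here is a placeholder; it writes a known-good silent file so the check can be demonstrated, then you would point `check_format` at your own recording):

    ```python
    import wave

    def check_format(path, rate=16000, channels=1, sampwidth=2):
        """Return True if a wav file matches pocketsphinx's default input format."""
        with wave.open(path, "rb") as w:
            return (w.getframerate() == rate
                    and w.getnchannels() == channels
                    and w.getsampwidth() == sampwidth)

    # Write a known-good one-second silent file to demonstrate the check.
    with wave.open("goforward_16k.wav", "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(16000)  # 16 kHz
        w.writeframes(b"\x00\x00" * 16000)

    print(check_format("goforward_16k.wav"))  # True
    ```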

    What's the quickest way to optimize ps for this type of setup?

    There is not much need to optimize; it should work out of the box.

    What are the drawbacks to generating a language model from my small corpus using CMU's online generator?

    For a small vocabulary it's better to specify a grammar in JSGF format. Language models do not restrict the search space properly; for example, they allow word repetitions. They are better suited to free-form language.
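    As an illustration, a JSGF grammar covering turtle-style commands might look like this (a sketch; the grammar name, rule names, and word list are arbitrary):

    ```jsgf
    #JSGF V1.0;
    grammar commands;

    public <command> = go <direction> <distance> meters;
    <direction> = forward | backward | left | right;
    <distance> = one | two | three | five | ten;
    ```

    A grammar file like this can be passed to the decoder with the -jsgf option in place of -lm.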

    However, free-form commands are a current trend, so you could consider them for your application too. Rigid simple commands are not very convenient for everyday use.

    Will ps listen to input that is in no way related to any of the specified commands but think that a command has been called, due to the very limited scope of the model?

    We have recently implemented keyword activation for that: you can listen for a keyword in a stream and respond only to it.

    How would I go about training?

    I'm guessing recognition would be a little slower since gstreamer is designed to act as a pipeline, but would the accuracy be in any way compromised?

    Accuracy should be the same; it's your filesystem you should worry about instead. If you place the files on an SD card, the card will not survive for long.
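    If you do pass audio to the decoder yourself, one way to avoid wearing out the SD card is to keep each utterance in memory rather than writing a temporary wav file. A sketch using only the stdlib (the decoder calls in the trailing comment follow the pocketsphinx process_raw API but are not executed here):

    ```python
    import io
    import wave

    def utterance_to_buffer(frames, rate=16000):
        """Wrap raw 16-bit mono samples in an in-memory wav container."""
        buf = io.BytesIO()
        w = wave.open(buf, "wb")
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(rate)
        w.writeframes(frames)
        w.close()
        buf.seek(0)
        return buf

    buf = utterance_to_buffer(b"\x00\x00" * 16000)  # one second of silence
    # A real decoder pass would then read from the buffer instead of a file:
    #   decoder.start_utt()
    #   decoder.process_raw(buf.read(), False, True)
    #   decoder.end_utt()
    print(len(buf.getvalue()))  # 44-byte header + 32000 bytes of samples = 32044
    ```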

    Again, apologies for the elementary level questions. Just getting started with ps.

    Never mind, feel free to ask.

     
  • Trevor Beckerman

    So I ran sox on my wav file, and it turns out the file was actually sampled at 48 kHz and not 16 kHz, despite Audacity saying it was exporting at 16. Whoops, my b. Once I finally got it down to 16 kHz, the Turtle example worked. This brings me to my next set of questions (thank you for your patience):

    Is keyword spotting mode able to use a dictionary to look for a list of phrases, or does it only look for the phrase passed in the -kws argument? If I'm looking for 3-4 different phrases, would it make more sense to run ps normally with an LM/HMM, etc., or can I pass multiple strings as arguments in kws?

    Secondly, is there a trick to setting up the audio device with ps on a Mac? Whenever I run kws on the command line I get an error message saying "Failed to open audio device", which is why I have to resort to running -infile on wav files. I've seen a few posts on this but haven't come across a solution. As Nickolay pointed out, storing wavs on an SD card is a rough setup, and while I could store and then immediately remove them, I think it'd be a lot faster for ps to have access to the mic in real time. I'll ultimately be on a Raspberry Pi, but for the time being I'd like to get it running on my laptop mic.

    Lastly, I seem to only be able to run ps in kws on the command line. Whenever I run the following code I get an error saying "pocketsphinx.Decoder has no attribute 'default_config'":

    import pocketsphinx as ps

    config = ps.Decoder.default_config()
    config.set_string('-kws', "hello computer")
    decoder = ps.Decoder(config)  # note the ps. prefix; a bare Decoder(config) would raise a NameError
    

    I had installed the stable version of ps before checking out the dev version from the trunk to run kws - could that be the problem? If so, how do I sync the installations?

    Thanks Nickolay and everyone else for all your help.

     
  • Nickolay V. Shmyrev

    Is keyword spotting mode able to use a dictionary to look for a list of phrases, or does it only look for the phrase passed in the -kws argument?

    Only a single phrase is supported for now.

    If I'm looking for 3-4 different phrases, would it make more sense to run ps normally with a LM/HMM, etc or can I pass multiple strings as arguments in kws?

    This is not supported. Usually you want to listen for a single activation keyword and then specify a command to execute.

    Secondly, is there a trick to setting up the audio device with ps on a mac?

    If you are writing code in Python, you can use an external audio input library like PyAudio and process the result with the decoder's process_raw method. You can check this source for an example:

    https://github.com/mondhs/lt-pocketsphinx-tutorial/tree/master/impl/demo-py/public_service

    I had installed the stable version of ps before checking out the dev version from the trunk to run kws - could that be the problem? If so, how do I sync the installations?

    It seems you still have the stable version installed. Remove the Python module installed from the stable version, then install the development version.
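    One way to see which copy of a module Python is actually importing is to print its file path (shown here with the stdlib wave module as a stand-in; substitute pocketsphinx to locate a stale installation):

    ```shell
    # Prints the file path of the module that "import" resolves to.
    python3 -c "import wave; print(wave.__file__)"
    ```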

     
  • Trevor Beckerman

    Where are the python files in the development version? Deleted the modules from my packages folder and reinstalled ps and sphinxbase (checked out from the dev trunk), but now my python modules are gone...

     
  • Trevor Beckerman

    Never mind, I think I got it. For some reason it put it in the site-packages folder of another installation I randomly have. Thanks again for the help.

     
