I'm trying to build an Android app centered around a state machine where the user speaks keywords to transition between states. Is it possible to prompt the user to record these keywords in advance, and then compare the input from the microphone to only the pre-recorded keywords instead of relying on a huge dictionary?
To test, I started with the standard pocketsphinx Android app and swapped out the weather language model for the generic US English one. Once I did that, I found that a lot of words were recognized incorrectly.
The docs on adapting the acoustic model mention that I can train my app with custom audio, but is this an appropriate way to use pocketsphinx? Any guidance would be appreciated.
Is it possible to prompt the user to record these keywords in advance, and then compare the input from the microphone to only the pre-recorded keywords instead of relying on a huge dictionary?
Yes, it is possible: you can implement a corresponding DTW (dynamic time warping) algorithm. However, it might be hard to tune it to work reliably.
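For anyone curious what the DTW route looks like: below is a minimal, self-contained sketch of dynamic time warping over feature-frame sequences. In a real app you would compare MFCC frames extracted from the user's pre-recorded keyword templates against frames from the live microphone input; the feature extraction is not shown, all class and method names here are illustrative, and this is not part of pocketsphinx.

```java
import java.util.Arrays;

public class DtwMatcher {

    // Euclidean distance between two feature frames (e.g. MFCC vectors).
    static double frameDist(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Classic O(n*m) DTW: cost of the best monotonic alignment between the
    // template frames and the input frames. Lower cost = more similar.
    static double dtw(double[][] template, double[][] input) {
        int n = template.length, m = input.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) Arrays.fill(row, Double.POSITIVE_INFINITY);
        cost[0][0] = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = frameDist(template[i - 1], input[j - 1]);
                cost[i][j] = d + Math.min(cost[i - 1][j - 1],
                                 Math.min(cost[i - 1][j], cost[i][j - 1]));
            }
        }
        return cost[n][m];
    }

    public static void main(String[] args) {
        // Toy 2-dimensional "frames": the utterance is the "start" template
        // stretched in time, so it should score closer to "start" than "stop".
        double[][] start = {{1, 0}, {2, 0}, {3, 0}};
        double[][] stop = {{9, 9}, {8, 8}, {7, 7}};
        double[][] utterance = {{1, 0}, {1, 0}, {2, 0}, {3, 0}};
        System.out.println(dtw(start, utterance) < dtw(stop, utterance)); // true
    }
}
```

To decide "did the user say this keyword", you would compare the DTW cost of the best-matching template against a tuned threshold, which is exactly the part that is hard to get reliable.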
To test, I started with the standard pocketsphinx Android app and swapped out the weather language model for the generic US English one. Once I did that, I found that a lot of words were recognized incorrectly.
You need to use the en-us-semi model, which comes with the demo and was created specifically for mobile. The en-us-generic continuous model is too slow and cannot be used on mobile.
The docs on adapting the acoustic model mention that I can train my app with custom audio, but is this an appropriate way to use pocketsphinx? Any guidance would be appreciated.
First of all, you should measure the detection accuracy, the false-alarm rate, and so on. Overall it should work out of the box.
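As a concrete way to do that measurement: run the recognizer over test recordings where you know how many times each keyword was actually spoken, then compute the two numbers from the counts. The class and method names below are mine, nothing pocketsphinx-specific.

```java
public class KwsMetrics {

    // Fraction of true keyword utterances that the recognizer detected.
    static double detectionRate(int detected, int spoken) {
        return (double) detected / spoken;
    }

    // Spurious detections, normalized to false alarms per hour of audio.
    static double falseAlarmsPerHour(int falseAlarms, double audioSeconds) {
        return falseAlarms / (audioSeconds / 3600.0);
    }

    public static void main(String[] args) {
        // e.g. 47 of 50 spoken keywords detected, 3 false alarms
        // in 30 minutes (1800 seconds) of test audio.
        System.out.println(detectionRate(47, 50));       // 0.94
        System.out.println(falseAlarmsPerHour(3, 1800)); // 6.0
    }
}
```

Raising the keyword threshold trades false alarms for missed detections, so tracking both numbers together is what lets you tune it.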
Thanks Nickolay. It appears that the en-us-semi model is an acoustic model. Is there a corresponding en-us-semi language model that goes with it? I tweaked the sample app to use these parameters:
hmm/en-us-semi acoustic model (same as demo app)
dict/cmu07a.dic dictionary (same as demo app)
cmusphinx-5.0-en-us.lm.dmp language model from here
Got it, this is making sense. Is it possible for the keyword search to look for multiple keywords?
For example, I want to listen for "start" or "stop". I tried adding two keyword listeners, but it seems like I can only call startListening on one of them at a time. And if I use the keyword "start stop", onResult and onPartialResult aren't called unless I say "start stop".
Is it possible for the keyword search to look for multiple keywords?
Yes. You should provide the path to a file with your keyphrases listed one per line via the "-kws" option. Note that "start" and "stop" aren't very robust for detection; keyphrases with three or more syllables should be used if possible.
Thanks for the tips, everyone. In case anyone else is following this thread with a similar issue, I ended up using addKeywordSearch instead of addKeyphraseSearch and listed out the words in a grammar file.
Grammar file format:
apple /1e-1/
banana /1e-1/
carrot /1e-1/
Each keyword is followed by a threshold enclosed in slashes. Here is another useful article.
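If it helps anyone, the keyword file can also be generated at runtime before handing it to the recognizer. A rough sketch (the file name and the commented-out recognizer calls are illustrative; addKeywordSearch is the pocketsphinx-android method mentioned above, and this snippet only shows the plain-Java file-writing part):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class KeywordFileWriter {

    // Write one "keyword /threshold/" entry per line, in the format
    // pocketsphinx expects for a keyword search.
    static Path writeKeywordFile(Path file, List<String> keywords, String threshold)
            throws IOException {
        StringBuilder sb = new StringBuilder();
        for (String kw : keywords) {
            sb.append(kw).append(" /").append(threshold).append("/\n");
        }
        Files.writeString(file, sb.toString());
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = writeKeywordFile(
                Files.createTempFile("keywords", ".list"),
                List.of("apple", "banana", "carrot"),
                "1e-1");
        System.out.print(Files.readString(file));
        // Then, on a recognizer built via SpeechRecognizerSetup:
        // recognizer.addKeywordSearch("keywords", file.toFile());
        // recognizer.startListening("keywords");
    }
}
```

In practice you would tune a per-keyword threshold rather than using one value for all of them.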
Hello Zach,
Keyword spotting doesn't require a language model; you can just specify a list of keywords you are looking for, like "oh mighty computer".
Large-vocabulary transcription with a language model like cmusphinx-5.0-en-us.lm.dmp is not really possible on a phone; the phone is too slow for that.
Last edit: Nickolay V. Shmyrev 2014-12-07
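For reference, the keyphrase setup described above looks roughly like this in the pocketsphinx-android demo style. This is a non-runnable Android sketch: the assetsDir variable and the "wakeup" search name are assumptions, and the model/dictionary names are the ones discussed earlier in this thread.

```java
// Inside an Android component, after syncing the model assets to storage.
// assetsDir is assumed to point at that synced directory.
SpeechRecognizer recognizer = SpeechRecognizerSetup.defaultSetup()
        .setAcousticModel(new File(assetsDir, "en-us-semi"))
        .setDictionary(new File(assetsDir, "cmu07a.dic"))
        .getRecognizer();
recognizer.addListener(this); // this implements RecognitionListener

// No language model involved: just a keyphrase search.
recognizer.addKeyphraseSearch("wakeup", "oh mighty computer");
recognizer.startListening("wakeup");
```

When the phrase is spotted, onPartialResult/onResult fire on the listener, and you can call stop() or switch to another search.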
Thank you, this basic info should be mentioned in the documentation.
It took me hours of jumping across sites to find it.
Last edit: alfahim 2015-03-17