CMU Sphinx / Forums / Help: Bad to no Detection Rate with Pocketsphinx Android

AlexM - 2017-03-31

Hello Sphinx Team,

I'm using the pocketsphinx Android demo and have problems with detection. The FAQ says that I probably haven't configured my decoder properly, but I'm using the demo config except for a changed .gram file.

I'm using Android Studio and tested the app on several devices ranging from Android 4.4 to 6.0. (I also listened to the raw audio files to check whether the microphone is the problem.)

I'm using the keyword search, and I get either a lot of false results or none if I lower the threshold, so it seems to be more or less random.

Do I have to change any config? Or is there a resource that explains what the entries in feat.params are?

Here's my digits.gram:

zero /1e-5/
one /1e-5/
two /1e-5/
three /1e-5/
four /1e-5/
five /1e-5/
six /1e-5/
seven /1e-5/
eight /1e-5/
nine /1e-5/

My RecognizerSetup:

public class RecognizerSetup {
    private Context context;
    private File assetFile;
    AssetHelper assetHelper = AssetHelper.getInstance();

    public SpeechRecognizer setup() {
        try {
            context = AndroidNativeUtil.getContext();
            Assets assets = new Assets(context);
            assetFile = assets.syncAssets();
            return setupRecognizer();
        } catch (IOException e) {
            return null;
        }
    }

    private SpeechRecognizer setupRecognizer() throws IOException {
        SpeechRecognizer recognizer = SpeechRecognizerSetup.defaultSetup()
                .setAcousticModel(new File(assetFile, assetHelper.getAcousticModel())) // "en-us-ptm"
                .setDictionary(new File(assetFile, assetHelper.getDictionary())) // "cmudict-en-us.dict"
                .setRawLogDir(assetFile)
                .getRecognizer();
        recognizer.addListener(new SpeechResultRecognizer(recognizer));

        String activeSpeech = assetHelper.getActiveSpeech(); // assetHelper returns "digits"
        if (activeSpeech.equals(SpeechConstants.PHONE_SEARCH)) {
            File phoneticModel = new File(assetFile, assetHelper.getLanguageModel()); // assetHelper returns "en-phone.dmp"
            recognizer.addAllphoneSearch(assetHelper.getActiveSpeech(), phoneticModel);
        } else if (activeSpeech.equals(SpeechConstants.DIGITS_SEARCH)) {
            File digitsGrammar = new File(assetFile, "/grammar/digits.gram");
            recognizer.addKeywordSearch(assetHelper.getActiveSpeech(), digitsGrammar);
        }
        return recognizer;
    }
}

The class that implements RecognitionListener:

public SpeechResultRecognizer(SpeechRecognizer recognizer) {
    this.recognizer = recognizer;
    outField = OutputField.getInstance();
    outField.setLine("Initialized");
}

@Override
public void onBeginningOfSpeech() {
    outField.setLine("Beginning of Speech");
}

@Override
public void onEndOfSpeech() {
    outField.setLine("End of Speech");
    recognizer.stop();
}

@Override
public void onPartialResult(Hypothesis hypothesis) {
    if (hypothesis != null) {
        String text = hypothesis.getHypstr();
        outField.setLine(text + " score: " + hypothesis.getProb());
    } else {
        //outField.setLine("Null Hypothesis PartRes");
    }
}

@Override
public void onResult(Hypothesis hypothesis) {
    if (hypothesis != null) {
        String text = hypothesis.getHypstr();
        outField.setLine(text + " score: " + hypothesis.getProb());
        action.put(text);
    } else {
        outField.setLine("Null Hypothesis Result");
    }
    recognizer.stop();
}

@Override
public void onError(Exception e) {
    outField.setLine("Error");
    recognizer.stop();
}

@Override
public void onTimeout() {
    outField.setLine("Timeout");
    recognizer.stop();
}

That's how I start the listening:

@Override
public void actionPerformed(final ActionEvent evt) {
    recognizer.stop();
    field.setLine("Speech Button pressed");
    recognizer.startListening(assetHelper.getActiveSpeech());
}

I've tried the same with the German VoxForge model, but that didn't work either.

Last edit: AlexM 2017-03-31
- Nickolay V. Shmyrev - 2017-03-31
  
  For reliable detection, a keyphrase should have 3-4 syllables; your digits are very short. Source:
  
  http://cmusphinx.sourceforge.net/wiki/tutoriallm#keyword_lists
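
For illustration, a keyword list file holds one phrase per line, each followed by its detection threshold between slashes. The sketch below renders such a file from longer, multi-syllable phrases. `KeywordListBuilder` is a hypothetical helper (not part of pocketsphinx), and the phrases and threshold exponents used with it are placeholder assumptions that would need tuning on real recordings.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of pocketsphinx.
final class KeywordListBuilder {

    // Renders each map entry as "phrase /1e-N/", one entry per line.
    // The map value N is the threshold exponent, e.g. 20 stands for 1e-20.
    static String build(Map<String, Integer> thresholdExponents) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : thresholdExponents.entrySet()) {
            String phrase = entry.getKey().trim().toLowerCase(Locale.ROOT);
            sb.append(phrase)
              .append(" /1e-")
              .append(entry.getValue())
              .append("/\n");
        }
        return sb.toString();
    }
}
```

The resulting text could be written to a file and passed to `addKeywordSearch` in the same way as digits.gram in the original post.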
  
  - AlexM - 2017-03-31
    
    OK, so words like Binocular, Compiler, Highflying Illustrated, etc. are working pretty well.
    Now my question is: if I want to, say, calculate something like 1 + 2, the individual words are too short, but listing every possible combination isn't an option either. Would it be better to use, say, an n-gram search?
    
    - Nickolay V. Shmyrev - 2017-03-31
      
      Sorry, I'm not sure what you mean by "calculate". I think a practical approach would be to use an activation keyphrase like "ok google", which can be detected reliably, plus a grammar search or language model search to recognize the actual action. You can combine short words in the activation keyphrase to get a longer, more reliable phrase.
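
The two-stage flow described here can be sketched as a small state machine: a wake-up search waits for the keyphrase, then a second search handles the actual command, and the listener returns to the wake-up search afterwards. This is a minimal sketch under stated assumptions: the nested `Recognizer` interface is a hypothetical stand-in that mirrors only the `stop()` and `startListening(String)` calls used in this thread, the search names are made up, and the callbacks take plain strings instead of `Hypothesis` objects.

```java
// Hypothetical sketch; class, interface, and search names are assumptions.
final class TwoStageListener {
    static final String WAKEUP_SEARCH = "wakeup";
    static final String COMMAND_SEARCH = "command";

    // Stand-in for the two recognizer calls used in this thread.
    interface Recognizer {
        void stop();
        void startListening(String searchName);
    }

    private final Recognizer recognizer;
    private final String keyphrase;
    private String activeSearch = WAKEUP_SEARCH;
    private String lastCommand;

    TwoStageListener(Recognizer recognizer, String keyphrase) {
        this.recognizer = recognizer;
        this.keyphrase = keyphrase;
    }

    void start() {
        switchSearch(WAKEUP_SEARCH);
    }

    // Keyphrase heard while waiting: move on to the command search.
    void onPartialResult(String text) {
        if (text != null && activeSearch.equals(WAKEUP_SEARCH)
                && text.trim().equals(keyphrase)) {
            switchSearch(COMMAND_SEARCH);
        }
    }

    // The speaker finished the command: stop so a final result is delivered.
    void onEndOfSpeech() {
        if (activeSearch.equals(COMMAND_SEARCH)) {
            recognizer.stop();
        }
    }

    // Final result: keep the command and go back to waiting for the keyphrase.
    // Stopping the wake-up search may also deliver the keyphrase as a final
    // result, so that text is ignored here.
    void onResult(String text) {
        if (text == null || text.trim().equals(keyphrase)) {
            return;
        }
        if (activeSearch.equals(COMMAND_SEARCH)) {
            lastCommand = text;
            switchSearch(WAKEUP_SEARCH);
        }
    }

    String activeSearch() { return activeSearch; }

    String lastCommand() { return lastCommand; }

    private void switchSearch(String searchName) {
        recognizer.stop();
        activeSearch = searchName;
        recognizer.startListening(searchName);
    }
}
```

In a real app these methods would be called from the `RecognitionListener` callbacks shown in the original post, with the two searches registered beforehand under the same names.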
      
      - AlexM - 2017-03-31
        
        Oh, I was trying to say that I want my phone to calculate something and then say the words "One plus two is...", but the number of possible combinations I might say is infinite, so I was asking whether it's possible to use a keyphrase plus a grammar search with short (one-syllable) words.
        As I understand the demo, the keyphrase is used just to activate speech recognition, which I'm doing with a button right now.
        
        
        Nickolay V. Shmyrev - 2017-03-31
        
        If you want a voice calculator, you implement either a keyphrase or a button for activation. Once activated, you recognize with a language model and return the result.
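
Once the recognizer returns a hypothesis string, turning it into a result is ordinary text processing. The following is a minimal sketch, assuming numbers are spoken digit by digit (with an optional "point") and that each utterance holds exactly one operation. `SpokenCalculator` and its vocabulary ("plus", "minus", "times", "over") are hypothetical choices for illustration, not anything pocketsphinx provides.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: evaluates "<number> <operator> <number>" from recognized text.
final class SpokenCalculator {
    private static final List<String> DIGITS = Arrays.asList(
            "zero", "one", "two", "three", "four",
            "five", "six", "seven", "eight", "nine");
    private static final List<String> OPERATORS =
            Arrays.asList("plus", "minus", "times", "over");

    static double evaluate(String hypothesis) {
        String[] words = hypothesis.trim().toLowerCase(Locale.ROOT).split("\\s+");

        // Find the first operator word; everything before it is the left operand.
        int opIndex = -1;
        for (int i = 0; i < words.length; i++) {
            if (OPERATORS.contains(words[i])) {
                opIndex = i;
                break;
            }
        }
        if (opIndex < 0) {
            throw new IllegalArgumentException("No operator in: " + hypothesis);
        }

        double left = parseNumber(words, 0, opIndex);
        double right = parseNumber(words, opIndex + 1, words.length);
        switch (words[opIndex]) {
            case "plus":  return left + right;
            case "minus": return left - right;
            case "times": return left * right;
            default:      return left / right; // "over"
        }
    }

    // Converts digit words such as "five point nine nine" to 5.99.
    private static double parseNumber(String[] words, int from, int to) {
        StringBuilder number = new StringBuilder();
        boolean seenDigit = false;
        boolean seenPoint = false;
        for (int i = from; i < to; i++) {
            if (words[i].equals("point") && !seenPoint) {
                number.append('.');
                seenPoint = true;
            } else {
                int value = DIGITS.indexOf(words[i]);
                if (value < 0) {
                    throw new IllegalArgumentException("Unexpected word: " + words[i]);
                }
                number.append(value);
                seenDigit = true;
            }
        }
        if (!seenDigit) {
            throw new IllegalArgumentException("Missing number");
        }
        return Double.parseDouble(number.toString());
    }
}
```

For example, `SpokenCalculator.evaluate("one plus two")` adds the two operands, and unrecognized words raise an `IllegalArgumentException` so the app can ask the user to repeat.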
        
        
        AlexM - 2017-03-31
        
        Thank you. That's what I was asking for.
        
        
        AlexM - 2017-03-31
        
        Addendum: here's a use case:
        I have my smartwatch with me and am carrying something, so I have no hands free. I want to calculate how much something costs, so I say to my watch: "ok google, what's 5.99 times 4"
        

AlexM - 2017-03-31

Thanks, I'll try that.
