Bad to no Detection Rate with Pocketsphinx Android

Help
AlexM
2017-03-31
2017-03-31
  • AlexM

    AlexM - 2017-03-31

    Hello Sphinx Team,

    I'm using the pocketsphinx Android demo and have problems with detection. The FAQ says that I probably haven't configured my decoder properly, but I'm using the demo config except for a changed .gram file.

    I'm using Android Studio and tested the app on several devices ranging from Android 4.4 to 6.0 (I also listened to the raw audio files to check whether the microphone is the problem).

    I'm using the keyword search, and I either get a lot of false results or, if I lower the threshold, none at all, so detection seems more or less random.

    Do I have to change any config? Or is there a resource that explains what the entries in feat.params are?

    Here's my digits.gram:

    zero /1e-5/
    one /1e-5/
    two /1e-5/
    three /1e-5/
    four /1e-5/
    five /1e-5/
    six /1e-5/
    seven /1e-5/
    eight /1e-5/
    nine /1e-5/
    

    My RecognizerSetup:

    public class RecognizerSetup {
    
        private Context context;
        private File assetFile;
        AssetHelper assetHelper = AssetHelper.getInstance();
    
        public SpeechRecognizer setup() {
            try {
                context = AndroidNativeUtil.getContext();
                Assets assets = new Assets(context);
                assetFile = assets.syncAssets();
                return setupRecognizer();
            } catch (IOException e) {
                return null;
            }
        }
    
        private SpeechRecognizer setupRecognizer() throws IOException {
            SpeechRecognizer recognizer = SpeechRecognizerSetup.defaultSetup()
                    .setAcousticModel(new File(assetFile, assetHelper.getAcousticModel())) // assetHelper returns "en-us-ptm"
                    .setDictionary(new File(assetFile, assetHelper.getDictionary())) // assetHelper returns "cmudict-en-us.dict"
                    .setRawLogDir(assetFile)
                    .getRecognizer();
            recognizer.addListener(new SpeechResultRecognizer(recognizer));
    
            String activeSpeech = assetHelper.getActiveSpeech(); //assetHelper returns "digits"
            if (activeSpeech.equals(SpeechConstants.PHONE_SEARCH)) {
                File phoneticModel = new File(assetFile, assetHelper.getLanguageModel());//assetHelper returns "en-phone.dmp"
                recognizer.addAllphoneSearch(assetHelper.getActiveSpeech(), phoneticModel);
    
            } else if (activeSpeech.equals(SpeechConstants.DIGITS_SEARCH)) {
                File digitsGrammar = new File(assetFile, "/grammar/digits.gram");
                recognizer.addKeywordSearch(assetHelper.getActiveSpeech(), digitsGrammar);
            }
    
            return recognizer;
        }
    }
    

    The class that implements RecognitionListener:

    public SpeechResultRecognizer(SpeechRecognizer recognizer) {
            this.recognizer = recognizer;
            outField = OutputField.getInstance();
            outField.setLine("Initialized");
    
        }
    
        @Override
        public void onBeginningOfSpeech() {
            outField.setLine("Beginning of Speech");
        }
    
        @Override
        public void onEndOfSpeech() {
            outField.setLine("End of Speech");
            recognizer.stop();
        }
    
        @Override
        public void onPartialResult(Hypothesis hypothesis) {
            if (hypothesis != null) {
                String text = hypothesis.getHypstr();
    
                outField.setLine(text +" score: "+ hypothesis.getProb());
            } else {
                //outField.setLine("Null Hypothesis PartRes");
            }
        }
    
        @Override
        public void onResult(Hypothesis hypothesis) {
            if (hypothesis != null) {
                String text = hypothesis.getHypstr();
                outField.setLine(text +" score: "+ hypothesis.getProb());
                action.put(text);
            } else {
                outField.setLine("Null Hypothesis Result");
            }
            recognizer.stop();
        }
    
        @Override
        public void onError(Exception e) {
            outField.setLine("Error");
            recognizer.stop();
        }
    
        @Override
        public void onTimeout() {
            outField.setLine("Timeout");
            recognizer.stop();
        }
    

    That's how I start listening:

    @Override
        public void actionPerformed(final ActionEvent evt) {
            recognizer.stop();
            field.setLine("Speech Button pressed");
            recognizer.startListening(assetHelper.getActiveSpeech());
        }
    

    I've tried the same with the German VoxForge model, but that didn't work either.

     

    Last edit: AlexM 2017-03-31
    • Nickolay V. Shmyrev

      For reliable detection, a keyphrase should have 3-4 syllables; your digits are very short. Source:

      http://cmusphinx.sourceforge.net/wiki/tutoriallm#keyword_lists
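
As a rough illustration of the advice above, the sketch below builds a keyword list in the same `phrase /threshold/` format as the digits.gram posted earlier, but with longer phrases made by combining short words. The class name `KeywordListBuilder`, the phrases, and the threshold values are all assumptions for illustration. Thresholds have to be tuned per phrase on real recordings, and every word still has to exist in the dictionary.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that renders a keyword list as text. */
final class KeywordListBuilder {

    /** Formats one line: the phrase followed by its threshold between slashes. */
    static String formatEntry(String phrase, String threshold) {
        return phrase + " /" + threshold + "/";
    }

    /** Builds the whole list, one entry per line, in insertion order. */
    static String build(Map<String, String> phrases) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : phrases.entrySet()) {
            sb.append(formatEntry(e.getKey(), e.getValue())).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> phrases = new LinkedHashMap<>();
        // Placeholder phrases and thresholds: tune the thresholds on test audio.
        phrases.put("open calculator", "1e-20");
        phrases.put("stop listening now", "1e-25");
        System.out.print(build(phrases));
    }
}
```

The resulting text can be written to a file and passed to `addKeywordSearch` in place of the single-digit list.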

       
      • AlexM

        AlexM - 2017-03-31

        OK, so words like Binocular, Compiler, Highflying, Illustrated, etc. are working pretty well.
        Now my question is: if I want to, say, calculate something like 1 + 2, the individual words are too short, but listing every possible combination isn't an option either. Is it better to use, say, an N-gram search?

         
        • Nickolay V. Shmyrev

          Sorry, I'm not sure what you mean by "calculate". I think a practical approach would be to use an activation keyphrase like "ok google", which can be detected reliably, plus a grammar search or language model search for recognizing the actual action. You can combine short words in the activation keyphrase to get a longer, more reliable phrase.
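
The two-stage approach described here can be sketched as a small state tracker: listen for the activation keyphrase, switch to the command search once it is spotted, and return to the keyphrase afterwards. This is only a sketch under assumptions. `SearchSwitcher` is a hypothetical stand-in that exposes just `stop()` and `startListening(String)`, the two recognizer calls used in the code earlier in this thread, and the search names `wakeup` and `command` are made up. How the real listener callbacks interleave with `stop()` is not modeled.

```java
/** Hypothetical stand-in for the two recognizer calls used earlier in this thread. */
interface SearchSwitcher {
    void stop();
    void startListening(String searchName);
}

/** Tracks which search is active: activation keyphrase first, then the command search. */
final class TwoStageFlow {
    static final String WAKEUP_SEARCH = "wakeup";   // assumed name of the keyphrase search
    static final String COMMAND_SEARCH = "command"; // assumed name of the grammar or language model search

    private final SearchSwitcher recognizer;
    private final String keyphrase;
    private String activeSearch = WAKEUP_SEARCH;

    TwoStageFlow(SearchSwitcher recognizer, String keyphrase) {
        this.recognizer = recognizer;
        this.keyphrase = keyphrase;
    }

    String activeSearch() {
        return activeSearch;
    }

    /** Begin by listening for the activation keyphrase only. */
    void start() {
        switchTo(WAKEUP_SEARCH);
    }

    /** Button activation: skip the keyphrase and go straight to the command search. */
    void activate() {
        switchTo(COMMAND_SEARCH);
    }

    /** Feed partial hypothesis text here; spotting the keyphrase activates the command search. */
    void onPartialText(String text) {
        if (activeSearch.equals(WAKEUP_SEARCH) && keyphrase.equals(text)) {
            activate();
        }
    }

    /** Feed final hypothesis text here; returns the command text (possibly null) or null if none is due. */
    String onFinalText(String text) {
        if (!activeSearch.equals(COMMAND_SEARCH) || keyphrase.equals(text)) {
            return null; // still waiting for activation, or only the keyphrase came back
        }
        switchTo(WAKEUP_SEARCH); // wait for the next activation
        return text;
    }

    private void switchTo(String searchName) {
        recognizer.stop();
        activeSearch = searchName;
        recognizer.startListening(searchName);
    }
}
```

In an app, an adapter around the real recognizer would implement `SearchSwitcher`, and the listener's `onPartialResult` and `onResult` would forward the hypothesis string to `onPartialText` and `onFinalText`.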

           
          • AlexM

            AlexM - 2017-03-31

            Oh, I was trying to say that I want my phone to calculate something and then say the words "One plus two is...", but the number of possible number combinations is infinite, so I was asking whether it's possible to use a keyphrase plus a grammar search with short (one-syllable) words, or whether that's not possible.
            As I understand the demo, the keyphrase is only used to activate the speech recognition, which I'm doing with a button right now.

             
            • Nickolay V. Shmyrev

              If you want a voice calculator, you implement either a keyphrase or a button for activation. Once activated, you recognize with a language model and return the result.
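
Once the language model search returns a hypothesis string such as "one plus two", turning it into a result is plain string processing. Below is a minimal sketch of that last step, assuming single-digit number words (the same ones as in digits.gram) and the operator words `plus`, `minus`, `times`, and `over`. The class name `SpokenCalculator` and the operator vocabulary are assumptions, and the expression is evaluated strictly left to right with no operator precedence.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical evaluator for recognized text like "one plus two". */
final class SpokenCalculator {
    private static final Map<String, Integer> DIGITS = new HashMap<>();
    static {
        String[] words = {"zero", "one", "two", "three", "four",
                          "five", "six", "seven", "eight", "nine"};
        for (int i = 0; i < words.length; i++) {
            DIGITS.put(words[i], i);
        }
    }

    /** Evaluates "number (operator number)*" from left to right. */
    static double evaluate(String hypothesis) {
        String[] tokens = hypothesis.trim().toLowerCase(Locale.ROOT).split("\\s+");
        if (tokens.length % 2 == 0) {
            throw new IllegalArgumentException("expected: number (operator number)*");
        }
        double result = digit(tokens[0]);
        for (int i = 1; i < tokens.length; i += 2) {
            double operand = digit(tokens[i + 1]);
            switch (tokens[i]) {
                case "plus":  result += operand; break;
                case "minus": result -= operand; break;
                case "times": result *= operand; break;
                case "over":  result /= operand; break;
                default:
                    throw new IllegalArgumentException("unknown operator: " + tokens[i]);
            }
        }
        return result;
    }

    /** Maps a number word to its value, rejecting anything outside zero..nine. */
    private static int digit(String word) {
        Integer value = DIGITS.get(word);
        if (value == null) {
            throw new IllegalArgumentException("unknown number word: " + word);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("one plus two"));
    }
}
```

Multi-digit numbers and decimals would need a larger vocabulary in both the language model and this parser.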

               
              • AlexM

                AlexM - 2017-03-31

                Thank you, that's what I was asking for.

                 
            • AlexM

              AlexM - 2017-03-31

              Addendum: here's a use case:
              I have my smartwatch with me and am carrying something, so I have no hands free. I want to calculate how much something costs, so I say to my watch: "ok google, what's 5.99 times 4".

               
  • AlexM

    AlexM - 2017-03-31

    Thanks, I'll try that.

     
