
Is this the right way to implement Pocketsphinx voice control in an Android app?

Help
Zbigniew
2015-09-07
2015-09-08
  • Zbigniew

    Zbigniew - 2015-09-07

    Hello,

    I implemented a simple keyword search for Android using Pocketsphinx, but the results are not satisfactory. Please advise whether I did it the right way and what I can improve.

    The goal is to recognize keywords that the user speaks to the phone.

    My list of words looks like this (file "phrases"):

    forced error /1e-38/
    second /1e-10/
    double fault /1e-20/
    error /1e+3/
    winner /1e-12/
    player one /1e-28/
    etc.
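
    A quick way to sanity-check a list in this "phrase /threshold/" format is sketched below. It is only an illustration: the class name KeywordListCheck and the bounds MIN_THRESHOLD and MAX_THRESHOLD are assumptions made for this example, not values taken from Pocketsphinx. With these assumed bounds, the entry "error /1e+3/" from the list above is the only one flagged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sanity-checks keyword list lines written as "phrase /threshold/". */
final class KeywordListCheck {

    // One entry per line: the phrase, whitespace, then the threshold between slashes.
    private static final Pattern ENTRY = Pattern.compile("^(.+?)\\s+/([^/]+)/\\s*$");

    // Assumed bounds for a plausible threshold; illustrative only, not a library rule.
    static final double MIN_THRESHOLD = 1e-50;
    static final double MAX_THRESHOLD = 1.0;

    /** Returns one human-readable warning per suspicious line. */
    static List<String> check(List<String> lines) {
        List<String> warnings = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // blank lines are ignored
            }
            Matcher m = ENTRY.matcher(line);
            if (!m.matches()) {
                warnings.add("malformed: " + line);
                continue;
            }
            double threshold;
            try {
                threshold = Double.parseDouble(m.group(2));
            } catch (NumberFormatException e) {
                warnings.add("bad threshold: " + line);
                continue;
            }
            if (threshold < MIN_THRESHOLD || threshold > MAX_THRESHOLD) {
                warnings.add("out of range: " + m.group(1) + " (" + m.group(2) + ")");
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        List<String> phrases = List.of(
                "forced error /1e-38/",
                "second /1e-10/",
                "double fault /1e-20/",
                "error /1e+3/",
                "winner /1e-12/",
                "player one /1e-28/");
        check(phrases).forEach(System.out::println);
    }
}
```

    The check only looks at the file format and the assumed range; whether a given threshold actually works still has to be found by testing with real audio.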

    This is how I set up my recognizer:

    private void setupRecognizer(File assetsDir) throws IOException {
        // Acoustic model and pronunciation dictionary from the app's assets
        recognizer = defaultSetup()
                .setAcousticModel(new File(assetsDir, "en-us-ptm"))
                .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
                // Keep the raw audio so it can be inspected later
                .setRawLogDir(assetsDir)
                .setBoolean("-allphone_ci", true)
                .getRecognizer();

        recognizer.addListener(this);

        // Register the keyword list (file "phrases") as a named search
        File phrases = new File(assetsDir, "phrases");
        recognizer.addKeywordSearch(PHRASES_SEARCH, phrases);
    }
    

    Here I collect the result:

    @Override
    public void onPartialResult(Hypothesis hypothesis) {
        if (hypothesis != null)
        {
            String text = hypothesis.getHypstr();
            Log.d(TAG, "onPartialResult: " + text);
            makeText(getApplicationContext(), text, Toast.LENGTH_SHORT).show();
            highlightPhrase(text);
        }
    }
    

    And I restart the recognizer at the end of speech:

    @Override
    public void onEndOfSpeech() {
        recognizer.stop();
        recognizer.startListening(PHRASES_SEARCH);
    }
    

    I adjusted the words' thresholds and it is better than before, but I still have some problems:

    1. Sometimes words are triggered in silence.
    2. On the other hand, sometimes I cannot trigger a word (it may be my English, which is far from native, though Google Translate is no better).
    3. Sometimes hypothesis.getHypstr() returns a string consisting of multiple keywords, while I would prefer it to always return a single one.
    4. It is vulnerable to environmental noise.
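
    For item 3, one possible workaround is to reduce the hypothesis string to a single known phrase on the application side. The sketch below is a minimal illustration, assuming the string is a space-separated concatenation of phrases from the keyword list; the class SingleKeyword and its method first are hypothetical helpers, not part of Pocketsphinx. The thread does not say which end of the string holds the most recent detection, so matching from the start is also an assumption.

```java
import java.util.List;

/** Reduces a hypothesis string that may contain several keyphrases to one. */
final class SingleKeyword {

    /**
     * Returns the longest known phrase that the hypothesis starts with,
     * or null if it starts with none of them.
     */
    static String first(String hypothesis, List<String> phrases) {
        // Normalize whitespace so phrase comparison is exact.
        String text = hypothesis.trim().replaceAll("\\s+", " ");
        String best = null;
        for (String phrase : phrases) {
            // Match whole words only: the phrase must end the text or be followed by a space.
            boolean matches = text.equals(phrase) || text.startsWith(phrase + " ");
            if (matches && (best == null || phrase.length() > best.length())) {
                best = phrase;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> phrases = List.of("forced error", "second", "double fault",
                "error", "winner", "player one");
        System.out.println(first("double fault winner", phrases));
        System.out.println(first("error second", phrases));
    }
}
```

    In onPartialResult, the result of first(text, phrases) could be passed to highlightPhrase instead of the raw text. Calling recognizer.stop() and recognizer.startListening(PHRASES_SEARCH) right after a keyword is handled, rather than only at the end of speech, may also keep the hypothesis from accumulating, though that is untested here.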

    It feels like I have little control over what the recognizer hears: thresholds tuned in a silent environment make the app go crazy when it is used on a bus, for example. Is it a good idea to use Pocketsphinx to control an Android app by voice? If so, how should it be implemented properly?

    Please share your experience; I would be very grateful.

     
    • Nickolay V. Shmyrev

      From what you describe, it looks like the thresholds are too high; you need to lower them.

       
  • Zbigniew

    Zbigniew - 2015-09-08

    Nickolay, thanks for your answer.

    However, if the thresholds are too low, too many words pop up in silence, and if the thresholds are high, I have trouble triggering the words, even when I say them in different ways. Are there any .wav samples from the en-us-ptm acoustic model that I could play to my app for testing?

     

    Last edit: Zbigniew 2015-09-08
    • Nickolay V. Shmyrev

      Hi Zbigniew

      Well, there could be multiple issues. For example, if there are too many words, the system could become too slow to process the data, and that will destroy the accuracy.

      I suggest you enable raw data collection (the -rawlogdir option) and share the raw files so I can look at what is going on there.

      If you want to test pocketsphinx keyword spotting, you can download an English book chapter from librivox and spot a few keywords in it with pocketsphinx_continuous.

       
      • Nickolay V. Shmyrev

        And, obviously, our model is US English, so it might have trouble with accented speech. Maybe you can try to speak closer to US English.

         