Android: onPartialResult called many times without speaking!

Help
2015-04-09
2015-04-21
  • Alexandre PELLET

    Hello

    After installing the Android project, it runs fine: I can say 'oh mighty computer' and the app recognizes it. Cool.

    But when I change the keyphrase to simply 'hello', the onPartialResult function is called many times per second. Each time, the Hypothesis contains one more 'hello', without my saying anything.

    So after 4 seconds, the Hypothesis contains "hello hello hello hello hello hello hello hello hello hello hello hello hello hello hello hello hello hello".

    Could you tell me what is wrong?

    Regards
    Alexandre

    This is the code:

        /* In partial result we get quick updates about current hypothesis. In
        * keyword spotting mode we can react here, in other modes we need to wait
        * for final result in onResult.
        */
        @Override
        public void onPartialResult(Hypothesis hypothesis)
        {
            if (hypothesis == null)
                return;
    
            String text = hypothesis.getHypstr();
            int prob = hypothesis.getProb ();
    
    
            Log.d ( "TEST", "onPartialResult: "+text );
            Log.d ( "TEST", "onPartialResult - prob: " + prob );
    
            if (text.equals(KEYPHRASE))
            {
                ((TextView) findViewById(R.id.result_text)).setText("partial:"+"bingo:"+text);
                recognizer.stop ();
    
            }
    
        }
    
        /**
         * This callback is called when we stop the recognizer.
         */
        @Override
        public void onResult(Hypothesis hypothesis)
        {
    
    
            ((TextView) findViewById(R.id.result_text)).setText("!!");
            if (hypothesis != null)
            {
    
                String text = hypothesis.getHypstr();
                ((TextView) findViewById(R.id.result_text)).setText(text+"!!");
                Log.d ( "TEST", "onResult:"+text );
                makeText(getApplicationContext(), text, Toast.LENGTH_SHORT).show();
    
                switchSearch ( KWS_SEARCH );
            }
        }
    
        @Override
        public void onBeginningOfSpeech()
        {
            Log.d ( "TEST", "onBeginningOfSpeech" );
        }
    
        /**
         * We stop recognizer here to get a final result
         */
        @Override
        public void onEndOfSpeech()
        {
            Log.d("TEST", "onEndOfSpeech");
            Log.d("TEST", "onEndOfSpeech:"+recognizer.getSearchName());
    
    
    
            if ( !recognizer.getSearchName().equals(KWS_SEARCH) )
                switchSearch(KWS_SEARCH);
        }
    
        private void switchSearch(String searchName)
        {
    
            recognizer.stop();
    
            // If we are not spotting, start listening with timeout (10000 ms or 10 seconds).
            if (searchName.equals(KWS_SEARCH))
                recognizer.startListening ( searchName );
            else
                recognizer.startListening(searchName, 10000);
    
            String caption = getResources().getString(captions.get(searchName));
            ((TextView) findViewById(R.id.caption_text)).setText(caption);
        }
    
        private void setupRecognizer(File assetsDir) throws IOException {
            // The recognizer can be configured to perform multiple searches
            // of different kind and switch between them
    
            recognizer = defaultSetup()
                    .setAcousticModel(new File(assetsDir, "en-us-ptm"))
                    .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
    
                    // To disable logging of raw audio comment out this call (takes a lot of space on the device)
                    .setRawLogDir(assetsDir)
    
                    // Threshold to tune for keyphrase to balance between false alarms and misses
                    .setKeywordThreshold(1e-17f)/*45*/
    
                    // Use context-independent phonetic search, context-dependent is too slow for mobile
                    .setBoolean("-allphone_ci", true)/*true*/
    
                    .getRecognizer();
    
     

    Last edit: Nickolay V. Shmyrev 2015-04-09
    • Nickolay V. Shmyrev

      You can set the keyword threshold to recognize the word more reliably. A value like 1e-1 might be reasonable.

      Overall, "hello" is too short; you will not get reliable activation with it.
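While tuning the threshold, it can help to log how many detections a hypothesis contains rather than only comparing it to the keyphrase. The sketch below assumes, as in the output quoted in the first post, that the hypothesis string is simply the keyphrase repeated and separated by spaces. `DetectionCounter` is a hypothetical helper written for this illustration and is not part of pocketsphinx.

```java
// Hypothetical helper, not part of pocketsphinx: counts how many times a
// keyphrase occurs in a hypothesis string, matching whole words only.
final class DetectionCounter {
    static int count(String hypothesis, String keyphrase) {
        if (hypothesis == null || keyphrase == null) {
            return 0;
        }
        String hyp = hypothesis.trim();
        String key = keyphrase.trim();
        if (hyp.isEmpty() || key.isEmpty()) {
            return 0;
        }
        String[] words = hyp.split("\\s+");
        String[] target = key.split("\\s+");
        int count = 0;
        int i = 0;
        while (i + target.length <= words.length) {
            boolean match = true;
            for (int j = 0; j < target.length; j++) {
                if (!words[i + j].equals(target[j])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
                i += target.length; // skip past this occurrence (no overlaps)
            } else {
                i++;
            }
        }
        return count;
    }
}
```

Note that a check such as `text.equals(KEYPHRASE)` only matches when the hypothesis holds exactly one detection, so a test like `DetectionCounter.count(text, KEYPHRASE) >= 1` may be closer to the intent once several detections have accumulated.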

       
  • Alexandre PELLET

    Thanks Nickolay

    I need to build an Android app for French kids. They will speak single words, not phrases...

    So do you think CMU Sphinx is not appropriate?

    I've played with the threshold value and it avoids plenty of onPartialResult calls, that's great.

    But with some words like 'rabbit', I can't manage to find a "good" threshold value.
    Either it doesn't detect the word at all, or it detects it all the time, without my speaking...

    One more question: in the app, an exercise will have 3 or 4 words.
    I believe that I'll need to set a threshold value for each word.
    So is it possible to set the threshold value when I call "recognizer.startListening"?

    Thanks again
    Alexandre

     
    • Nickolay V. Shmyrev

      But with some words like 'rabbit', I can't manage to find a "good" threshold value. Either it doesn't detect the word at all, or it detects it all the time, without my speaking.

      Maybe your French 'r' is not good enough. A good acoustic model should fix this issue.

      So is it possible to set the threshold value when I call "recognizer.startListening"?

      You can configure the recognizer to look for several keyphrases and specify a threshold for each phrase separately. See

      http://stackoverflow.com/questions/25748113/recognizing-multiple-keywords-using-pocketsphinx
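As a rough illustration of per-phrase thresholds, the sketch below builds the text of a keyword list with one phrase per line, each followed by its own threshold between slashes. `KeywordListBuilder` is a hypothetical helper, and the phrases and threshold values in the usage note are placeholders, not tuned values. It assumes the decoder accepts thresholds written in Java's default `Double.toString` notation (for example `1.0E-20`).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects phrases with individual thresholds and
// renders them as keyword-list text, e.g. "rabbit /1.0E-20/".
final class KeywordListBuilder {
    // LinkedHashMap keeps the phrases in insertion order.
    private final Map<String, Double> thresholds = new LinkedHashMap<>();

    KeywordListBuilder add(String phrase, double threshold) {
        if (phrase == null || phrase.trim().isEmpty()) {
            throw new IllegalArgumentException("phrase must not be empty");
        }
        // Assumption: useful thresholds lie in (0, 1].
        if (threshold <= 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException("threshold must be in (0, 1]");
        }
        thresholds.put(phrase.trim(), threshold);
        return this;
    }

    String build() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Double> entry : thresholds.entrySet()) {
            sb.append(entry.getKey())
              .append(" /")
              .append(entry.getValue()) // rendered via Double.toString
              .append("/\n");
        }
        return sb.toString();
    }
}
```

For example, `new KeywordListBuilder().add("rabbit", 1e-20).add("carrot", 1e-15).build()` yields two lines, one per word. The resulting text could then be written to a file and registered as a keyword search on the recognizer; the exact call to use for that should be checked against the pocketsphinx-android version in use.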

       
  • Alexandre PELLET

    Nickolay

    If you have some time, I've got several questions (again...)

    1 - In order to find a good threshold for each word, is it possible to update the threshold value without initializing the recognizer each time?
    If yes, with which object/method could I achieve this?

    2 - What do you mean by "good acoustic model"? Is it possible to make my own dictionary with a new voice, and only a hundred words?

    3 - If 2 is yes, is it a big deal? Could you point me to a doc?

    The goal of all of this is an application for French kids, to teach them some English words.

    Thanks again
    Regards
    Alexandre

     
    • Nickolay V. Shmyrev

      In order to find a good threshold for each word, is it possible to update the threshold value without initializing the recognizer each time?
      If yes, with which object/method could I achieve this?

      It is not possible now.

      2 - What do you mean by "good acoustic model"?

      A good acoustic model accurately recognizes the sounds you need.

      Is it possible to make my own dictionary with a new voice, and only a hundred words?

      It is possible to train acoustic models, but you need a lot of data for training. For isolated words you need about 100 examples of each word you want to train.

      For children you certainly need to train, because our models are for adults.

      3 - If 2 is yes, is it a big deal? Could you point me to a doc?

      http://cmusphinx.sourceforge.net/wiki/tutorialam

       
  • Alexandre PELLET

    Hi Nickolay,

    Thanks for your time.

    Now I am trying to give the teachers the ability to set the threshold themselves for each word in a lesson.

    They have plus and minus buttons that adjust the threshold.
    When they validate it, I set up the SpeechRecognizer again.

    The problem is that each time I set up the SpeechRecognizer object again, the recognizer gets slower and slower, and after 1 or 2 setups it recognizes nothing.

    I believe that I'm not deleting the object in the right way.

    Here is a piece of code; I hope you can help me with it.

    //OnCreate
    try
            {
                assets = new Assets(getActivity ());
                assetDir = assets.syncAssets();
            }
            catch (IOException e)
            {
                // Log the failure instead of discarding the message
                Log.e ( "TEST", "Failed to sync assets", e );
            }
    
    //OnCreateView
    setupTaskRecognizer ( );
    
    //setupTaskRecognizer
    public void setupTaskRecognizer ( )
        {
            enableControls ( false );
            ((TextView) view.findViewById ( R.id.caption_text )).setText ( "Initialisation de la reconnaissance" );
    
            new AsyncTask<Void, Void, Exception> () {
                @Override
                protected Exception doInBackground(Void... params) {
                    try
                    {
                        setupRecognizer(assetDir, current_word );
    
                    } catch (IOException e) {
                        return e;
                    }
                    return null;
                }
    
                @Override
                protected void onPostExecute(Exception result) {
                    if (result != null)
                    {
                        ((TextView) view.findViewById ( R.id.caption_text ))
                                .setText ( "Failed to init recognizer " + result );
                    }
                    else
                    {
                        ((TextView) view.findViewById ( R.id.caption_text )).setText ( "Initialisation de la reconnaissance terminée" );
    
                        switchSearch(KEY_SEARCH);
    
                        enableControls ( true );
                    }
                }
            }.execute();
        }
    
        private void setupRecognizer(File assetsDir, String word) throws IOException
        {
            // Where I try to delete the old recognizer object
            if ( recognizer != null )
            {
                recognizer.stop ();
                recognizer.cancel ();
                recognizer.removeListener ( this );
                recognizer.shutdown ();
                recognizer = null;
            }
    
            recognizer = defaultSetup()
                    .setAcousticModel(new File(assetsDir, "en-us-ptm"))
                    .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
                    .setRawLogDir(assetsDir)
                    .setKeywordThreshold(f_tolerance)/*1e-7f*/
                    .setBoolean ( "-allphone_ci", true )/*true*/
                    .getRecognizer();
    
    
            recognizer.addListener ( this );
    
            recognizer.addKeyphraseSearch ( KEY_SEARCH, word);
        }
    
        private void switchSearch(String searchName)
        {
    
            recognizer.stop();
            recognizer.startListening ( searchName, 30000 );
        }
    

    Thanks a lot

    Regards
    Alexandre

     
    • Nickolay V. Shmyrev

      It looks like an object leak; however, it is not easy to figure out why it is leaked. A logcat output might be helpful.

      Actually, you do not need to recreate the recognizer every time to change the threshold. You can change the threshold with something like:

       recognizer.getDecoder().getConfig().setFloat("-kws_threshold", 1e-10)
      

      and then just re-add the search to the decoder; it will have the new threshold.
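For the plus and minus buttons described earlier in the thread, one possible approach is to step the threshold by powers of ten and clamp it to a fixed range. The sketch below is only an illustration: `ThresholdTuner` is a hypothetical class, and the 1e-50 to 1 range is an assumption. The recognizer calls appear only in comments because they need the Android library; they repeat the `setFloat` call from the reply and the `addKeyphraseSearch` call from the code posted above.

```java
// Hypothetical helper: tracks a keyword threshold as a power of ten so that
// each button press moves it by one order of magnitude.
final class ThresholdTuner {
    static final int MIN_EXPONENT = -50; // assumed lower bound (1e-50)
    static final int MAX_EXPONENT = 0;   // assumed upper bound (1)

    private int exponent;

    ThresholdTuner(int initialExponent) {
        this.exponent = clamp(initialExponent);
    }

    // "Plus" button: moves the threshold one order of magnitude towards 1.
    void plus() {
        exponent = clamp(exponent + 1);
    }

    // "Minus" button: moves the threshold one order of magnitude towards 1e-50.
    void minus() {
        exponent = clamp(exponent - 1);
    }

    int exponent() {
        return exponent;
    }

    // Parsing "1e<exponent>" gives the nearest double to the exact power of ten.
    double threshold() {
        return Double.parseDouble("1e" + exponent);
    }

    private static int clamp(int value) {
        return Math.max(MIN_EXPONENT, Math.min(MAX_EXPONENT, value));
    }

    // Applying the value, as suggested in the reply (not compiled here):
    //   recognizer.getDecoder().getConfig().setFloat("-kws_threshold", tuner.threshold());
    //   recognizer.addKeyphraseSearch(KEY_SEARCH, word); // re-add the search
}
```

Storing the exponent rather than the float itself avoids accumulated rounding from repeated multiplication and makes the current setting easy to display to the teacher.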

       

      Last edit: Nickolay V. Shmyrev 2015-04-20
  • Alexandre PELLET

    Hi Nickolay

    Thanks, it works like a charm!

    I was recreating the recognizer because last week you told me that it is not possible to update the threshold value without initializing the recognizer each time.

    Anyway, thanks again for your help.

    Now the teachers can set the threshold for each word they want the children to learn.

    And if that is not enough, I believe I will build an acoustic model with data from children's voices.

    Regards
    Alexandre

     
