CMU Sphinx / Forums / Help: Android : onPartialResult call many time without speaking!

Hello

After installing the Android project, it runs fine: I can say 'oh mighty computer' and the app recognizes it. Cool.

But when I change the keyphrase to simply 'hello', the onPartialResult function is called many times per second. Each time, the Hypothesis contains one more 'hello', even though I am not saying anything.

So after 4 seconds, the Hypothesis contains "hello hello hello hello hello hello hello hello hello hello hello hello hello hello hello hello hello hello".

Could you tell me what is wrong?

Regards
Alexandre

This is the code:

    /* In partial result we get quick updates about current hypothesis. In
    * keyword spotting mode we can react here, in other modes we need to wait
    * for final result in onResult.
    */
    @Override
    public void onPartialResult(Hypothesis hypothesis)
    {
        if (hypothesis == null)
            return;

        String text = hypothesis.getHypstr();
        int prob = hypothesis.getProb ();


        Log.d ( "TEST", "onPartialResult: "+text );
        Log.d ( "TEST", "onPartialResult - prob: " + prob );

        if (text.equals(KEYPHRASE))
        {
            ((TextView) findViewById(R.id.result_text)).setText("partial:"+"bingo:"+text);
            recognizer.stop ();

        }

    }

    /**
     * This callback is called when we stop the recognizer.
     */
    @Override
    public void onResult(Hypothesis hypothesis)
    {


        ((TextView) findViewById(R.id.result_text)).setText("!!");
        if (hypothesis != null)
        {

            String text = hypothesis.getHypstr();
            ((TextView) findViewById(R.id.result_text)).setText(text+"!!");
            Log.d ( "TEST", "onResult:"+text );
            makeText(getApplicationContext(), text, Toast.LENGTH_SHORT).show();

            switchSearch ( KWS_SEARCH );
        }
    }

    @Override
    public void onBeginningOfSpeech()
    {
        Log.d ( "TEST", "onBeginningOfSpeech" );
    }

    /**
     * We stop recognizer here to get a final result
     */
    @Override
    public void onEndOfSpeech()
    {
        Log.d("TEST", "onEndOfSpeech");
        Log.d("TEST", "onEndOfSpeech:"+recognizer.getSearchName());



        if ( !recognizer.getSearchName().equals(KWS_SEARCH) )
            switchSearch(KWS_SEARCH);
    }

    private void switchSearch(String searchName)
    {

        recognizer.stop();

        // If we are not spotting, start listening with timeout (10000 ms or 10 seconds).
        if (searchName.equals(KWS_SEARCH))
            recognizer.startListening ( searchName );
        else
            recognizer.startListening(searchName, 10000);

        String caption = getResources().getString(captions.get(searchName));
        ((TextView) findViewById(R.id.caption_text)).setText(caption);
    }

    private void setupRecognizer(File assetsDir) throws IOException {
        // The recognizer can be configured to perform multiple searches
        // of different kind and switch between them

        recognizer = defaultSetup()
                .setAcousticModel(new File(assetsDir, "en-us-ptm"))
                .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))

                // To disable logging of raw audio comment out this call (takes a lot of space on the device)
                .setRawLogDir(assetsDir)

                // Threshold to tune for keyphrase to balance between false alarms and misses
                .setKeywordThreshold(1e-17f)/*45*/

                // Use context-independent phonetic search, context-dependent is too slow for mobile
                .setBoolean("-allphone_ci", true)/*true*/

                .getRecognizer();

Last edit: Nickolay V. Shmyrev 2015-04-09

Nickolay V. Shmyrev - 2015-04-09

You can set the keyword threshold to recognize the word more reliably. A value like 1e-1 might be reasonable.

Overall, "hello" is too short; you will not get a reliable activation with it.

Alexandre PELLET - 2015-04-09

Thanks Nickolay

I need to build an Android app for French kids. They will speak single words, not phrases...

So do you think CMU Sphinx is not appropriate?

I've played with the threshold value and it avoids plenty of the onPartialResult calls, that's great.

But with some words like 'rabbit', I can't manage to find a "good" threshold value.
Either it doesn't detect the word at all, or it detects it all the time, without me speaking...

One more question: in the app, an exercise will have 3 or 4 words.
I believe that I'll need to set a threshold value for each word.
So is it possible to set the threshold value when I call "recognizer.startListening"?

Thanks again
Alexandre

- Nickolay V. Shmyrev - 2015-04-09
  
  But with some words like 'rabbit', I can't manage to find a "good" threshold value. Either it doesn't detect the word at all, or it detects it all the time, without me speaking.
  
  Maybe your French 'r' is not good enough. A good acoustic model should fix this issue.
  
  So is it possible to set the threshold value when I call "recognizer.startListening"?
  
  You can configure the recognizer to look for several keyphrases and specify a threshold for each phrase separately. See
  
  http://stackoverflow.com/questions/25748113/recognizing-multiple-keywords-using-pocketsphinx
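
A minimal sketch of the per-phrase approach, in plain Java. It assumes the pocketsphinx keyword-list file format, with one keyphrase per line followed by its threshold between slashes. The class name `KeywordListWriter`, the word 'elephant', and the threshold exponents are made up for illustration; real values need tuning per word.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of pocketsphinx: builds a keyword list
// with one threshold per keyphrase.
class KeywordListWriter {

    // Formats one entry as "<phrase> /1e<exponent>/", e.g. "rabbit /1e-20/".
    static String formatEntry(String phrase, int exponent) {
        return phrase.trim() + " /1e" + exponent + "/";
    }

    // Turns a phrase -> exponent map into the lines of the keyword list,
    // keeping the iteration order of the map.
    static List<String> buildLines(Map<String, Integer> exponents) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : exponents.entrySet()) {
            lines.add(formatEntry(entry.getKey(), entry.getValue()));
        }
        return lines;
    }

    // Writes the keyword list to a file, one entry per line.
    static void write(Path file, Map<String, Integer> exponents) throws IOException {
        Files.write(file, buildLines(exponents), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> exponents = new LinkedHashMap<>();
        exponents.put("rabbit", -20);    // illustrative values only
        exponents.put("elephant", -30);
        Path file = Files.createTempFile("lesson", ".list");
        write(file, exponents);
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```

On Android, the resulting file would then be registered as a keyword search on the recognizer (for example with `addKeywordSearch`, if your pocketsphinx-android version provides it) in place of a single `addKeyphraseSearch` call. Every word in the list has to be present in the dictionary.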
  
Alexandre PELLET - 2015-04-10

Nickolay

If you have some time, I've got several questions (again...)

1 - In order to find a good threshold for each word, is it possible to update the threshold value without initializing the recognizer each time?
If yes, with which object/method could I achieve this?

2 - What do you mean by "Good acoustic model"? Is it possible to make my own dictionary with a new voice, and only a hundred words?

3 - If 2 is yes, is it a big deal? Could you point me to a doc?

The goal of all of this is an application for French kids, to teach them some English words.

Thanks again
Regards
Alexandre

- Nickolay V. Shmyrev - 2015-04-10
  
  In order to find a good threshold for each word, is it possible to update the threshold value without initializing the recognizer each time?
  If yes, with which object/method could I achieve this?
  
  It is not possible now.
  
  2 - What do you mean by "Good acoustic model"?
  
  A good acoustic model recognizes the sounds you need accurately.
  
  Is it possible to make my own dictionary with a new voice, and only a hundred words?
  
  It is possible to train acoustic models, but you need a lot of data for training. For isolated words you need about 100 examples of each word you want to train.
  
  For children you certainly need to train, because our models are for adults.
  
  3 - If 2 is yes, is it a big deal? Could you point me to a doc?
  
  http://cmusphinx.sourceforge.net/wiki/tutorialam
  
Hi Nickolay,

Thanks for your time.

Now I am trying to give the teachers the ability to set the threshold themselves for each word in a lesson.

They have plus and minus buttons which adjust the threshold.
When they validate it, I set up the SpeechRecognizer again.

The problem is that each time I set up the SpeechRecognizer object again, the recognizer gets slower and slower, and after 1 or 2 setups it recognizes nothing.

I believe that I'm not deleting the object in the right way.

This is a piece of the code, and I hope you can help me with it.

    // onCreate
    try
    {
        assets = new Assets(getActivity ());
        assetDir = assets.syncAssets();
    }
    catch (IOException e)
    {
        // Log the failure instead of silently discarding it
        Log.e ( "TEST", "Failed to sync assets", e );
    }

    // onCreateView
    setupTaskRecognizer ( );

    // setupTaskRecognizer
    public void setupTaskRecognizer ( )
    {
        enableControls ( false );
        ((TextView) view.findViewById ( R.id.caption_text )).setText ( "Initialisation de la reconnaissance" );

        new AsyncTask<Void, Void, Exception> () {
            @Override
            protected Exception doInBackground(Void... params) {
                try
                {
                    setupRecognizer(assetDir, current_word );

                } catch (IOException e) {
                    return e;
                }
                return null;
            }

            @Override
            protected void onPostExecute(Exception result) {
                if (result != null)
                {
                    ((TextView) view.findViewById ( R.id.caption_text ))
                            .setText ( "Failed to init recognizer " + result );
                }
                else
                {
                    ((TextView) view.findViewById ( R.id.caption_text )).setText ( "Initialisation de la reconnaissance terminée" );

                    switchSearch(KEY_SEARCH);

                    enableControls ( true );
                }
            }
        }.execute();
    }

    private void setupRecognizer(File assetsDir, String word) throws IOException
    {
        // Where I try to delete the old recognizer
        if ( recognizer != null )
        {
            recognizer.stop ();
            recognizer.cancel ();
            recognizer.removeListener ( this );
            recognizer.shutdown ();
            recognizer = null;
        }

        recognizer = defaultSetup()
                .setAcousticModel(new File(assetsDir, "en-us-ptm"))
                .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
                .setRawLogDir(assetsDir)
                .setKeywordThreshold(f_tolerance)/*1e-7f*/
                .setBoolean ( "-allphone_ci", true )/*true*/
                .getRecognizer();


        recognizer.addListener ( this );

        recognizer.addKeyphraseSearch ( KEY_SEARCH, word);
    }

    private void switchSearch(String searchName)
    {

        recognizer.stop();
        recognizer.startListening ( searchName, 30000 );
    }

Thanks a lot

Regards
Alexandre

Nickolay V. Shmyrev - 2015-04-20

It looks like an object leak; however, it is not easy to figure out why it is leaked. A logcat output might be helpful.

Actually, you do not need to recreate the recognizer every time to change the threshold. You can change the threshold with something like:

    recognizer.getDecoder().getConfig().setFloat("-kws_threshold", 1e-10);

and then just re-add the search to the decoder; it will have the new threshold.
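
For the plus and minus buttons described earlier in the thread, the value handed to `-kws_threshold` could be tracked as a power of ten, since the thresholds discussed here (1e-17, 1e-10, 1e-1) differ by orders of magnitude. This is only a sketch in plain Java: the class `ThresholdStepper` and its bounds are hypothetical and not part of pocketsphinx.

```java
// Hypothetical helper for plus/minus buttons: keeps the threshold as a
// power of ten and moves it one order of magnitude per click.
class ThresholdStepper {
    // Assumed bounds for illustration; the useful range depends on the keyphrase.
    static final int MIN_EXPONENT = -50;
    static final int MAX_EXPONENT = 0;

    private int exponent;

    ThresholdStepper(int initialExponent) {
        this.exponent = clamp(initialExponent);
    }

    // Keeps the exponent inside [MIN_EXPONENT, MAX_EXPONENT].
    private static int clamp(int value) {
        return Math.max(MIN_EXPONENT, Math.min(MAX_EXPONENT, value));
    }

    // Plus button: one order of magnitude up, e.g. 1e-10 -> 1e-9.
    void increase() {
        exponent = clamp(exponent + 1);
    }

    // Minus button: one order of magnitude down, e.g. 1e-10 -> 1e-11.
    void decrease() {
        exponent = clamp(exponent - 1);
    }

    int exponent() {
        return exponent;
    }

    // Threshold as a double, parsed from the decimal form "1e<exponent>".
    double threshold() {
        return Double.parseDouble("1e" + exponent);
    }

    public static void main(String[] args) {
        ThresholdStepper stepper = new ThresholdStepper(-10);
        stepper.increase();
        System.out.println("exponent=" + stepper.exponent()
                + " threshold=" + stepper.threshold());
    }
}
```

After each click, `stepper.threshold()` would be passed to the `setFloat("-kws_threshold", ...)` call shown above, and the search re-added so the new value takes effect.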

Last edit: Nickolay V. Shmyrev 2015-04-20

Alexandre PELLET - 2015-04-21

Hi Nickolay

Thanks, it works like a charm!

I was recreating the recognizer because last week you told me that it is not possible to update the threshold value without initializing the recognizer each time.

Anyway, thanks again for your help.

Now the teachers can set the threshold for each word they want the children to learn.

And if that is not enough, I believe that I will build an acoustic model with data from children's voices.

Regards
Alexandre
