
vad_startspeech equivalent for triggering end of utterance

2017-06-19
  • Carlo Benussi

    Carlo Benussi - 2017-06-19

    Hi,

    I am using pocketsphinx, and while tuning the recognizer for the best performance I noticed these parameters:

    -vad_postspeech     50      Num of silence frames to keep after from speech to silence.
    -vad_prespeech      20      Num of speech frames to keep before silence to speech.
    -vad_startspeech    10      Num of speech frames to trigger vad from silence to speech.
    

    I found vad_startspeech a really useful parameter to tune, but I would also (especially) like to set the number of silence frames needed to trigger the VAD from speech to silence, and there seems to be no such parameter. Am I missing something? Is it possible to set this in some way?

    Thanks in advance

     
    • Nickolay V. Shmyrev

      Unfortunately there is no such parameter; you can check the corresponding code in fe_noise.c:fe_vad_hangover.

       
  • Carlo Benussi

    Carlo Benussi - 2017-06-20

    Looking at fe_vad_hangover in fe_noise.c, it seems the parameter to tune is vad_postspeech. The function contains two checks, one for the silence->speech transition and one for the speech->silence transition:

        if (is_speech) {
            fe->vad_data->post_speech_frames = 0;
            if (!fe->vad_data->in_speech) {
                fe->vad_data->pre_speech_frames++;
                /* check for transition sil->speech */
                if (fe->vad_data->pre_speech_frames >= fe->start_speech) {
                    fe->vad_data->pre_speech_frames = 0;
                    fe->vad_data->in_speech = 1;
                }
            }
        } else {
            fe->vad_data->pre_speech_frames = 0;
            if (fe->vad_data->in_speech) {
                fe->vad_data->post_speech_frames++;
                /* check for transition speech->sil */
                if (fe->vad_data->post_speech_frames >= fe->post_speech) {
                    fe->vad_data->post_speech_frames = 0;
                    fe->vad_data->in_speech = 0;
                    fe_prespch_reset_cep(fe->vad_data->prespch_buf);
                    fe_prespch_reset_pcm(fe->vad_data->prespch_buf);
                }
            }
        }
    

    So fe->post_speech is checked as a threshold for triggering the speech->silence transition.
    Since, from a quick check, this parameter appears to be used only in this function, I suggest changing its name and/or description to distinguish it from vad_prespeech and to relate it to the analogous vad_startspeech.

    I suggest this change to the name and description:

    Previous:

    -vad_postspeech     50      Num of silence frames to keep after from speech to silence.
    

    Correction:

    -vad_stopspeech     50      Num of silence frames to trigger vad from speech to silence.
    

    Thanks for the help, and apologies if I have suggested something wrong.

     
    • Nickolay V. Shmyrev

      Well, yes, the name could be better. I do not think we should change the option name, though, to avoid backward-compatibility issues.

       
