CMU Sphinx / Forums / Help: vad_startspeech equivalent for triggering end of utterance

Carlo Benussi - 2017-06-19

Hi,

I am using pocketsphinx, and while tuning the recognizer for the best performance, I noticed these parameters:

-vad_postspeech 50    Num of silence frames to keep after from speech to silence.
-vad_prespeech 20     Num of speech frames to keep before silence to speech.
-vad_startspeech 10   Num of speech frames to trigger vad from silence to speech.

I found vad_startspeech a really useful parameter to tune, but I would also (and especially) like to set the number of silence frames needed to trigger the vad transition from speech to silence, and that parameter does not seem to be present. Or am I missing something? Is it possible to set this in some way?

Thanks in advance
- Nickolay V. Shmyrev - 2017-06-19
  
  There is no such parameter, unfortunately. You can check the corresponding code in fe_noise.c:fe_vad_hangover.
  

Carlo Benussi - 2017-06-20

From the function fe_vad_hangover in fe_noise.c, it seems that the parameter to tune is vad_postspeech. Indeed, the function contains the two checks for the silence->speech and speech->silence transitions:

    if (is_speech) {
        fe->vad_data->post_speech_frames = 0;
        if (!fe->vad_data->in_speech) {
            fe->vad_data->pre_speech_frames++;
            /* check for transition sil->speech */
            if (fe->vad_data->pre_speech_frames >= fe->start_speech) {
                fe->vad_data->pre_speech_frames = 0;
                fe->vad_data->in_speech = 1;
            }
        }
    } else {
        fe->vad_data->pre_speech_frames = 0;
        if (fe->vad_data->in_speech) {
            fe->vad_data->post_speech_frames++;
            /* check for transition speech->sil */
            if (fe->vad_data->post_speech_frames >= fe->post_speech) {
                fe->vad_data->post_speech_frames = 0;
                fe->vad_data->in_speech = 0;
                fe_prespch_reset_cep(fe->vad_data->prespch_buf);
                fe_prespch_reset_pcm(fe->vad_data->prespch_buf);
            }
        }
    }

So fe->post_speech is checked as the threshold for triggering the speech->silence transition.
Since, from a quick check, I believe this parameter is used only in this function, I would suggest changing the name and/or the description, to differentiate it from vad_prespeech and to link it to the similar vad_startspeech.

I suggest this change in the name + description:

Previous:

-vad_postspeech 50 Num of silence frames to keep after from speech to silence.

Correction:

-vad_stopspeech 50 Num of silence frames to trigger vad from speech to silence.

Thanks for the help. If I am suggesting something wrong, I apologize.
- Nickolay V. Shmyrev - 2017-06-20
  
  Well, yes, the name could be better. I do not think we should change the option name, though, so as to avoid backward compatibility issues.
  