I am using pocketsphinx, and while tuning the recognizer for the best performance, I noticed these parameters:
-vad_postspeech 50 Num of silence frames to keep after from speech to silence.
-vad_prespeech 20 Num of speech frames to keep before silence to speech.
-vad_startspeech 10 Num of speech frames to trigger vad from silence to speech.
I found vad_startspeech a really useful parameter to tune, but I would also (especially) like to set the number of frames needed to trigger the VAD transition from speech to silence, and this parameter does not seem to be present. Or am I missing something? Is there some way to set it?
Thanks in advance
From the function fe_vad_hangover in fe_noise.c it seems that the parameter to tune is vad_postspeech. Indeed in the function there are the two checks for silence->speech and speech->silence transitions:
    if (is_speech) {
        fe->vad_data->post_speech_frames = 0;
        if (!fe->vad_data->in_speech) {
            fe->vad_data->pre_speech_frames++;
            /* check for transition sil->speech */
            if (fe->vad_data->pre_speech_frames >= fe->start_speech) {
                fe->vad_data->pre_speech_frames = 0;
                fe->vad_data->in_speech = 1;
            }
        }
    } else {
        fe->vad_data->pre_speech_frames = 0;
        if (fe->vad_data->in_speech) {
            fe->vad_data->post_speech_frames++;
            /* check for transition speech->sil */
            if (fe->vad_data->post_speech_frames >= fe->post_speech) {
                fe->vad_data->post_speech_frames = 0;
                fe->vad_data->in_speech = 0;
                fe_prespch_reset_cep(fe->vad_data->prespch_buf);
                fe_prespch_reset_pcm(fe->vad_data->prespch_buf);
            }
        }
    }
So fe->post_speech is checked as a threshold for triggering the speech->silence transition.
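To convince myself of this, I put together a small standalone model of the two counters. This is only a sketch: the names vad_model_t, vad_model_init, vad_model_step and frames_until_flip are made up for illustration and are not part of sphinxbase, and the pre-speech buffer resets are left out. It just mirrors the threshold logic quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields used by fe_vad_hangover. */
typedef struct {
    int in_speech;          /* current VAD state: 1 = speech, 0 = silence */
    int pre_speech_frames;  /* consecutive speech frames seen while in silence */
    int post_speech_frames; /* consecutive silence frames seen while in speech */
    int start_speech;       /* threshold set by -vad_startspeech */
    int post_speech;        /* threshold set by -vad_postspeech */
} vad_model_t;

void vad_model_init(vad_model_t *s, int start_speech, int post_speech)
{
    assert(s != NULL);
    s->in_speech = 0;
    s->pre_speech_frames = 0;
    s->post_speech_frames = 0;
    s->start_speech = start_speech;
    s->post_speech = post_speech;
}

/* Feed one per-frame speech/silence decision; returns the state afterwards. */
int vad_model_step(vad_model_t *s, int is_speech)
{
    assert(s != NULL);
    if (is_speech) {
        s->post_speech_frames = 0;
        if (!s->in_speech) {
            s->pre_speech_frames++;
            /* transition sil->speech */
            if (s->pre_speech_frames >= s->start_speech) {
                s->pre_speech_frames = 0;
                s->in_speech = 1;
            }
        }
    } else {
        s->pre_speech_frames = 0;
        if (s->in_speech) {
            s->post_speech_frames++;
            /* transition speech->sil */
            if (s->post_speech_frames >= s->post_speech) {
                s->post_speech_frames = 0;
                s->in_speech = 0;
            }
        }
    }
    return s->in_speech;
}

/* Feed identical frames until the state flips; returns how many frames
 * that took, or -1 if it did not flip within max_frames. */
int frames_until_flip(vad_model_t *s, int is_speech, int max_frames)
{
    int initial = s->in_speech;
    for (int i = 1; i <= max_frames; i++) {
        if (vad_model_step(s, is_speech) != initial)
            return i;
    }
    return -1;
}
```

With the default values (start_speech = 10, post_speech = 50), the model needs 10 consecutive speech frames to enter speech and 50 consecutive silence frames to leave it. A single speech frame in the middle of a silence run resets the silence counter, so lowering vad_postspeech should make the speech->silence transition fire sooner.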
From a quick check, I believe this parameter is used only in this function, so I would suggest changing its name and/or description to differentiate it from vad_prespeech and link it to the similar vad_startspeech.
I suggest this change to the name and description:
Previous:
-vad_postspeech 50 Num of silence frames to keep after from speech to silence.
Correction:
-vad_stopspeech 50 Num of silence frames to trigger vad from speech to silence.
Thanks for the help. If I am suggesting something wrong, I apologize
There is no such parameter, unfortunately. You can check the corresponding code in fe_noise.c:fe_vad_hangover.
Well, yes, the name could be better. I do not think we should change the option name, though, to avoid backward compatibility issues.