Consider the following simple JSGF. In PocketSphinx, what determines if it detects "hello" or "hello world" if I say "hello arbitrary pause world"?
public <speech> = hello [world];
Are there any parameters that control the outcome in this case? Is it possible to specify a silence threshold that means end of speech?
Or can I somehow add silence to the grammar? E.g.:
public <speech> = hello [world | sil]; // sil?
There is a VAD module which filters silence and detects the end of the utterance.
The relevant parameters are vad_threshold, vad_postspeech and vad_prespeech.
Silence is added automatically; you do not need to put it in the grammar.
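To make the role of a post-speech window concrete, here is a minimal Python sketch of end-of-utterance detection. It is not PocketSphinx's actual VAD code: the function name find_utterance_end and the per-frame speech/silence flags are hypothetical, and it only assumes that a parameter like vad_postspeech counts trailing silence frames. It shows why the same "hello <pause> world" audio can end after "hello" or after "world" depending on that window.

```python
def find_utterance_end(is_speech, postspeech_frames):
    """Return the index of the frame where the utterance is declared
    finished, or None if it never ends within the given frames.

    is_speech: sequence of truthy/falsy flags, one per audio frame
               (hypothetical output of a speech/silence classifier).
    postspeech_frames: number of consecutive silence frames required
               after speech before the utterance is considered over.
    """
    in_speech = False
    silent_run = 0
    for i, speech in enumerate(is_speech):
        if speech:
            # Any speech frame (re)starts the utterance and resets the count.
            in_speech = True
            silent_run = 0
        elif in_speech:
            silent_run += 1
            if silent_run >= postspeech_frames:
                return i
    return None


# "hello", a 3-frame pause, then "world" (1 = speech frame, 0 = silence frame)
frames = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0]

short_window = find_utterance_end(frames, 2)  # ends inside the pause, after "hello"
long_window = find_utterance_end(frames, 5)   # pause is tolerated, ends after "world"
```

With the short window the utterance is cut in the middle of the pause, so only "hello" reaches the decoder; with the long window both words land in one utterance.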
Also, as I wrote before, it is a bad idea to impose such things in the grammar. You need to process audio continuously and glue utterances together for postprocessing in a parsing module if you want to process them intelligently. You should not implement this logic in the VAD.
You can check the discussion here: https://github.com/cmusphinx/pocketsphinx/issues/13#issuecomment-163304614
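As a rough illustration of gluing utterances in a separate parsing step, the Python sketch below merges recognized utterances that are separated by a short pause. Everything in it is an assumption for the sake of the example: the Utterance record, the glue_utterances helper and the max_gap value are hypothetical and not part of PocketSphinx; you would fill the list from whatever hypotheses and timestamps your decoder loop produces.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str     # recognized text for one VAD-delimited utterance
    start: float  # start time in seconds
    end: float    # end time in seconds


def glue_utterances(utterances, max_gap):
    """Merge consecutive utterances whose pause is at most max_gap seconds.

    utterances must be ordered by start time.
    """
    merged = []
    for utt in utterances:
        if merged and utt.start - merged[-1].end <= max_gap:
            # Short pause: treat this as a continuation of the previous one.
            prev = merged[-1]
            merged[-1] = Utterance(prev.text + " " + utt.text, prev.start, utt.end)
        else:
            # Long pause (or first utterance): start a new group.
            merged.append(utt)
    return merged


utts = [
    Utterance("hello", 0.0, 0.5),
    Utterance("world", 1.2, 1.7),  # 0.7 s after the previous utterance
    Utterance("hello", 5.0, 5.4),  # 3.3 s after the previous utterance
]
glued = glue_utterances(utts, max_gap=1.0)
```

Here the first two utterances become "hello world" and the third stays a separate "hello". The decision about what counts as one command lives in the parser, where you can also use the words themselves, and the VAD settings stay untouched.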