Is it possible to use PocketSphinx for VAD without performing recognition? Ideally, I'd give it audio data and in return get pairs of timestamps indicating the start and end of utterances.
I'm using pre-recorded audio files, so VAD doesn't have to operate on-the-fly. I'd prefer accuracy over speed.
Yes, it is possible, though the API is not very straightforward. You can check the sphinx_cont_seg sources in sphinxbase.

You can also check WebRTC's VAD; it is more advanced than the PocketSphinx one.

Thank you very much, Nickolay! I just tried WebRTC, and it works great!

For anybody in the same situation as me, here's an extract showing how to use WebRTC's VAD:

VadInst* vadHandle = WebRtcVad_Create();
if (!vadHandle) throw runtime_error("Error creating WebRTC VAD handle.");

int error = WebRtcVad_Init(vadHandle);
if (error) throw runtime_error("Error initializing WebRTC VAD handle.");

const int aggressiveness = 1; // 0..3. The higher, the more is cut off.
error = WebRtcVad_set_mode(vadHandle, aggressiveness);
if (error) throw runtime_error("Error setting WebRTC VAD aggressiveness.");

// Call this in a loop, feeding the audio data.
// The result value is 0 for inactive, 1 for active, and -1 for error (e.g., not enough data).
bool isActive = WebRtcVad_Process(vadHandle, sampleRate, data, sampleCount) == 1;

WebRtcVad_Free(vadHandle);
Here's the official website.
And here's a list of the files you actually need:
Last edit: Daniel Wolf 2016-06-21