
Using PocketSphinx for Voice Activity Detection (VAD)

2016-06-19
2016-06-21
  • Daniel Wolf

    Daniel Wolf - 2016-06-19

    Is it possible to use PocketSphinx for VAD without performing recognition? Ideally, I'd give it audio data and in return get pairs of timestamps indicating the start and end of utterances.

    I'm using pre-recorded audio files, so VAD doesn't have to operate on-the-fly. I'd prefer accuracy over speed.

     
    • Nickolay V. Shmyrev

      Yes, it is possible, though the API is not very straightforward. You can check the sphinx_cont_seg sources in sphinxbase.

      You can also check the WebRTC VAD; it is more advanced than the PocketSphinx one.

       
      • Daniel Wolf

        Daniel Wolf - 2016-06-21

        Thank you very much, Nickolay! I just tried WebRTC, and it works great!

        For anybody in the same situation as me, here's an extract showing how to use WebRTC's VAD:

        VadInst* vadHandle = WebRtcVad_Create();
        if (!vadHandle) throw runtime_error("Error creating WebRTC VAD handle.");
        
        int error = WebRtcVad_Init(vadHandle);
        if (error) throw runtime_error("Error initializing WebRTC VAD handle.");
        
        const int aggressiveness = 1; // 0..3. The higher, the more is cut off.
        error = WebRtcVad_set_mode(vadHandle, aggressiveness);
        if (error) throw runtime_error("Error setting WebRTC VAD aggressiveness.");
        
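        // Call this in a loop, feeding the audio data one frame at a time.
        // sampleRate must be 8000, 16000, 32000, or 48000 Hz; each frame must hold 10, 20, or 30 ms of 16-bit mono samples.
        // The result value is 0 for inactive, 1 for active, and -1 for error (e.g., unsupported sample rate or frame length).
        int result = WebRtcVad_Process(vadHandle, sampleRate, data, sampleCount);
        if (result == -1) throw runtime_error("Error processing audio frame with WebRTC VAD.");
        bool isActive = result == 1;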
        
        WebRtcVad_Free(vadHandle);
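
        To get from per-frame results to the pairs of start and end timestamps asked for at the top of the thread, the frame decisions still have to be merged into segments. The following is only a minimal sketch of that step, not part of WebRTC: findSegments, Segment, and anyNonZero are hypothetical names, and the classifier is passed in as a callback so the sketch compiles without the WebRTC sources. It does no smoothing, so timestamps have frame resolution and short pauses split an utterance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Start and end time of one stretch of detected speech, in seconds.
using Segment = std::pair<double, double>;

// Per-frame classifier: returns true if the frame contains speech.
using FrameClassifier = std::function<bool(const std::int16_t*, std::size_t)>;

// Hypothetical helper (not part of WebRTC): cuts `samples` into frames of
// `frameMs` milliseconds, asks `isSpeech` about each frame, and merges runs
// of consecutive active frames into segments. A trailing partial frame is
// ignored.
std::vector<Segment> findSegments(
    const std::vector<std::int16_t>& samples,
    int sampleRate,
    int frameMs,  // WebRTC's VAD accepts 10, 20, or 30 ms frames
    const FrameClassifier& isSpeech)
{
    assert(sampleRate > 0 && frameMs > 0);
    const std::size_t frameLength =
        static_cast<std::size_t>(sampleRate) * frameMs / 1000;

    std::vector<Segment> segments;
    if (frameLength == 0) return segments;

    bool inSpeech = false;
    double start = 0.0;
    std::size_t pos = 0;
    for (; pos + frameLength <= samples.size(); pos += frameLength) {
        const bool active = isSpeech(samples.data() + pos, frameLength);
        const double time = static_cast<double>(pos) / sampleRate;
        if (active && !inSpeech) {
            // Speech begins at the start of this frame.
            start = time;
            inSpeech = true;
        } else if (!active && inSpeech) {
            // Speech ended at the start of this (inactive) frame.
            segments.emplace_back(start, time);
            inSpeech = false;
        }
    }
    // Close a segment that is still open at the end of the audio.
    if (inSpeech) {
        segments.emplace_back(start, static_cast<double>(pos) / sampleRate);
    }
    return segments;
}

// Stand-in classifier so the sketch runs without WebRTC: a frame counts as
// active if any sample is non-zero. This is NOT a real VAD.
bool anyNonZero(const std::int16_t* frame, std::size_t length) {
    for (std::size_t i = 0; i < length; ++i) {
        if (frame[i] != 0) return true;
    }
    return false;
}
```

        With the extract above, the stand-in would presumably be replaced by a lambda that captures vadHandle and sampleRate and returns WebRtcVad_Process(vadHandle, sampleRate, frame, length) == 1, with the -1 error case handled separately.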
        

        Here's the official website.

        And here's a list of the files you actually need:

        lib/webrtc-8d2248ff/webrtc/common_audio/signal_processing/cross_correlation.c
        lib/webrtc-8d2248ff/webrtc/common_audio/signal_processing/division_operations.c
        lib/webrtc-8d2248ff/webrtc/common_audio/signal_processing/downsample_fast.c
        lib/webrtc-8d2248ff/webrtc/common_audio/signal_processing/energy.c
        lib/webrtc-8d2248ff/webrtc/common_audio/signal_processing/get_scaling_square.c
        lib/webrtc-8d2248ff/webrtc/common_audio/signal_processing/min_max_operations.c
        lib/webrtc-8d2248ff/webrtc/common_audio/signal_processing/resample_48khz.c
        lib/webrtc-8d2248ff/webrtc/common_audio/signal_processing/resample_by_2_internal.c
        lib/webrtc-8d2248ff/webrtc/common_audio/signal_processing/resample_fractional.c
        lib/webrtc-8d2248ff/webrtc/common_audio/signal_processing/spl_init.c
        lib/webrtc-8d2248ff/webrtc/common_audio/signal_processing/spl_inl.c
        lib/webrtc-8d2248ff/webrtc/common_audio/signal_processing/vector_scaling_operations.c
        lib/webrtc-8d2248ff/webrtc/common_audio/vad/vad_core.c
        lib/webrtc-8d2248ff/webrtc/common_audio/vad/vad_filterbank.c
        lib/webrtc-8d2248ff/webrtc/common_audio/vad/vad_gmm.c
        lib/webrtc-8d2248ff/webrtc/common_audio/vad/vad_sp.c
        lib/webrtc-8d2248ff/webrtc/common_audio/vad/webrtc_vad.c
        
         

        Last edit: Daniel Wolf 2016-06-21
