
On the Pocketsphinx VAD and endpointer

  • Mark

    Mark - 2009-05-04

    I'm wondering whether the VAD/endpointer that pocketsphinx uses could be due for an upgrade like the one described in the following article, so that its performance and accuracy improve in low-SNR environments:

    http://figment.cse.usf.edu/~sfefilat/data/papers/WeBT5.3.pdf

    If so, which files would need to be changed, and more specifically, which procedures/methods in those files?

    Mark.

     
    • Nickolay V. Shmyrev

      I would rather implement something like:

      Javier Ramírez, José C. Segura, Carmen Benítez, Ángel de la Torre, Antonio Rubio, "An Effective Subband OSF-Based VAD With Noise Reduction for Robust Speech Recognition," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 6, November 2005.

      http://ddtm.org.ru/misc/IE3SAP05b.pdf

      As for where: pocketsphinx has a vader GStreamer plugin component, which you would just need to rewrite. In the C source, everything is done in /libsphinxad in sphinxbase.
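
      To give a rough sense of the flavor of that paper's approach, below is a loose sketch in C of a subband, order-statistics-filter style frame classifier: subband log-energies are smoothed with a percentile (order-statistics) filter over a short window, compared against a noise estimate, and the averaged subband SNR is thresholded. The band count, window length, rank, and threshold are hypothetical placeholders; this is a simplification for illustration, not the exact formulation from the Ramírez et al. paper.

      /* Loose sketch of a subband, order-statistics-filter style VAD decision.
       * Simplified illustration only: NSUBBANDS, OSF_WIN, OSF_RANK and THRESH
       * are hypothetical placeholders, not values from the cited paper. */
      #include <stdlib.h>

      #define NSUBBANDS 4     /* number of spectral subbands (hypothetical) */
      #define OSF_WIN   11    /* frames in the order-statistics window (hypothetical) */
      #define OSF_RANK  8     /* which order statistic to keep (~75th percentile) */
      #define THRESH    6.0   /* decision threshold, in dB-like units (hypothetical) */

      static int cmp_double(const void *a, const void *b)
      {
          double d = *(const double *)a - *(const double *)b;
          return (d > 0) - (d < 0);
      }

      /* band_energy[t][b]: log-energy of subband b at frame t (already computed
       * from the FFT); noise_energy[b]: running noise estimate for subband b.
       * Returns 1 if frame t is classified as speech, 0 otherwise. */
      int osf_vad_decide(double band_energy[][NSUBBANDS], int t, int nframes,
                         const double noise_energy[NSUBBANDS])
      {
          double window[OSF_WIN];
          double score = 0.0;
          int b, k;

          for (b = 0; b < NSUBBANDS; ++b) {
              /* Collect this subband's log-energies over a window centred on t. */
              for (k = 0; k < OSF_WIN; ++k) {
                  int f = t - OSF_WIN / 2 + k;
                  if (f < 0) f = 0;
                  if (f >= nframes) f = nframes - 1;
                  window[k] = band_energy[f][b];
              }
              /* Order-statistics filter: sort and keep a high-rank sample, which
               * tracks speech peaks while ignoring isolated noisy frames. */
              qsort(window, OSF_WIN, sizeof(double), cmp_double);
              score += window[OSF_RANK] - noise_energy[b];   /* subband "SNR" */
          }
          return (score / NSUBBANDS) > THRESH;
      }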

       
      • Mark

        Mark - 2009-05-11

        I have a couple of questions. First, in which files does the MFCC feature extraction (the pre-emphasis filter and FFT) occur in pocketsphinx? I believe the VAD should go before this. Second, is this why you mentioned, "In the C source, everything is done in /libsphinxad in sphinxbase"?

         
        • Nickolay V. Shmyrev

          > I have a couple of questions. First, in which files does the MFCC feature extraction (the pre-emphasis filter and FFT) occur in pocketsphinx?

          There is no feature extraction in pocketsphinx; sphinxbase is used instead. And as I said before, you need to look at

          sphinxbase/src/libsphinxad/cont_ad_base.c

          for the place to implement the VAD.
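
          For context, the speech/silence decision to be replaced is driven through the cont_ad API declared in sphinxbase's cont_ad.h, with cont_ad_base.c holding the implementation. Below is a minimal usage sketch, assuming the ad.h/cont_ad.h interfaces of that era; check your sphinxbase headers for the exact signatures and include paths.

          /* Minimal driver for sphinxbase's continuous-listening (cont_ad) module,
           * whose speech/silence decision lives in cont_ad_base.c.  Sketch only:
           * verify the ad.h / cont_ad.h signatures against your sphinxbase version. */
          #include <stdio.h>
          #include <sphinxbase/ad.h>
          #include <sphinxbase/cont_ad.h>

          int main(void)
          {
              ad_rec_t  *ad;
              cont_ad_t *cont;
              int16      buf[4096];
              int32      k;

              if ((ad = ad_open()) == NULL) {
                  fprintf(stderr, "ad_open failed\n");
                  return 1;
              }
              /* Attach the endpointer to the audio source; cont_ad_read() then
               * returns only the samples it classifies as speech. */
              if ((cont = cont_ad_init(ad, ad_read)) == NULL) {
                  fprintf(stderr, "cont_ad_init failed\n");
                  return 1;
              }
              ad_start_rec(ad);
              cont_ad_calib(cont);        /* estimate the background noise level */

              for (;;) {
                  k = cont_ad_read(cont, buf, 4096);
                  if (k < 0)
                      break;              /* error or end of data */
                  if (k > 0) {
                      /* buf[0..k-1] is speech audio; hand it to the decoder here. */
                  }
              }

              cont_ad_close(cont);
              ad_stop_rec(ad);
              ad_close(ad);
              return 0;
          }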

           
    • Mark

      Mark - 2009-05-05

      Thanks, Nickolay, for that other article.
      In my naive view the one I mentioned seemed really good, but I had nothing empirical to compare it to except the Li algorithm from AT&T.
      Do you think the approach in the article you mentioned is better?
      Could a combined approach between the two produce something beyond both?

       
    • Mark

      Mark - 2009-05-05

      Yes, the simplicity was the first thing I noticed about the Li/Deng paper, and the increase in speed it provides as well. Nice for keeping pocketsphinx "lean and mean." The only thing I can't tell is how thoroughly they tested it compared to the paper you cite. Anyway, the compression rates they show beat AT&T's Li algorithm in accuracy across settings, but there are rates and settings where AT&T's Li algorithm beats Li/Deng on speed.

      I agree with you that this seems the easiest to do. You get the most "bang for your buck."

       

