I'm wondering if the VAD/endpointer that pocketsphinx uses could be due for an upgrade similar to the one described in the following article, so that its performance and accuracy are enhanced in low-SNR environments?
http://figment.cse.usf.edu/~sfefilat/data/papers/WeBT5.3.pdf
If so, then which files would need to be changed, and more specifically, which procedures/methods in those files?
Mark.
It would be better to implement something like:
Javier Ramírez, José C. Segura, Carmen Benítez, Ángel de la Torre, Antonio Rubio, "An Effective Subband OSF-Based VAD With Noise Reduction for Robust Speech Recognition," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 6, November 2005.
http://ddtm.org.ru/misc/IE3SAP05b.pdf
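Roughly, the per-frame decision from that paper boils down to the sketch below. This is just my reading of it, not code from either project: the subband count, OSF window, quantile, smoothing factor and threshold are made-up placeholders, and the paper's noise-reduction front end is left out entirely.

    #include <stdlib.h>
    #include <string.h>

    #define NSUBBANDS  4        /* number of subbands K (placeholder) */
    #define OSF_WIN    17       /* frames in the OSF sliding window (placeholder) */
    #define OSF_QUANT  0.8f     /* sample quantile picked by the OSF (placeholder) */
    #define VAD_THRESH 3.5f     /* decision threshold on the summed SNR (placeholder) */

    static int cmp_float(const void *a, const void *b)
    {
        float d = *(const float *)a - *(const float *)b;
        return (d > 0.0f) - (d < 0.0f);
    }

    /* hist[k]  holds the last OSF_WIN subband log-energies for subband k.
     * noise[k] is the running noise log-energy estimate for subband k.
     * Returns 1 for speech, 0 for non-speech; adapts noise[] during pauses. */
    int osf_vad_decide(float hist[NSUBBANDS][OSF_WIN], float noise[NSUBBANDS])
    {
        float sorted[OSF_WIN];
        float snr_sum = 0.0f;
        int k;

        for (k = 0; k < NSUBBANDS; k++) {
            float osf, snr;

            /* Order statistics filter: sort the frame window, take a quantile. */
            memcpy(sorted, hist[k], sizeof(sorted));
            qsort(sorted, OSF_WIN, sizeof(float), cmp_float);
            osf = sorted[(int)(OSF_QUANT * (OSF_WIN - 1))];

            /* Long-term subband SNR against the current noise floor. */
            snr = osf - noise[k];
            if (snr > 0.0f)
                snr_sum += snr;
        }

        if (snr_sum > VAD_THRESH)
            return 1;                       /* speech */

        /* Non-speech: slowly adapt the noise floor toward the window median. */
        for (k = 0; k < NSUBBANDS; k++) {
            memcpy(sorted, hist[k], sizeof(sorted));
            qsort(sorted, OSF_WIN, sizeof(float), cmp_float);
            noise[k] = 0.98f * noise[k] + 0.02f * sorted[OSF_WIN / 2];
        }
        return 0;
    }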
About the place: pocketsphinx has a vader GStreamer plugin component; you just need to rewrite it. In the C source, everything is done in /libsphinxad in sphinxbase.
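On the GStreamer side, the endpointer is the vader element from the pocketsphinx plugin, so a rewritten VAD would be exercised through a pipeline roughly like the sketch below (GStreamer 0.10 era; element and property names from memory, adjust for your install):

    #include <gst/gst.h>

    int main(int argc, char *argv[])
    {
        GstElement *pipeline;
        GError *err = NULL;

        gst_init(&argc, &argv);

        /* 'vader' is the voice activity detector / endpointer element;
         * a rewritten VAD would sit behind this element. */
        pipeline = gst_parse_launch(
            "autoaudiosrc ! audioconvert ! audioresample "
            "! vader name=vad auto-threshold=true "
            "! pocketsphinx name=asr ! fakesink",
            &err);
        if (pipeline == NULL) {
            g_printerr("Failed to build pipeline: %s\n", err->message);
            g_error_free(err);
            return 1;
        }

        gst_element_set_state(pipeline, GST_STATE_PLAYING);
        g_main_loop_run(g_main_loop_new(NULL, FALSE));
        return 0;
    }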
I have a couple of questions. First, in which files does the MFCC feature extraction (the pre-emphasis filter and FFT) occur in pocketsphinx? I believe the VAD should go before this. Second, is this why you mentioned, "In the C source, everything is done in /libsphinxad in sphinxbase"?
> I have a couple of questions. First, in which files does the MFCC feature extraction (the pre-emphasis filter and FFT) occur in pocketsphinx?
There is no feature extraction in pocketsphinx; sphinxbase is used instead. And as I said before, you need to look at
sphinxbase/src/libsphinxad/cont_ad_base.c
for the place to implement the VAD.
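For context, the existing endpointer in cont_ad_base.c is driven through the cont_ad calls, roughly as in the continuous-listening examples (error handling trimmed, and header paths may differ between sphinxbase versions), so a replacement VAD would only need to keep this interface:

    #include <sphinxbase/ad.h>
    #include <sphinxbase/cont_ad.h>

    /* Open the audio device, calibrate the silence filter on background
     * noise, then block until the endpointer reports speech samples. */
    int wait_for_speech(int16 *buf, int32 maxsamps)
    {
        ad_rec_t *ad;
        cont_ad_t *cont;
        int32 n;

        if ((ad = ad_open_dev(NULL, 16000)) == NULL)
            return -1;
        if ((cont = cont_ad_init(ad, ad_read)) == NULL) {
            ad_close(ad);
            return -1;
        }

        ad_start_rec(ad);
        cont_ad_calib(cont);    /* estimate the background noise level */

        /* cont_ad_read() returns 0 while the input is classified as silence
         * and the number of samples once speech is detected. */
        while ((n = cont_ad_read(cont, buf, maxsamps)) == 0)
            ;                   /* the real examples sleep briefly here */

        ad_stop_rec(ad);
        cont_ad_close(cont);
        ad_close(ad);
        return n;
    }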
Thanks Nickolay for that other article.
In my naive view, the one I mentioned seemed really good, but I had nothing empirical to compare it to except the Li algorithm from AT&T.
Do you think the approach in the article you mentioned is better?
Could a combined approach between the two produce something beyond both?
Yes, the simplicity was the first thing I noticed about the Li/Deng paper, along with the increase in speed that it provides. Nice for keeping pocketsphinx "lean and mean." The only thing I can't tell is how thoroughly they tested it compared to the paper you cite. Anyway, the compression rates they show beat AT&T's Li algorithm in accuracy across settings, but there are rates and settings where AT&T's Li beats Li/Deng on speed.
I agree with you that this seems the easiest to do. You get the most "bang for your buck."