From: Daniel P. <dp...@gm...> - 2015-03-01 00:06:23
Hi,

I am cross-posting this to kaldi-developers, as I think my reply might be of interest to people subscribed to that list. This is a good excuse to talk about the situation with Voice Activity Detection (VAD) more generally.

There definitely does need to be some good voice activity detection in Kaldi at some point. Part of the reason it doesn't exist yet is that it has never been clear to me that there is a "right" way to do VAD, or even a right way to formulate it as a problem. For example: how many classes should there be (music? laughter?), and what should be done about cross-talk and background speakers? And how does this all work in the online setting (e.g. is there a mechanism to reclassify previous speech as background if we get much louder speech)?

Formulating it as a multi-class (speech/nonspeech) problem with neural nets does seem to be one of the most natural ways to set it up. However, I think it would make more sense to do this at the frame level rather than the segment level. Some of the issues involved in setting this up are a little complicated; for instance, it might be necessary to change some of the command-line tools so that they don't require the transition model and accept labels directly instead of alignments.

Right now I'm working on extending the online-nnet2 setup to use the decoder backtrace to classify frames as silence or nonsilence, and to use this to limit the iVector estimation to nonsilence frames. This should at least remove the WER performance hit that we get from not having speech/silence detection in online decoding.

In the past (e.g. for BABEL) we have done segmentation by doing a first pass of recognition with a fairly simple model and post-processing that output to create segments.

Dan

On Sat, Feb 28, 2015 at 8:05 AM, John Barnes <jcb...@gm...> wrote:
> I'm interested in training a DNN voice activity detection system using
> Kaldi. I have a large corpus labeled at the segment level as speech and
> nonspeech.
> Are there any existing recipes to do this or suggestions on how
> to modify a recipe to accomplish this task?
>
> Thanks
>
> John
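
To make the frame-level versus segment-level distinction in the reply concrete, here is a minimal Python sketch of converting between the two representations. It is not part of any Kaldi recipe. The function names, the (start, end, label) tuple layout, the 0 = nonspeech / 1 = speech coding, and the 10 ms frame shift are all assumptions chosen for illustration.

```python
# Illustrative sketch only; names, label coding and frame shift are assumptions.

def segments_to_frame_labels(segments, num_frames, frame_shift=0.01, default_label=0):
    """Expand (start_sec, end_sec, label) segments into one label per frame.

    Frames not covered by any segment keep default_label (assumed nonspeech).
    """
    labels = [default_label] * num_frames
    for start, end, label in segments:
        # Convert times to frame indices and clip to the utterance length.
        first = max(0, int(round(start / frame_shift)))
        last = min(num_frames, int(round(end / frame_shift)))
        for t in range(first, last):
            labels[t] = label
    return labels


def frame_labels_to_segments(labels, frame_shift=0.01):
    """Collapse runs of identical frame labels back into (start, end, label) segments."""
    segments = []
    start = 0
    for t in range(1, len(labels) + 1):
        # Close the current run at the end of the list or when the label changes.
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((start * frame_shift, t * frame_shift, labels[start]))
            start = t
    return segments


# One speech segment from 0.02 s to 0.05 s in an 8-frame utterance.
frame_labels = segments_to_frame_labels([(0.02, 0.05, 1)], num_frames=8)
print(frame_labels)
print(frame_labels_to_segments(frame_labels))
```

The first function corresponds to turning segment-level corpus labels into frame-level training targets. The second is the simplest form of post-processing per-frame decisions into segments; a practical system would probably also smooth short runs and pad segment boundaries, which this sketch does not attempt.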