From: Ondrej P. <ond...@gm...> - 2014-10-24 11:03:41
Dear Dan,

My wishlist:
1) VAD
2) An easy framework for experimenting with DNNs
3) Standardize the Kaldi-specific semirings (CompactLattice, Lattice) so that the standard OpenFst tools are more usable (or extend the tools)

Detailed description:

Regarding VAD
=============
I will try to describe my motivation for implementing VAD in Kaldi.

The two reasons why I need VAD in general:
a) VAD separates turns in our dialogue systems. Once the detected speech ends, we consider taking the turn.
   => VAD is useful in dialogue systems
b) VAD is a simpler component than the decoder, and its RTF (real-time factor) should be significantly smaller. In a production setup with thousands of channels, it saves a lot of computation if the VAD RTF is under 0.1 (easy to achieve).
   => VAD is useful in production

The reasons why I would like to implement VAD in Kaldi:
1) The VAD would work on top of features (e.g. MFCC, fbank, ...), and I want to reuse the Kaldi code for computing them.
2) The "recognizer", which would use one of the Kaldi decoders with VAD as a preprocessing step, should integrate the two seamlessly. The idea: the frame numbering is kept the same, and the silence frames are marked as silence (mainly) by the VAD.
3) A recognizer with the additional VAD component would be evaluated exactly like the current non-VAD recognizers, e.g. gmm-latgen-faster and latgen-faster-mapped. The "sil" words would not be counted as errors, and the goal would obviously be to keep the same WER while reducing the RTF.

Note: We have a VAD in Python using DNN and GMM models (the DNN is much better), and I am willing to spend MY TIME on implementing this in Kaldi.
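
To make the idea in point 2 concrete, here is a minimal sketch of a VAD that marks frames without changing their numbering. It is not our DNN/GMM VAD: a plain energy threshold stands in for the model, and the function names, the `hangover` parameter, and the assumption that the first feature coefficient is the frame's log-energy are all hypothetical.

```python
from typing import List, Sequence


def vad_mask(features: Sequence[Sequence[float]],
             energy_threshold: float,
             hangover: int = 2) -> List[bool]:
    """Return one speech (True) / silence (False) decision per frame.

    Assumption (hypothetical layout): features[t][0] is the log-energy
    of frame t. The result has exactly len(features) entries, so index t
    in the mask refers to the same frame t that the decoder sees.
    """
    # Raw per-frame decision: a simple threshold stands in for a DNN/GMM.
    raw = [frame[0] > energy_threshold for frame in features]

    # Hangover smoothing: keep up to `hangover` frames after each raw
    # speech frame marked as speech, so word endings are not clipped.
    mask = list(raw)
    for t, is_speech in enumerate(raw):
        if is_speech:
            for k in range(t + 1, min(t + 1 + hangover, len(raw))):
                mask[k] = True
    return mask


def silence_frames(mask: Sequence[bool]) -> List[int]:
    """Indices of frames marked as silence, in the original numbering."""
    return [t for t, is_speech in enumerate(mask) if not is_speech]
```

No frame is dropped here: the decoder could skip or cheaply score the frames returned by `silence_frames`, while lattices and alignments would still refer to the original frame indices.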

Regarding toolkit for DNN experiments
=====================================
I want to use DNN models, since they seem to be the best, and to try different ones easily. This toolkit looks like the right choice for me right now:
http://www.cs.cmu.edu/~ymiao/pdnntk.html
If the author keeps it compatible with Kaldi, or if it is integrated into Kaldi, that would be very helpful for running experiments easily.

OpenFst
=======
This is a rather technical wish, but I was planning to extend pyfst (a Python wrapper) with the Kaldi semirings so that I can easily work with ASR hypotheses. So far I am only using the OpenFst tropical and log semirings for word lattices in
https://github.com/UFAL-DSG/pyfst

Note: I do not have time for that right now, but I think it would be useful.

Thank you for the poll.

On 24 October 2014 01:29, Daniel Povey <dp...@gm...> wrote:
> Hey everyone,
>
> I'm thinking of creating a poll to ask people what things they would like
> improved about Kaldi.
> I'm sending out this email to get ideas about what we could include in the
> poll.
> I'm deliberately not giving you guys examples of what kind of thing I
> expect, because I don't want to artificially limit the scope at this
> point. I'm not necessarily just talking about features - it can encompass
> wider wishes and concerns.
> Any ideas?
>
> Dan