From: Haihua Xu <hh...@gm...> - 2014-10-25 04:26:17
I agree with this :-)

1. The ultimate goal I can think of is a playable dialogue system in Kaldi. For this, implementing a TTS module is inevitable.
2. Besides, I am curious whether there will be a Kaldi workshop in the future, so that more people can voluntarily participate to learn and to develop specific modules.

Thanks!
Haihua

On Sat, Oct 25, 2014 at 1:05 AM, Matthew Aylett <mat...@gm...> wrote:
> Finishing the Kaldi TTS would be high up my list :-)
>
> Otherwise, the main feedback I get from designers and HCI professionals,
> who I think speech technology needs to engage with, concerns the interactive
> and portable nature of an ASR system.
>
> A lot of people I know are using PocketSphinx and I think a pocket Kaldi
> would be really useful. However, that would mean a load of not very
> research-like work:
>
> i.e. porting a realtime decoder to Android/iOS, allowing arbitrary halting
> and restarting (in response to feedback), and a simple means of making open
> and closed language models which could be carried out by non-speech tech
> engineers.
>
> Also some binary releases that don't require compilation (more for the
> decoder than the training, I would say), and some decent models built on
> data like Librivox etc. which could be freely shared. I think I saw some
> posts about work on this, but it would be interesting to know what sort of
> WER a non-technical person might get from such a freely available model.
> For these people WER is not nearly as important as usability and
> flexibility.
>
> Maybe some of this is already available, in which case a doc page or
> so-called "Kaldi for Dummies" would be really useful for non-ASR people like me.
>
> Matthew
>
> On Fri, Oct 24, 2014 at 4:15 PM, Nagendra Goel <nag...@go...> wrote:
>
>> I second the VAD proposal.
>> I feel that something that
>>
>> 1) Looks at the “max” or “sum” of the likelihoods of ALL the silence
>> phones (including NSN and BRH), and compares it with a configurable
>> threshold (no decoding here)
>>
>> 2) Places a constraint on minimum (configurable) silence length
>>
>> 3) Shrinks the silence region boundaries by a (configurable) number
>> of frames
>>
>> 4) Inverts the resulting silence frame results to get voice results
>>
>> will suffice as a first round of VAD. Compute is already small because we
>> are not calculating all the likelihoods and not decoding anything. This
>> could probably also be implemented in the online code. It will have the
>> added benefit that online adaptation will not happen during these silence
>> portions. It is important to squelch adaptation completely during long
>> portions of silence, like when two people are conversing (ASR listening
>> to only one channel).
>>
>> I also have been using modified scripts for constructing G.fst that
>> allow construction of grammars for dates and numbers and the like, while
>> it is a regular trigram LM for everything else. The arcs corresponding to
>> dates and numbers are replaced with these grammars. However, this workflow
>> probably needs more research, especially when dealing with multiple LM data
>> sources and interpolation of LMs. I wonder if this is a Kaldi topic, or
>> somebody’s research topic.
>>
>> *From:* Ondrej Platek [mailto:ond...@gm...]
>> *Sent:* Friday, October 24, 2014 7:03 AM
>> *To:* Daniel Povey
>> *Cc:* kal...@li...
>> *Subject:* Re: [Kaldi-users] Poll about Kaldi
>>
>> Dear Dan,
>>
>> My wishlist:
>>
>> 1) VAD
>>
>> 2) easy framework for experimenting with DNN.
>>
>> 3) Standardize Kaldi special semirings (CompactLattice, Lattice) so the
>> standard OpenFst tools would be more usable (or extend the tools)
>>
>> Detailed description:
>>
>> Regarding VAD
>> ============
>>
>> I will try to describe the motivation why I want to implement VAD in
>> Kaldi.
>>
>> The two reasons why I need VAD in general:
>>
>> a) VAD in our dialogue systems separates turns. After the detected speech
>> finishes, we consider taking a turn.
>>
>> => VAD is useful in dialogue systems
>>
>> b) VAD is a simpler component than the decoder, and its RTF (real time
>> factor) should be significantly smaller.
>>
>> In a production setup with thousands of channels, it saves a lot of
>> computation if the VAD RTF is under 0.1 (easy to achieve).
>>
>> => VAD is useful in production
>>
>> The reasons why I would like to implement VAD in Kaldi:
>>
>> 1) The VAD would work on top of features (e.g. mfcc, fbanks, ...) and I
>> want to reuse them, so I want to use Kaldi code for computing the mfcc
>> features.
>>
>> 2) The "recognizer", which would use one of the Kaldi decoders and use VAD
>> in its preprocessing steps, should integrate them seamlessly:
>>
>> The idea: the frame numbering should be kept the same, and the silence
>> frames should be marked as silence (mainly) by the VAD.
>>
>> 3) The evaluation of the recognizer with the additional VAD component will
>> be done exactly as for the current non-VAD recognizers, e.g.
>> gmm-latgen-faster, mapped-latgen-faster.
>>
>> The "sil" words will not be counted as errors, and the goal will
>> obviously be to keep the same WER but reduce the RTF.
>>
>> Note: We have VAD in Python using DNN and GMM (DNN is much better) and
>> I am willing to spend MY TIME on implementing this in Kaldi.
>>
>> Regarding toolkit for DNN experiments
>> ================================
>>
>> I want to use DNN models, since they seem to be the best, and to try
>> different ones easily.
>>
>> This toolkit seems like a good choice for me right now:
>>
>> http://www.cs.cmu.edu/~ymiao/pdnntk.html
>>
>> If the author keeps this compatible with Kaldi, or if it is integrated
>> into Kaldi, it would be very helpful for doing easy experiments.
>>
>> OpenFST
>> =======
>>
>> It is a rather technical wish, but I was planning to extend pyfst (Python
>> wrapper) for Kaldi semirings so I can easily work with the ASR hypotheses.
>> So far I am only using the OpenFST tropical and log semirings for word
>> lattices in https://github.com/UFAL-DSG/pyfst
>>
>> Note: I do not have time for that right now, but I think this would be
>> useful.
>>
>> Thank you for the poll
>>
>> On 24 October 2014 01:29, Daniel Povey <dp...@gm...> wrote:
>>
>> Hey everyone,
>>
>> I'm thinking of creating a poll to ask people what things they would like
>> improved about Kaldi.
>>
>> I'm sending out this email to get ideas about what we could include in
>> the poll.
>>
>> I'm deliberately not giving you guys examples of what kind of thing I
>> expect, because I don't want to artificially limit the scope at this
>> point. I'm not necessarily just talking about features - it can encompass
>> wider wishes and concerns.
>>
>> Any ideas?
>>
>> Dan
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Kaldi-users mailing list
>> Kal...@li...
>> https://lists.sourceforge.net/lists/listinfo/kaldi-users
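
For readers following the thread, the four-step silence-likelihood VAD that Nagendra outlines in the quoted message can be sketched roughly as below. This is only an illustration in plain Python, not Kaldi code: the function name `silence_vad`, its parameters and their defaults are all hypothetical, and it assumes the per-frame scores of the silence phones (for example posteriors) have already been computed elsewhere. It also assumes that "shrinking" means trimming the same number of frames from both ends of every silence region.

```python
from typing import List, Sequence


def silence_vad(
    sil_scores: Sequence[Sequence[float]],
    threshold: float = 0.5,     # hypothetical default
    min_sil_frames: int = 3,    # hypothetical default
    shrink_frames: int = 1,     # hypothetical default
    use_sum: bool = False,
) -> List[bool]:
    """Return one flag per frame: True = voice, False = silence.

    sil_scores[t] holds the scores of all silence phones at frame t
    (assumed to be precomputed; nothing here calls into Kaldi).
    """
    combine = sum if use_sum else max

    # Step 1: combine the silence-phone scores per frame ("max" or "sum")
    # and compare with the threshold. No decoding is involved.
    candidate = [combine(frame) > threshold for frame in sil_scores]

    n = len(candidate)
    silence = [False] * n
    start = 0
    while start < n:
        if not candidate[start]:
            start += 1
            continue
        # Find the end of this run of candidate silence frames: [start, end).
        end = start
        while end < n and candidate[end]:
            end += 1
        # Step 2: keep only runs that satisfy the minimum silence length.
        if end - start >= min_sil_frames:
            # Step 3: shrink the region by shrink_frames on both sides.
            for t in range(start + shrink_frames, end - shrink_frames):
                silence[t] = True
        start = end

    # Step 4: invert the silence decisions to get voice decisions.
    return [not s for s in silence]


if __name__ == "__main__":
    # One silence phone per frame, purely made-up scores.
    scores = [[0.9]] * 5 + [[0.1]] * 2 + [[0.9]] + [[0.1]] + [[0.9]] * 3
    print(silence_vad(scores))
```

Because only the silence-phone scores are needed per frame and nothing is decoded, the cost stays small, which matches the motivation given in the thread. In an online setting the same per-frame flags could be used to suppress adaptation during long silences, though how that would hook into the Kaldi online code is not specified here.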