From: <Dan...@pa...> - 2015-01-07 03:43:33
Thanks! I won't have to spend a lot of time fretting about the relative merits of Kaldi and whatever ASR system pops up next. Your outline for using Kaldi's decoder is greatly appreciated.

The Hydra folks claim there is significant value in having a language model so large that it is impractical to compile into the HCLG WFST. However, figures 3 and 4 in http://www.cs.cmu.edu/~ianlane/publications/2012_Kim_Interspeech.pdf appear to show that a standard bigram language model achieves comparable accuracy with a real-time factor still comfortably under 1. Is that a fair conclusion?

I'm trying to set expectations in my organization. With a word accuracy of 95%, the user still has to fix every 20th word on average, which works out to about one error per line of text. Does it seem likely that someone will make substantial improvements in accuracy (say, to 99%) with an online decoder whose real-time factor is less than 1? With any real-time factor?

Dan

-----Original Message-----
From: Daniel Povey [mailto:dp...@gm...]
Sent: Tuesday, January 06, 2015 4:37 PM
To: Davies, Dan <Dan...@pa...>
Cc: kal...@li...
Subject: Re: [Kaldi-developers] Kaldi comparison with Hydra?

Interestingly, Hydra is what I wanted to call the Kaldi project (I was outvoted).

It's not really possible to compare the two. Hydra is a closed-source decoder, and it is only a decoder: it doesn't have a system for building models the way Kaldi does. I would imagine that the Hydra decoder is faster, since they've obviously put a lot of effort into it, but the online-nnet2 decoder is sufficiently fast: you can get it to decode in real time fairly easily, without much loss in accuracy, by using suitable beams and a large enough chunk size (e.g. 20 frames), and by configuring your matrix library (ATLAS, OpenBLAS, MKL) to use, say, 2 threads.
Although it would be very easy to use GPUs for the neural-net part of the computation, there hasn't been much demand for it: if you can decode in real time using a couple of CPU cores, that will generally be more efficient in hardware cost than using one CPU core plus a GPU.

Note that the online-nnet2 decoder is not really a decoder per se; it just calls the standard decoding code in lattice-faster-decoder.h, which isn't that complicated. The online-nnet2 code takes care of various online feature-estimation issues and of batching the features into suitably sized chunks so that the matrix operations in the neural-net code will be fast.

Dan

On Tue, Jan 6, 2015 at 4:02 PM, <Dan...@pa...> wrote:
> CMU's Hydra ASR decoder made a splash out here. From the references
> below (or any other info you can find), does anyone have a feeling for
> how this compares with the Kaldi nnet2 online decoder in speed and accuracy?
>
> http://www.cs.cmu.edu/~ianlane/publications/2012_Kim_Interspeech.pdf
> http://on-demand.gputechconf.com/gtc/2013/presentations/S3406-HYDRA-Hybrid-CPU-GPU-Speech-Recognition-Engine.pdf
> http://www.cs.cmu.edu/~ianlane/publications/SLT_JungsukKim.pdf
> http://www.nvidia.com/content/cuda/spotlights/ian-lane-cmu.html
>
> Dan
>
> _______________________________________________
> Kaldi-developers mailing list
> Kal...@li...
> https://lists.sourceforge.net/lists/listinfo/kaldi-developers
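[Editor's note: the expectation-setting arithmetic in the first message above can be sketched as follows. This is a minimal illustration only; the 20-words-per-line figure and the example timings are assumptions for the sake of the sketch, not numbers from the thread.]

```python
# Back-of-the-envelope arithmetic for word-accuracy and real-time-factor
# expectations, as discussed in the thread above.

def errors_per_line(word_accuracy, words_per_line=20):
    """Expected number of word errors per line of text.

    Assumption (not from the thread): a line holds ~20 words, which is
    consistent with "fix every 20th word" being about one error per line.
    """
    return (1.0 - word_accuracy) * words_per_line

def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; RTF < 1 keeps up with live audio."""
    return processing_seconds / audio_seconds

# At 95% word accuracy: one error in every 20 words, i.e. ~1 per 20-word line.
print(errors_per_line(0.95))          # ~1.0
# At 99% word accuracy: one error in every 100 words, i.e. ~0.2 per line.
print(errors_per_line(0.99))          # ~0.2
# Example: 45 s of processing for 60 s of audio is comfortably real-time.
print(real_time_factor(45.0, 60.0))   # 0.75
```

In other words, moving from 95% to 99% word accuracy cuts the correction burden from roughly one fix per line to roughly one fix per five lines, which is why the question of whether 99% is reachable at RTF < 1 matters for the user experience.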