From: Vesely K. <ve...@gm...> - 2015-01-22 20:01:42
Hi Jerry, yes, yes, that's a great idea. I'll happily look at it, and
thanks for the detailed description!

Karel.

On 01/22/2015 06:59 PM, Daniel Povey wrote:
> Karel, since Jerry is offering that we can use his nnet1 LSTM code in
> Kaldi, how do you feel about doing a code review of it right now,
> since it's part of the nnet1 framework? If you don't have time right
> now, though, I could find someone else.
>
> Dan
>
> On Thu, Jan 22, 2015 at 11:11 AM, dophist <do...@gm...> wrote:
>
> Hi Daniel & Kaldi developers,
>
> I saw there was a thread about "WER of LSTM & DNN" on the Kaldi
> SourceForge forum; I'm the author of the LSTM code. This morning the
> thread creator, Alim, emailed me asking if I'd like to share my LSTM
> implementation with the Kaldi community. My answer is definitely a
> "yes", of course.
>
> The code is on GitHub: https://github.com/dophist/kaldi-lstm
>
> 1. The implementation is under Karel's nnet1 framework. The whole
> LSTM architecture is condensed into a single configurable component.
> So the "external tool" Daniel asked about in the forum thread is
> actually "internal": all Kaldi users will find it easy to compile and
> use.
>
> 2. There are two versions of my implementation, "standard" and
> "google". The "standard" version can be seen as a general-purpose
> LSTM tool with epoch-wise BPTT; you could even adapt it to train an
> LSTM language model if you wanted, but currently I use it only for
> sequence training and the decoding tool (nnet-forward). The "google"
> version is primarily used for cross-entropy training in my
> experiments. There are docs in my GitHub repo with detailed
> descriptions.
>
> 3. Testing. The code has been tested on an industry-scale speech
> corpus of around 4000+ hours that is not publicly available. My
> experiments reproduced Google's results, and their conclusions are
> solid. In the last few months I have received feedback from the Siri
> group, the Cambridge lab, and many others; I suppose they have
> already obtained similar results.
>
> 4. Legal stuff. Although I now work at Baidu, the coding was done in
> my personal spare time, so I am free to open-source it under Kaldi's
> license.
>
> Known issues:
>
> 1. Gradient explosion. Gradient explosion is far from solved in RNN
> training; gradient clipping seems to be the best practice in my
> experience. It is implemented in the "standard" version, but tuning
> the clipping threshold can be painful across different tasks (a
> stand-alone sketch of the clipping logic follows after this list).
> The "google" version is less likely to explode because it limits the
> BPTT expansion to 20 frames, but explosion still occurs in certain
> cases.
>
> 2. Training speed. Training LSTMs is slow, especially since most
> institutes don't have huge infrastructure like Google's DistBelief.
> My current implementation is based on nnet1, so it uses only one GPU
> card (or the CPU), and training might take months to converge on an
> industry-scale dataset. Multiple GPU cards in a single host server
> won't scale as datasets get larger and larger, and parallelizing SGD
> on a GPU cluster is still an open issue: most GPU-cluster solutions I
> know of require an InfiniBand network. Yann LeCun's group's EASGD
> seems the most promising to me, but I don't have time to try it.
> Daniel's nnet2 averaging strategy (also sketched below) could be
> another promising option, but I can't be sure whether it will work
> for LSTMs.
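>
> To make the clipping idea from issue 1 concrete, here is a simplified
> stand-alone sketch (illustrative only; the names are mine, and the
> repo's actual code differs, since nnet1 works on GPU matrices rather
> than std::vector):
>
>     // Norm-based gradient clipping: if the L2 norm of the gradient
>     // exceeds clip_threshold, rescale the gradient so its norm
>     // equals clip_threshold. Assumes clip_threshold > 0.
>     #include <cmath>
>     #include <cstdio>
>     #include <vector>
>
>     void ClipGradient(std::vector<float> *grad, float clip_threshold) {
>       double sumsq = 0.0;
>       for (float g : *grad) sumsq += static_cast<double>(g) * g;
>       const double norm = std::sqrt(sumsq);
>       if (norm > clip_threshold) {
>         const float scale = static_cast<float>(clip_threshold / norm);
>         for (float &g : *grad) g *= scale;  // rescale in place
>       }
>     }
>
>     int main() {
>       std::vector<float> grad = {3.0f, 4.0f};    // L2 norm = 5
>       ClipGradient(&grad, 1.0f);                 // norm clipped to 1
>       std::printf("%g %g\n", grad[0], grad[1]);  // prints 0.6 0.8
>     }
>
> Whether you clip the whole gradient by its global norm, as above, or
> clip each element to a range is a design choice; either way, the
> threshold is the knob that needs per-task tuning.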
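>
> And for the averaging strategy, my understanding of the idea (again a
> sketch under my own assumptions, not Kaldi's actual nnet2 code) is
> that several jobs run SGD in parallel on disjoint data shards, and
> their parameters are periodically averaged element-wise and
> redistributed:
>
>     // Element-wise average of the parameter vectors of N parallel
>     // SGD jobs; the result would be redistributed to all jobs
>     // before the next outer iteration. Assumes at least one model,
>     // all of equal size.
>     #include <cstddef>
>     #include <cstdio>
>     #include <vector>
>
>     std::vector<float> AverageModels(
>         const std::vector<std::vector<float>> &models) {
>       std::vector<float> avg(models[0].size(), 0.0f);
>       const float inv = 1.0f / models.size();
>       for (const auto &m : models)
>         for (std::size_t i = 0; i < m.size(); ++i)
>           avg[i] += m[i] * inv;
>       return avg;
>     }
>
>     int main() {
>       std::vector<std::vector<float>> models = {{1.0f, 2.0f},
>                                                 {3.0f, 4.0f}};
>       std::vector<float> avg = AverageModels(models);
>       std::printf("%g %g\n", avg[0], avg[1]);  // prints 2 3
>     }
>
> Whether plain parameter averaging preserves anything useful for the
> gated LSTM parameters is exactly the part I can't be sure about.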
>
> These remaining issues (particularly the training speedup) might
> require great effort to solve, and I'm not sure I have enough time to
> do it. At least I hope my LSTM implementation can be a quick starting
> point towards RNN acoustic modeling for the Kaldi community.
>
> If anyone has questions about the code, feel free to email me:
>
> jer...@gm...
>
> And since the Chinese government occasionally blocks Gmail, my
> back-up email address is:
>
> jer...@qq...
>
> Best,
>
> Jerry (Jiayu DU)