From: Vesely K. <ve...@gm...> - 2015-01-22 20:01:42
Hi Jerry, yes, yes, that's a great idea. I'll happily look at it, and
thanks for the detailed description!

Karel.

On 01/22/2015 06:59 PM, Daniel Povey wrote:
> Karel, since Jerry is offering that we can use his nnet1 LSTM code in
> Kaldi, how do you feel about doing a code review of it right now,
> since it's part of the nnet1 framework? If you don't have time right
> now, though, I could find someone else.
>
> Dan
>
> On Thu, Jan 22, 2015 at 11:11 AM, dophist <do...@gm...> wrote:
>
> Hi Daniel & Kaldi developers,
>
> I saw there was a thread about "WER of LSTM & DNN" on the Kaldi
> SourceForge forum; I'm the author of the LSTM code. This morning the
> thread creator, Alim, emailed me asking if I'd like to share my LSTM
> implementation with the Kaldi community. My answer is definitely a
> "yes", of course.
>
> The code is on GitHub: https://github.com/dophist/kaldi-lstm
>
> 1. The implementation is under Karel's nnet1 framework. The whole
> LSTM architecture is condensed into a single configurable component.
> So the "external tool" Daniel asked about in the forum thread is
> actually "internal": all Kaldi users will find it easy to compile and
> use.
>
> 2. There are two versions of my implementation, "standard" and
> "google". The "standard" version can be seen as a general-purpose
> LSTM tool with epoch-wise BPTT; you could even adapt it to train an
> LSTM language model if you wanted, but currently I use it only for
> sequence training and the decoding tool (nnet-forward). The "google"
> version is primarily used for cross-entropy training in my
> experiments. There are docs in my GitHub repo with detailed
> descriptions.
>
> 3. Testing. The code has been tested on an industry-scale speech
> corpus of around 4000+ hours that is not publicly available. My
> experiments reproduced Google's results, and their conclusions are
> solid. In the last few months I have received feedback from the Siri
> group, the Cambridge lab, and many others; I suppose they have
> already obtained similar results.
>
> 4. Legal stuff. Although I now work at Baidu, the coding was done in
> my personal spare time, so I am free to open-source it under Kaldi's
> license.
>
> Known issues:
>
> 1. Gradient explosion. Gradient explosion is far from solved in RNN
> training; gradient clipping seems to be the best practice in my
> experience. It is implemented in the "standard" version, but tuning
> the clipping threshold can be painful across different tasks (a
> stand-alone sketch of the clipping logic follows after this list).
> The "google" version is less likely to explode because it limits the
> BPTT expansion to 20 frames, but explosion still occurs in certain
> cases.
>
> 2. Training speed. Training LSTMs is slow, especially since most
> institutes don't have huge infrastructure like Google's DistBelief.
> My current implementation is based on nnet1, so it uses only one GPU
> card (or the CPU), and training might take months to converge on an
> industry-scale dataset. Multiple GPU cards in a single host server
> won't scale as datasets get larger and larger, and parallelizing SGD
> on a GPU cluster is still an open issue: most GPU-cluster solutions I
> know of require an InfiniBand network. Yann LeCun's group's EASGD
> seems the most promising to me, but I don't have time to try it.
> Daniel's nnet2 averaging strategy (also sketched below) could be
> another promising option, but I can't be sure whether it will work
> for LSTMs.
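>
> To make the clipping idea from issue 1 concrete, here is a simplified
> stand-alone sketch (illustrative only; the names are mine, and the
> repo's actual code differs, since nnet1 works on GPU matrices rather
> than std::vector):
>
>     // Norm-based gradient clipping: if the L2 norm of the gradient
>     // exceeds clip_threshold, rescale the gradient so its norm
>     // equals clip_threshold. Assumes clip_threshold > 0.
>     #include <cmath>
>     #include <cstdio>
>     #include <vector>
>
>     void ClipGradient(std::vector<float> *grad, float clip_threshold) {
>       double sumsq = 0.0;
>       for (float g : *grad) sumsq += static_cast<double>(g) * g;
>       const double norm = std::sqrt(sumsq);
>       if (norm > clip_threshold) {
>         const float scale = static_cast<float>(clip_threshold / norm);
>         for (float &g : *grad) g *= scale;  // rescale in place
>       }
>     }
>
>     int main() {
>       std::vector<float> grad = {3.0f, 4.0f};    // L2 norm = 5
>       ClipGradient(&grad, 1.0f);                 // norm clipped to 1
>       std::printf("%g %g\n", grad[0], grad[1]);  // prints 0.6 0.8
>     }
>
> Whether you clip the whole gradient by its global norm, as above, or
> clip each element to a range is a design choice; either way, the
> threshold is the knob that needs per-task tuning.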
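>
> And for the averaging strategy, my understanding of the idea (again a
> sketch under my own assumptions, not Kaldi's actual nnet2 code) is
> that several jobs run SGD in parallel on disjoint data shards, and
> their parameters are periodically averaged element-wise and
> redistributed:
>
>     // Element-wise average of the parameter vectors of N parallel
>     // SGD jobs; the result would be redistributed to all jobs
>     // before the next outer iteration. Assumes at least one model,
>     // all of equal size.
>     #include <cstddef>
>     #include <cstdio>
>     #include <vector>
>
>     std::vector<float> AverageModels(
>         const std::vector<std::vector<float>> &models) {
>       std::vector<float> avg(models[0].size(), 0.0f);
>       const float inv = 1.0f / models.size();
>       for (const auto &m : models)
>         for (std::size_t i = 0; i < m.size(); ++i)
>           avg[i] += m[i] * inv;
>       return avg;
>     }
>
>     int main() {
>       std::vector<std::vector<float>> models = {{1.0f, 2.0f},
>                                                 {3.0f, 4.0f}};
>       std::vector<float> avg = AverageModels(models);
>       std::printf("%g %g\n", avg[0], avg[1]);  // prints 2 3
>     }
>
> Whether plain parameter averaging preserves anything useful for the
> gated LSTM parameters is exactly the part I can't be sure about.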
>
> These remaining issues (particularly the training speedup) might
> require great effort to solve, and I'm not sure I have enough time to
> do it. At least I hope my LSTM implementation can be a quick starting
> point towards RNN acoustic modeling for the Kaldi community.
>
> If anyone has questions about the code, feel free to email me:
>
> jer...@gm...
>
> And since the Chinese government occasionally blocks Gmail, my
> back-up email address is:
>
> jer...@qq...
>
> Best,
>
> Jerry (Jiayu DU)