From: dophist <do...@gm...> - 2015-01-22 16:11:40
Hi Daniel & Kaldi developers,

I saw there was a thread about "WER of LSTM & DNN" on the Kaldi SourceForge forum; I'm the author of the LSTM code. This morning the thread creator, Alim, emailed me asking if I'd like to share my LSTM implementation with the Kaldi community, and my answer is of course a definite "yes".

The code is on GitHub: https://github.com/dophist/kaldi-lstm

1. The implementation is under Karel's nnet1 framework. The whole LSTM architecture is condensed into a single configurable component. So regarding the forum thread where Daniel asked about the "external tool" Alim used: it's actually "internal", and all Kaldi users will find it easy to compile and use.

2. There are two versions of my implementation, "standard" and "google". The "standard" version can be seen as a general-purpose LSTM tool with epoch-wise BPTT; you can even adapt it to train an LSTM-LM if you want, but currently I use it only for sequence training and for decoding (nnet-forward). The "google" version is primarily used for cross-entropy training in my experiments. There are docs in my GitHub repo with detailed descriptions.

3. Testing. The code has been tested on an industry-scale speech corpus of 4000+ hours that is not publicly available; my experiments reproduced Google's results, and their conclusions are solid. In the last few months I have received feedback from the Siri group, the Cambridge lab, and many others, and I suppose they have already obtained similar results.

4. Legal stuff. Although I'm now working at Baidu, the coding was done in my personal spare time, so I have the freedom to open-source it under Kaldi's license.

Known issues:

1. Gradient explosion. Gradient explosion is far from solved in RNN training. Gradient clipping seems to be the best practice in my experience; it is implemented in the "standard" version, but tuning the clipping threshold can be painful across different tasks (a minimal sketch of the clipping idea follows after this message). The "google" version is less likely to explode because it limits the BPTT expansion to 20 steps, but explosion still occurs in certain cases.

2. Training speed. Training LSTMs is slow, especially since most institutions don't have huge infrastructure like Google's DistBelief. My current implementation is based on nnet1, so it only uses one GPU (or the CPU), and training might take months to converge on an industrial-size dataset. Multiple GPU cards in a single host won't scale as datasets keep getting larger, and parallelizing SGD on a GPU cluster is still an open issue: most GPU-cluster solutions I know of require an InfiniBand network. Yann LeCun's group's EA-SGD seems most promising to me, but I don't have time to try it. Daniel's nnet2 averaging strategy could be another promising option (also sketched below), but I can't be sure it will work for LSTMs.

These remaining issues (particularly training speedup) might require great effort to solve, and I'm not sure I have enough time to do it. At least I hope my LSTM implementation can be a quick starting point towards RNN acoustic modeling for the Kaldi community.

If anyone has questions about the code, feel free to email me: jer...@gm...

And since the Chinese government occasionally blocks Gmail, my backup email address is: jer...@qq...

Best,
Jerry (Jiayu DU)
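For readers who haven't met it before, the gradient clipping mentioned in known issue 1 simply bounds the gradient before the parameter update. The sketch below is not the kaldi-lstm code; it is a minimal, self-contained C++ illustration, and the threshold values and function names are made up for the example. It shows the two common variants: rescaling by the global norm and element-wise clamping.

    // Minimal illustration of gradient clipping (not the actual kaldi-lstm code).
    #include <algorithm>
    #include <cmath>
    #include <iostream>
    #include <vector>

    // Variant 1: rescale the whole gradient so its L2 norm never exceeds max_norm.
    void ClipByGlobalNorm(std::vector<double> *grad, double max_norm) {
      double sum_sq = 0.0;
      for (double g : *grad) sum_sq += g * g;
      double norm = std::sqrt(sum_sq);
      if (norm > max_norm) {
        double scale = max_norm / norm;
        for (double &g : *grad) g *= scale;
      }
    }

    // Variant 2: clamp each element into [-limit, +limit] independently.
    void ClipElementwise(std::vector<double> *grad, double limit) {
      for (double &g : *grad)
        g = std::max(-limit, std::min(limit, g));
    }

    int main() {
      std::vector<double> grad = {30.0, -40.0};  // an "exploded" gradient, norm = 50
      ClipByGlobalNorm(&grad, 5.0);              // rescales to {3, -4}, norm = 5
      std::cout << grad[0] << " " << grad[1] << std::endl;

      std::vector<double> grad2 = {30.0, -40.0};
      ClipElementwise(&grad2, 5.0);              // clamps to {5, -5}
      std::cout << grad2[0] << " " << grad2[1] << std::endl;
      return 0;
    }

As Jerry notes, the painful part is choosing the threshold (5.0 above is arbitrary): too small and learning slows down, too large and the occasional exploding batch still derails training.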
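The nnet2 "averaging strategy" Jerry refers to runs several SGD jobs in parallel on different data shards and periodically averages their parameters to form the next model. The sketch below is only a toy illustration of that averaging step in plain C++ (it is not Kaldi's nnet2 code, and the shapes and names are invented); whether this works as well for LSTMs as for feed-forward DNNs is exactly the open question raised above.

    // Toy illustration of nnet2-style periodic model averaging (not Kaldi code).
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Each "model" here is just a flat parameter vector; in practice it would be
    // the concatenation of all weight matrices and biases of the network.
    using Model = std::vector<double>;

    // Average the parameters produced by N parallel SGD jobs after one outer
    // iteration; the result becomes the starting point for the next iteration.
    Model AverageModels(const std::vector<Model> &jobs) {
      Model avg(jobs[0].size(), 0.0);
      for (const Model &m : jobs)
        for (std::size_t i = 0; i < m.size(); ++i)
          avg[i] += m[i];
      for (double &v : avg)
        v /= static_cast<double>(jobs.size());
      return avg;
    }

    int main() {
      // Two workers ended the iteration with slightly different parameters.
      std::vector<Model> jobs = {{1.0, 2.0, 3.0}, {3.0, 4.0, 5.0}};
      Model next = AverageModels(jobs);
      for (double v : next) std::cout << v << " ";  // prints 2 3 4
      std::cout << std::endl;
      return 0;
    }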
From: Daniel P. <dp...@gm...> - 2015-01-22 17:59:18
Karel, since Jerry is offering that we can use his nnet1 LSTM code in Kaldi, how do you feel about doing a code review on it right now, since it's part of the nnet1 framework? If you don't have time right now, though, I could find someone else.

Dan
From: Vesely K. <ve...@gm...> - 2015-01-22 20:01:42
Hi Jerry,

Yes, yes, that's a great idea. I'll happily look at it, and thanks for the detailed description!

Karel