|
From: Mate A. <ele...@gm...> - 2015-06-25 15:09:13
|
I am going to train a "multi-splice" deep neural network model on the LibriSpeech dataset using the local/online/run_nnet2_ms.sh <https://goo.gl/72A2Zx> script included in Kaldi's repository, which I think will give the best WER. The end goal is to use the model trained in this phase to initialize a second model, which I will then train and use for forced alignment on the Blizzard2013 <http://www.synsig.org/images/b/b1/Blizzard2013.pdf> dataset, specifically the 2013-EH2 subset, which contains 1 female speaker, 19 hours of speech, and sentence-level alignments.

I don't have much experience with Kaldi, and my questions are:

1. How long does it take to train on all (960hrs) of LibriSpeech on a GPU (say GTX TITAN X or K6000)? Even a rough estimate would be useful.
2. Is there anything to take into account before training on LibriSpeech?
3. And more importantly, how should I initialize/train the next model for the Blizzard2013 dataset? I have already gone through data preparation for it and created the necessary files. |