From: David Warde-F. <d.w...@gm...> - 2015-06-17 07:27:49
Kirill,

Many thanks for the pointers. On your setup, how long does the entire
recipe take without decoding?

For the life of me I can't figure out where num_jobs_nnet is being set
(it is written into the egs_dir as 4, and I've changed it everywhere I
could find it).

On Fri, Jun 12, 2015 at 7:00 PM, Kirill Katsnelson <kir...@sm...> wrote:
>> From: David Warde-Farley [mailto:d.w...@gm...]
>> Subject: [Kaldi-users] non-cluster usage of Librispeech s5 recipe?
>>
>> I'm trying to use the s5 recipe for LibriSpeech on a single machine
>> with a single GPU. I've modified cmd.sh to use run.pl.
>
> I ran it on a single machine; it requires a few modifications. Note that
> it took almost a week on a 6-core 4.1 GHz overclocked i7-5930K CPU and a
> GeForce 980 to train on the 500-hour set.
>
>> After about a day, I see a lot of background processes like
>> gmm-latgen-faster, lattice-add-penalty, lattice-scale, etc. that have
>> been launched in the background (the terminal is actually free, which
>> suggests the run.sh script has terminated...). I'm not totally sure
>> what's going on, or how to find out.
>
> In librispeech/s5/run.sh, look for decode commands in subshells, like
>
> (
>   utils/mkgraph.sh data/lang_nosp_test_tgsmall \
>     exp/tri4b exp/tri4b/graph_nosp_tgsmall || exit 1;
>   for test in test_clean test_other dev_clean dev_other; do
>     steps/decode_fmllr.sh --nj 20 --cmd "$decode_cmd" \
>       . . .
> )&
>
> These decodes are quite slow if you run them on your machine; they are
> slower than the rest of the script. As they accumulate, they eat CPU and
> eventually run out of memory. They are not essential for NN training,
> except possibly for the mkgraph script. The results are useful for
> checking whether you are getting the expected WER, but really not
> essential. You may either disable these decode blocks completely (except
> the mkgraph invocations) or remove the '&' at the end to run them
> synchronously. NB: they will take most of the preparation time prior to
> the NN training step. Dunno about your machine, but give it an extra
> couple of days to complete with these.
>
>> One thing I noticed earlier is that the script was trying to spawn
>> multiple GPU jobs, but this GPU is configured (by administrators) to
>> permit at most one CUDA process, and so I saw "3 of 4 jobs failed"
>> messages. Would these jobs have been retried?
>
> They will not be retried, but you can restart NN training from the last
> step. Modify local/online/run_nnet2_ms.sh so that
> steps/nnet2/train_multisplice_accel2.sh is invoked with the switches
> "--num-jobs-initial 1 --num-jobs-final 1" (the defaults are larger). When
> running local/online/run_nnet2_ms.sh, pass it "--stage 7" (this is the
> default) and "--train_stage N", where N is the iteration you are
> restarting from.
>
> Even without the one-job limit, you probably wouldn't benefit from
> running more than one at a time.
>
> -kkm
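
For reference, a minimal sketch of the run.sh edit described in the quoted
advice above (keep the mkgraph call, drop the trailing '&'). The decode
directory names and the idea of commenting out the loop are illustrative
completions of the ". . ." in the quoted snippet, not the literal recipe
contents, which vary between Kaldi versions:

    # Same block as in the quoted run.sh excerpt, but without the trailing
    # '&', so the decodes run in the foreground instead of piling up in the
    # background. To skip them entirely, comment out the for-loop and keep
    # only the mkgraph call.
    (
      utils/mkgraph.sh data/lang_nosp_test_tgsmall \
        exp/tri4b exp/tri4b/graph_nosp_tgsmall || exit 1;
      for test in test_clean test_other dev_clean dev_other; do
        # steps/decode_fmllr.sh <graph-dir> <data-dir> <decode-dir>; the
        # decode directory name below is illustrative, not the recipe's.
        steps/decode_fmllr.sh --nj 20 --cmd "$decode_cmd" \
          exp/tri4b/graph_nosp_tgsmall data/$test \
          exp/tri4b/decode_nosp_tgsmall_$test
      done
    )   # no '&' here: wait for the decodes to finish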
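
Likewise, a sketch of the single-GPU restart described in the last quoted
paragraphs. The edit to local/online/run_nnet2_ms.sh is shown as comments,
and the iteration number 35 is made up for illustration (pick the last
completed iteration from the numbered model files in your nnet2 training
directory):

    # In local/online/run_nnet2_ms.sh, add the single-job switches to the
    # steps/nnet2/train_multisplice_accel2.sh invocation, leaving its other
    # options and arguments as they are in the recipe:
    #
    #   steps/nnet2/train_multisplice_accel2.sh \
    #     --num-jobs-initial 1 --num-jobs-final 1 \
    #     ... existing options and arguments ...
    #
    # Then rerun the wrapper from the neural-net stage, resuming training
    # at iteration N (35 here is only an example):
    local/online/run_nnet2_ms.sh --stage 7 --train_stage 35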