From: Daniel P. <dp...@gm...> - 2015-02-28 19:47:05
Karel's setup inherently uses quite a lot of memory because it stores all the data in memory. You could try splitting the data as you said, but I think it would be more ideal if Karel were to add something to the script to support this kind of thing in the "proper" way.

Alternatively, try running the nnet2 setup in local/online/run_nnet2_ms.sh, which does not have this problem; you would have to set --num-jobs-initial 1 --num-jobs-final 1 if you only have one GPU.

If you do want to modify Karel's setup to use random subsets of the features, I suggest adding something of the following form to the end of the feature pipeline:

  subset-feats --include=foo/bar/random-utt-subset.<iteration> ark:- ark:-

where random-utt-subset.1, random-utt-subset.2, and so on are utterance lists computed in advance.

Dan

On Sat, Feb 28, 2015 at 2:31 PM, Raymond W. M. Ng <wm...@sh...> wrote:
> Hi Kaldi,
>
> I am training a DNN with Karel's setup on a 160-hour data set. When I get
> to the sMBR sequence-discriminative training (steps/nnet/train_mpe.sh),
> the memory usage explodes. The program only managed to process around 2/7
> of the training files before crashing.
>
> There's no easy accumulation function for the DNN, but I assume I can just
> put different training-file splits in consecutive iterations?
>
> I'd like to know if there are resources out there already. I was referring
> to the egs/tedlium recipe.
>
> thanks
> raymond
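A minimal sketch of how the precomputed utterance lists in Dan's subset-feats example could be generated; utils/shuffle_list.pl ships with Kaldi, but the directory foo/bar, the number of iterations, and the subset size are illustrative assumptions:

  #!/usr/bin/env bash
  # Hypothetical sketch: one random utterance list per training iteration.
  # num_iters, subset_size, and foo/bar are placeholders, not recipe values.
  num_iters=20
  subset_size=20000   # utterances kept per iteration; tune to fit memory
  mkdir -p foo/bar
  for n in $(seq $num_iters); do
    # --srand fixes the shuffle seed so each list is reproducible
    utils/shuffle_list.pl --srand $n data/train/feats.scp \
      | head -n $subset_size | awk '{print $1}' \
      > foo/bar/random-utt-subset.$n
  done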
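The filter itself would then be appended to the feature-pipeline string inside the training script, as Dan suggests. In Karel's setup the per-iteration loop lives in steps/nnet/train_scheduler.sh (train_mpe.sh has its own loop); the variable names below are assumptions standing in for whatever the script actually uses:

  # Hypothetical splice into the training script's feature pipeline;
  # $feats_tr and $iter stand in for the script's own variable names.
  feats_tr="$feats_tr subset-feats --include=foo/bar/random-utt-subset.$iter ark:- ark:- |"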
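For the nnet2 route, --num-jobs-initial and --num-jobs-final are options of the nnet2 training scripts that local/online/run_nnet2_ms.sh wraps. A hedged single-GPU sketch, assuming a tedlium-style multi-splice setup; the script name and positional arguments are placeholders to be copied from the actual recipe:

  # Hedged sketch: one parallel job throughout, so a single GPU suffices.
  # Script name and arguments are assumptions; take them from your copy
  # of local/online/run_nnet2_ms.sh.
  steps/nnet2/train_multisplice_accel2.sh \
    --num-jobs-initial 1 --num-jobs-final 1 \
    [other options...] data/train data/lang exp/tri3_ali exp/nnet2_online/nnet_ms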