From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-29 22:42:20
There's no special thing we do for this. Just play with the #leaves and #Gaussians.

Dan

On Mon, Jul 29, 2013 at 6:38 PM, Nathan Dunn <nd...@ca...> wrote:
>
> Thanks.
>
>> It's unusual that the later stages of training are not better.
>> Normally you get a substantial improvement.
>
> I wonder if this is due to the very small amount of my training data.
>
> Is there a recommended recipe that I should follow for this type of data
> (20K in training data, decoding 1 min long passages)? I tried to use swbd,
> but ended up going back to using the settings that more closely matched
> resource management.
>
> Nathan
>
> On Jul 29, 2013, at 3:27 PM, Daniel Povey wrote:
>
>>> 1 - I have a training set of around 5K words, though I could bring that up
>>> to around 20K
>>
>> More language model training data will definitely help.
>>
>>> 2 - I am using the kaldi_lm, though I could use SRILM . . not sure if it
>>> would necessarily improve results
>>
>> Probably would make no difference-- more a usability issue.
>>
>>> 3 - I am decoding about 1 minute of text, though training data is in 10
>>> second epochs. I can mix some of the test data in if that would help.
>>
>> It's not considered good form to mix the test data in with training--
>> this will give you unrealistically good results.
>>
>>> 4 - When I am training deltas I use a very small # of leaves / gauss (100 /
>>> 1000) to get the best results. The best results are with tri1. Further
>>> training yields worse results.
>>
>> It's unusual that the later stages of training are not better.
>> Normally you get a substantial improvement.
>>
>> Dan
>>
>>> 5 - I use the same lexicon for the training and decoding (though a more
>>> restrictive language model for decoding).
>>>
>>> Any help / thoughts are appreciated.
>>>
>>> Thanks,
>>>
>>> Nathan
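
For reference, in the standard egs-style recipes the #leaves and #Gaussians discussed above are passed as the first two positional arguments to steps/train_deltas.sh. Below is a minimal sketch of sweeping a few settings; the directory names (data/train, data/lang, exp/mono_ali, data/lang_test, data/test) and the $train_cmd/$decode_cmd variables are assumptions from a typical RM-style run.sh, not details given in this thread:

  #!/usr/bin/env bash
  # Sketch: sweep the tree size (#leaves) and total #Gaussians for the
  # first triphone (delta) pass and decode each model for comparison.
  . ./cmd.sh    # assumed to define $train_cmd and $decode_cmd
  . ./path.sh

  for pair in "100 1000" "200 2000" "500 5000"; do
    num_leaves=$(echo "$pair" | awk '{print $1}')
    tot_gauss=$(echo "$pair" | awk '{print $2}')
    dir=exp/tri1_${num_leaves}_${tot_gauss}

    # #leaves and #Gaussians are the first two positional arguments.
    steps/train_deltas.sh --cmd "$train_cmd" \
      $num_leaves $tot_gauss data/train data/lang exp/mono_ali $dir

    # Build a decoding graph and decode so the settings can be compared.
    utils/mkgraph.sh data/lang_test $dir $dir/graph
    steps/decode.sh --nj 4 --cmd "$decode_cmd" \
      $dir/graph data/test $dir/decode
  done

With only ~20K words of training data, small values on the order of the 100/1000 mentioned above are plausible; the resulting scores in each exp/tri1_*/decode directory can then be compared to pick the best setting.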