From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-07-29 22:19:59
I was curious what the likely logical next steps for improving training / decoding might be, given other users' experience. The system does very well even with significant background noise, but I think there is definitely room to do better.

1 - I have a training set of around 5K words, though I could bring that up to around 20K.
2 - I am using kaldi_lm, though I could switch to SRILM; I am not sure it would necessarily improve results (rough commands in the P.S. below).
3 - I am decoding about 1 minute of audio, though the training data is in 10-second epochs. I can mix some of the test data in if that would help.
4 - When training deltas I use a very small number of leaves / Gaussians (100 / 1000) to get the best results. The best results are with tri1; further training yields worse results (exact command in the P.S. below).
5 - I use the same lexicon for training and decoding (though a more restrictive language model for decoding).

Any help / thoughts are appreciated.

Thanks,
Nathan
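
P.S. For concreteness, rough sketches of what I mean in points 2-4 follow; all paths and directory names are placeholders from my setup.

For point 2, if I switched to SRILM, I would build the ARPA LM and convert it to G.fst roughly like this. I picked Witten-Bell discounting because Kneser-Ney tends to fail on a corpus as small as mine; that choice is a guess on my part:

    # Train a trigram LM on the training transcripts with SRILM.
    ngram-count -order 3 -interpolate -wbdiscount \
      -text data/local/lm/train_text.txt -lm data/local/lm/lm.arpa
    gzip -f data/local/lm/lm.arpa

    # Convert the ARPA LM into G.fst for the decoding graph.
    utils/format_lm.sh data/lang data/local/lm/lm.arpa.gz \
      data/local/dict/lexicon.txt data/lang_test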
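
For point 3, if the length mismatch matters, one thing I could try is chopping the 1-minute test recording into 10-second pieces with a segments file (utterance-id, recording-id, start, end in seconds). The recording-id here is hypothetical and would have to match my wav.scp:

    # Hypothetical: split a 60-second recording into six 10-second
    # segments; rec_001 must match an entry in data/test/wav.scp.
    rec=rec_001
    for i in 0 1 2 3 4 5; do
      printf '%s-seg%02d %s %d %d\n' "$rec" "$i" "$rec" \
        $((i * 10)) $(((i + 1) * 10))
    done > data/test/segments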
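
For point 4, the deltas stage I am describing is just the standard steps/train_deltas.sh call with the small counts mentioned above (exp/mono_ali assumes I aligned with the monophone system first, which is what my run script does):

    # Train tri1 on delta features with 100 leaves / 1000 Gaussians.
    steps/train_deltas.sh --cmd "$train_cmd" 100 1000 \
      data/train data/lang exp/mono_ali exp/tri1

    # Realign with tri1 before any further (so far worse-performing)
    # training pass.
    steps/align_si.sh --cmd "$train_cmd" data/train data/lang \
      exp/tri1 exp/tri1_ali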