From: Nagendra G. <nag...@go...> - 2015-06-18 14:00:21
I think it would make sense. Would you like to contribute that to the recipe?

-----Original Message-----
From: David van Leeuwen [mailto:dav...@gm...]
Sent: Thursday, June 18, 2015 5:18 AM
To: kal...@li...
Subject: [Kaldi-users] nnet2-online i-vector sensibility with short utterances

Hello,

We're using the nnet2-online setup in a CTS task. We have had good experience with the same setup in a BN task. In the CTS task, however, utterances can be very short ("yes", "mmm", etc.), and we observe a very strong dependence of the ivector length on duration (which makes sense) and a very strong dependence of ASR performance on ivector length (which also makes sense).

It seems that in the nnet2-online setup the ivectors are not normalized to length, as is customary in speaker recognition. The nnet doesn't seem to like the duration dependence. What would be an approach to deal with this? Would it make sense to train the nnet with length-normalized ivectors?

Cheers,

---david

--
David van Leeuwen
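
For readers following the thread, the length normalization asked about above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `length_normalize`, the `target_norm` parameter and the toy vectors are hypothetical, and this is not the code used in the nnet2-online recipe. It only shows that two i-vectors pointing in the same direction but differing in length (as might happen for a short and a long utterance) become identical after normalization.

```python
import math


def length_normalize(ivector, target_norm=1.0, eps=1e-12):
    """Rescale a vector to a fixed Euclidean norm (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in ivector))
    if norm < eps:
        # A (near-)zero vector has no direction; return it unchanged.
        return list(ivector)
    scale = target_norm / norm
    return [x * scale for x in ivector]


# Toy values, made up for illustration: same direction, different length.
short_utt = [0.1, -0.2, 0.05]
long_utt = [2.0, -4.0, 1.0]

a = length_normalize(short_utt)
b = length_normalize(long_utt)
```

After normalization `a` and `b` agree up to floating-point error, so the network input would no longer carry the duration-dependent length. Whether that helps ASR performance is the open question in this thread.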