From: David v. L. <dav...@gm...> - 2015-06-18 09:18:43
Hello,

We're using the nnet2-online setup in a CTS task. We have had good experience with the same setup in a BN task. However, in the CTS task utterances can be very short ("yes", "mmm", etc.), and we observe a very strong dependence of the ivector length on duration (which makes sense) and a very strong dependence of ASR performance on ivector length (which also makes sense).

It seems that in the nnet2-online setup the ivectors are not normalized to length, as is customary in speaker recognition. The nnet doesn't seem to like the duration dependence. What would be an approach to deal with this? Would it make sense to train the nnet with length-normalized ivectors?

Cheers,

---david

--
David van Leeuwen
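
To make the question concrete, here is a minimal sketch of what "length-normalized" means in the message above: each ivector is rescaled to a fixed Euclidean norm, so only its direction is kept. This is plain Python for illustration, not the nnet2-online code; the function name `length_normalize`, the `target_norm` and `eps` parameters, and the example vectors are all assumptions made up for this sketch.

```python
import math


def length_normalize(ivector, target_norm=1.0, eps=1e-10):
    """Return a copy of ivector rescaled to Euclidean norm target_norm.

    Hypothetical helper for illustration; target_norm=1.0 is an assumed
    default, other conventions for the target length exist.
    """
    norm = math.sqrt(sum(x * x for x in ivector))
    if norm < eps:
        # A (near-)zero vector has no direction to preserve; leave it as is.
        return list(ivector)
    scale = target_norm / norm
    return [x * scale for x in ivector]


# Made-up 2-dimensional "ivectors": same direction, different lengths,
# standing in for a short and a long utterance from one speaker.
short_utt = [0.3, -0.4]  # norm 0.5
long_utt = [3.0, -4.0]   # norm 5.0

normalized_short = length_normalize(short_utt)
normalized_long = length_normalize(long_utt)
```

After normalization the two example vectors coincide up to floating-point rounding, which is the effect in question: the duration-dependent length is removed before the vector reaches the nnet. Whether that helps recognition is exactly the open question here, since the length may also carry useful information about how reliable the estimate is.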