Re: [Kaldi-users] nnet2-online i-vector sensibility with short utterances

I think it would make sense. Would you like to contribute that to the
recipe?

-----Original Message-----
From: David van Leeuwen [mailto:dav...@gm...] 
Sent: Thursday, June 18, 2015 5:18 AM
To: kal...@li...
Subject: [Kaldi-users] nnet2-online i-vector sensibility with short
utterances

Hello,

We're using the nnet2-online setup in a CTS task.  We have had good
experience with the same setup on a BN task.  However, in the CTS task
utterances can be very short ("yes", "mmm", etc.), and we observe a very
strong dependence of the ivector length on duration (which makes sense) and
a very strong dependence of ASR performance on ivector length (which also
makes sense).

It seems that in the nnet2-online setup the ivectors are not normalized to
length as is customary in speaker recognition.  The nnet doesn't seem to
like the duration dependence---what would be an approach to deal with this?
Would it make sense to train the nnet with length-normalized ivectors?
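
To make the question concrete, here is a minimal sketch of length
normalization as it is usually understood in speaker recognition: each
ivector is rescaled to a fixed Euclidean norm, so only its direction is kept.
This is not code from the nnet2-online recipe; the function name
length_normalize, the target_norm and eps parameters, and the example
vectors are all made up for illustration.  Some setups scale to the square
root of the dimension rather than to 1.

```python
import math


def length_normalize(ivector, target_norm=1.0, eps=1e-10):
    """Rescale an ivector to a fixed Euclidean norm (hypothetical helper).

    ivector:     sequence of floats
    target_norm: desired norm after scaling (assumed 1.0 here)
    eps:         below this norm the vector is returned unchanged,
                 to avoid dividing by (almost) zero
    """
    norm = math.sqrt(sum(x * x for x in ivector))
    if norm < eps:
        return list(ivector)
    scale = target_norm / norm
    return [x * scale for x in ivector]


# Made-up example: same direction, very different lengths, mimicking a
# short and a long utterance from the same speaker.
short_utt = [0.1, -0.2, 0.05]
long_utt = [1.0, -2.0, 0.5]

short_normed = length_normalize(short_utt)
long_normed = length_normalize(long_utt)
```

After normalization the two example vectors are numerically the same, so a
network trained on such input would no longer see the length difference.
The length does carry information about how reliable the estimate is, so
discarding it is a trade-off rather than a free win.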

Cheers,

---david

--
David van Leeuwen

------------------------------------------------------------------------------
_______________________________________________
Kaldi-users mailing list
Kal...@li...
https://lists.sourceforge.net/lists/listinfo/kaldi-users