From: Kirill K. <kir...@sm...> - 2015-05-11 19:39:04
> -----Original Message-----
> From: Daniel Povey [mailto:dp...@gm...]
> Sent: 2015-05-11 11:58
>
> > Many recipes use per-speaker adapted features. I am working in a
> > field where we rarely see the same speaker again. To be able to compare
> > apples to apples, I want to avoid the dependency on per-speaker
> > adaptation. So one thing I want to do is drop utt <-> spk maps and use
> > global LDA and per-sentence fMLLR adaptation only.
>
> fMLLR applied globally does not really make sense, as there would be no
> adaptation going on. I recommend to make the utt<->spk maps one-to-one
> (make the utterance-id and speaker-id identical). You can see whether
> fMLLR helps, and drop it if not; or try basis fMLLR.

Right, that was what I meant by per-utterance fMLLR; I mistyped "per-sentence", too much NLP recently. :)

> (Note: by default we don't enable variance normalization, so CMVN is a
> bit of a misnomer).

Thanks for pointing this out. I had not realized the V part was not there.

-kkm
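For concreteness, here is a minimal sketch of the one-to-one mapping Dan suggests, applied to a Kaldi-style data directory. The directory path "data/train" is just an example, and the script itself is illustrative, not part of any recipe:

    #!/usr/bin/env python3
    """Make a Kaldi data directory's utt2spk/spk2utt maps one-to-one,
    so each utterance is treated as its own speaker (per-utterance
    fMLLR/CMN). Illustrative sketch; data/train is an assumed path."""

    import sys

    data_dir = sys.argv[1] if len(sys.argv) > 1 else "data/train"

    # Read the existing utt2spk to collect the utterance ids.
    with open(f"{data_dir}/utt2spk") as f:
        utts = [line.split()[0] for line in f if line.strip()]

    # Speaker-id == utterance-id: no statistics are shared across
    # utterances when estimating CMN or fMLLR transforms.
    with open(f"{data_dir}/utt2spk", "w") as f:
        for utt in utts:
            f.write(f"{utt} {utt}\n")

    # spk2utt is the trivial inverse of the one-to-one map.
    with open(f"{data_dir}/spk2utt", "w") as f:
        for utt in utts:
            f.write(f"{utt} {utt}\n")

After rewriting the maps you would still want to re-run utils/fix_data_dir.sh and recompute the CMVN stats, since the existing per-speaker statistics no longer match the new speaker ids.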