From: Nagendra G. <nag...@go...> - 2015-07-22 12:32:18
I can only tell from experience that i-vector adaptation affects a word or two
(significantly) at most, and adapts reasonably by that time. So if 5-6
utterances are affected, the problem may be somewhere else. Try shuffling the
decoding order (offline, of course) and see if you find a pattern.

Nagendra

On Wed, Jul 22, 2015 at 8:26 AM, Amit Beka <ami...@gm...> wrote:
> I have listened to the recordings themselves (after VAD), and they all
> sound good: they were recorded with the same background noise (almost
> none), with the same speaker, and at the same volume.
>
> I use the nnet2-online-latgen-faster decoder, and although my LM doesn't
> really suit the input, I expect it to give me at least *some* words as
> output.
>
> On Wed, Jul 22, 2015 at 1:20 PM, Nagendra Goel <nag...@go...> wrote:
>
>> From your description this does not sound like a faulty i-vector. The
>> i-vector might play a small role, but you should first look for problems
>> elsewhere. Maybe the recording itself goes bad?
>>
>> Nagendra Kumar Goel
>> On Jul 22, 2015 6:00 AM, "Amit Beka" <ami...@gm...> wrote:
>>
>>> Hi,
>>>
>>> I've been using online_nnet2_decoder for quite some time now for ASR
>>> in a dialogue system where some users are returning users. Naturally,
>>> we use online i-vector extraction to better recognize each user's
>>> speech.
>>>
>>> Unfortunately, we have found some cases where the extracted i-vector
>>> degrades the decoder's performance, usually by recognizing zero or one
>>> word (something like 'a', 'i', or 'yea') instead of the whole
>>> utterance. Usually, the degraded performance lasts for 5-6 utterances
>>> (each 1-3 seconds) until a good i-vector is "recovered".
>>>
>>> I would be grateful if anyone on the list could help with some of the
>>> following questions:
>>>
>>> 1. Is this a bug, or can i-vectors behave this way (for no apparent
>>> reason, even when the audio sounds fine)?
>>>
>>> 2. Is there a reliable way to tell when the i-vector is problematic
>>> (other than checking the lengths of the utterance and the
>>> transcription)? What would be a good policy for updating the
>>> adaptation state (based on confidence, utterance length)?
>>>
>>> 3. Is it possible to separate the i-vector into features that are
>>> user-specific (like tone) and features that are environment-specific
>>> (like noise)? If so, I would probably want to "forget" the
>>> environment-specific features and keep only the user-specific ones
>>> when the utterances are not consecutive.
>>>
>>> I was wondering if there is a way to "understand" the changes in the
>>> adaptation state, for a non-expert in signal processing like me :)
>>>
>>> Thanks,
>>> Beka
>>>
>>> _______________________________________________
>>> Kaldi-users mailing list
>>> Kal...@li...
>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users
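The length-based check mentioned in question 2 can be sketched as a simple heuristic outside of Kaldi: compare the number of recognized words to the utterance duration, and only fold an utterance into the speaker's adaptation state when the decode looks plausible. This is a minimal illustration, not part of the Kaldi API; the function names and the threshold are hypothetical guesses.

```python
# Hypothetical sketch of a length-based sanity check for online decoding:
# an utterance of several seconds that decodes to a single filler word
# (the failure mode described in the thread) is flagged and NOT used to
# update the i-vector adaptation state. Names and threshold are made up.

def words_per_second(transcript: str, duration_sec: float) -> float:
    """Rate of recognized words; a value near zero suggests a degenerate
    decode for normal conversational speech."""
    if duration_sec <= 0:
        return 0.0
    return len(transcript.split()) / duration_sec

def keep_adaptation_state(transcript: str, duration_sec: float,
                          min_wps: float = 0.5) -> bool:
    """Return True if the utterance looks trustworthy enough to carry its
    statistics into the next utterance's adaptation state. The 0.5
    words-per-second threshold is an arbitrary illustrative choice."""
    return words_per_second(transcript, duration_sec) >= min_wps

# A 3-second utterance decoded as one filler word is suspicious:
print(keep_adaptation_state("yea", 3.0))                        # False
# A plausible transcript for the same duration passes the check:
print(keep_adaptation_state("turn on the lights please", 3.0))  # True
```

In practice one would combine this with lattice-based confidence scores rather than rely on word counts alone, and reset (rather than freeze) the adaptation state after several consecutive suspicious utterances.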