Re: [Kaldi-users] Faulty i-vectors in a dialogue system

I have listened to the recordings themselves (after VAD), and they all
sound good. They were recorded with the same speaker, at the same volume,
and with the same background noise (almost none).

I use the nnet2-online-latgen-faster decoder, and although my LM doesn't
really suit the input, I expect it to give me at least *some* words as
output.

On Wed, Jul 22, 2015 at 1:20 PM, Nagendra Goel <nag...@go...>
wrote:

> From your description, this does not sound like a faulty i-vector. The
> i-vector might play a small role, but you should first look for problems
> elsewhere. Maybe the recording itself goes bad?
>
> Nagendra Kumar Goel
> On Jul 22, 2015 6:00 AM, "Amit Beka" <ami...@gm...> wrote:
>
>> Hi,
>>
>> I've been using online_nnet2_decoder for quite some time now for ASR in a
>> dialogue system, where some users are returning users. Naturally, we use
>> online i-vector extraction to better recognize each user's speech.
>>
>> Unfortunately, we have found some cases where the extracted i-vector
>> degrades the performance of the decoder, which then usually outputs 0 or 1
>> words (something like 'a', 'i' or 'yea') instead of recognizing the whole
>> utterance. Usually, the degraded performance lasts for 5-6 utterances (each
>> 1-3 seconds long) until a good i-vector is "recovered".
>>
>> I would be grateful if anyone on the list could help with some of the
>> following questions:
>>
>> 1. Is this a bug, or can i-vectors behave this way (for no reason that is
>> apparent when listening to the audio)?
>>
>> 2. Is there a reliable way of telling when the i-vector is problematic
>> (other than checking the lengths of the utterance and the transcription)?
>> What would be a good method for updating the adaptation state (based on
>> confidence, length of utterance)?
>>
>> 3. Is it possible to separate the i-vector into some features which are
>> user-specific (like tone) and some that are environment-specific (like
>> noise)? If so, I would probably want to "forget" the environment-specific
>> features and keep only the user-specific features when the utterances are
>> not consecutive.
>>
>> I was wondering if there is a way to "understand" the changes in the
>> adaptation state, for a non-expert in signal-processing like me :)
>>
>> Thanks,
>> Beka
>>
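
On question 2 above (how to update the adaptation state based on confidence
and utterance length), one possible gating policy can be sketched as follows.
This is only a minimal sketch under stated assumptions, not Kaldi code:
`UtteranceResult`, `AdaptationGate`, and all three thresholds are hypothetical
names and values chosen for illustration, the confidence score is assumed to
be a number in [0, 1] that the caller computes somehow, and the adaptation
state is treated as an opaque object. The idea is to carry forward a new
adaptation state only when the utterance it came from looks well recognized,
and otherwise to keep using the last trusted state.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class UtteranceResult:
    """Hypothetical summary of one decoded utterance (not a Kaldi type)."""
    duration_sec: float  # speech duration after VAD
    num_words: int       # number of words in the decoder output
    confidence: float    # assumed to be in [0, 1], computed by the caller
    new_state: Any       # opaque adaptation state obtained after decoding


class AdaptationGate:
    """Keeps the last trusted adaptation state for one speaker."""

    def __init__(self, min_duration_sec: float = 1.0,
                 min_confidence: float = 0.5,
                 min_words_per_sec: float = 0.5) -> None:
        # All thresholds are illustrative guesses and would need tuning.
        self.min_duration_sec = min_duration_sec
        self.min_confidence = min_confidence
        self.min_words_per_sec = min_words_per_sec
        self.trusted_state: Optional[Any] = None

    def update(self, result: UtteranceResult) -> Optional[Any]:
        """Accept result.new_state only if the utterance passes all checks.

        Returns the state to use for the next utterance, which is None
        until some utterance has been accepted.
        """
        long_enough = result.duration_sec >= self.min_duration_sec
        confident = result.confidence >= self.min_confidence
        # Very few words for the amount of speech suggests a bad decode,
        # like the 0 or 1 word outputs described in the message above.
        if result.duration_sec > 0:
            rate = result.num_words / result.duration_sec
        else:
            rate = 0.0
        plausible = rate >= self.min_words_per_sec

        if long_enough and confident and plausible:
            self.trusted_state = result.new_state
        return self.trusted_state
```

With a policy like this, an utterance that decodes to a single word over two
or three seconds of speech would not overwrite the stored state, so one bad
update would not carry over into the following utterances. Whether this
addresses the behaviour described in the thread is untested here.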