Hi Dan,
First, thank you for your work. I have used the DNN-based online-decoding setup and obtained the expected results. Now I know that it works, but I do not know why it works, especially the i-vector's effect. So I have some questions about it. Maybe my questions are very basic, but they are really important to me. Could you help me?
My questions are as follows:
1. Why use an i-vector in the DNN-based online-decoding setup? What is the main effect of the i-vector?
2. When we use online2-wav-nnet2-latgen-faster to decode a wav file, how is the i-vector extracted online? Does every utterance use the same i-vector? If not, is an i-vector extracted every 10 frames, or on some other schedule?
It would be good if someone else could answer this.
Dan
Thanks for your quick reply. Can anyone answer my questions or give me some references?
The DNN models (from Dan's nnet2 setup) use the i-vectors to provide the neural network with the speaker identity. The input features are not speaker-normalized; it is left to the network to figure this out.
During decoding, the trained i-vector extractor is used to estimate the i-vectors. They are extracted based on the spk2utt map parameter of online2-wav-nnet2-latgen-faster.
You can create various mappings (for example, you can make each utterance be uttered by a unique speaker, or just carry over the mapping from the data dir).
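As a minimal sketch of the first option (the data-dir paths here are hypothetical, and Kaldi's utils/utt2spk_to_spk2utt.pl can do the same conversion), this is how you could build an utt2spk/spk2utt pair in which every utterance is its own "speaker":

    # Write utt2spk/spk2utt files mapping each utterance id to itself.
    # Formats follow Kaldi's convention: utt2spk is "<utt-id> <spk-id>"
    # per line; spk2utt is "<spk-id> <utt-id-1> <utt-id-2> ..." per line.
    # The data-dir paths below are made-up examples.
    with open("data/test/utt2spk") as f:
        utt_ids = [line.split()[0] for line in f if line.strip()]

    with open("data/test_utt_spk/utt2spk", "w") as u2s, \
         open("data/test_utt_spk/spk2utt", "w") as s2u:
        for utt in utt_ids:
            u2s.write("%s %s\n" % (utt, utt))
            s2u.write("%s %s\n" % (utt, utt))

With a per-utterance mapping like this, the i-vector estimation starts fresh for each utterance; if you keep the real speaker mapping instead, utterances of the same speaker share the accumulated adaptation statistics.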
The scripts steps/online/decode.sh and egs/rm/s5/local/online/run_nnet2.sh (for example) will hopefully answer your questions about how it is done.
y.
BTW, the i-vector is extracted every 10 frames during training, but the input to the computation is all frames of the same speaker that are prior to the current frame. This is to emulate the online test condition.
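To make that schedule concrete, here is a toy Python sketch (not Kaldi's actual implementation; a running mean stands in for the real i-vector extractor) of an estimate that is refreshed every 10 frames from all of the speaker's frames seen so far:

    import numpy as np

    IVECTOR_PERIOD = 10  # frames between refreshes, as described above

    def toy_online_ivectors(frames):
        """frames: (T, D) features of one speaker, in time order."""
        estimate = np.zeros(frames.shape[1])  # nothing seen yet
        per_frame = []
        for t in range(len(frames)):
            # Every IVECTOR_PERIOD frames, recompute the estimate from
            # all frames strictly before the current one, which is what
            # the online test condition allows.
            if t > 0 and t % IVECTOR_PERIOD == 0:
                estimate = frames[:t].mean(axis=0)
            per_frame.append(estimate.copy())
        return np.stack(per_frame)

    feats = np.random.randn(35, 4)           # 35 fake 4-dim frames
    print(toy_online_ivectors(feats).shape)  # -> (35, 4)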
Thanks a lot! I want to know whether this approach is suitable for the dialogue condition. Is the i-vector extracted per speaker or per utterance when an utterance includes two or more speakers? In other words, is speaker detection performed when the i-vector is extracted?
For dialogue, what you need is speaker diarization, not just speaker
identification. Vimal and David (cc'd) are working on a speaker
diarization setup for Kaldi, but it will be a few months, most likely,
before it's ready.
Dan
Thanks for your reply. Is there a good paper that explains the effects of i-vectors?
I find this presentation useful:
http://people.csail.mit.edu/sshum/talks/ivector_tutorial_interspeech_27Aug2011.pdf
Thanks a lot!