Bad result in DNN nnet2 online decoding

  • Do Quoc Truong

    Do Quoc Truong - 2015-07-14

    Hi all,

    I have successfully trained the DNN nnet2 online model with MFCC features,
    so I repeated the same procedure with fbank features.
    However, I got an unexpectedly bad WER: almost every sentence is recognized incorrectly.

    I checked compute_prob_valid.*.log and it looks fine. The final value is
    0.6065 with MFCC and 0.573 with fbank.

    The decoding command is:
    online2-wav-nnet2-latgen-faster --online=true --do-endpointing=false --config=online_nnet2_decoding.conf --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=0.1 --word-symbol-table=words.txt final.mdl HCLG.fst ...

    I also checked online_nnet2_decoding.conf. It was generated correctly for fbank:
    --feature-type=fbank
    --fbank-config=...fbank.conf
    --ivector-extraction-config=...ivector_extractor.conf
    --endpoint.silence-phones=...
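
Because online_nnet2_decoding.conf points to further files (the fbank config and the iVector extraction config), it can help to list exactly what the decoder will read. The following is a minimal sketch, assuming the usual Kaldi config layout of one `--name=value` option per line with `#` comments; the function name `parse_kaldi_config` and the sample text are illustrative only, and the elided paths from the post are kept as they were written.

```python
def parse_kaldi_config(text):
    """Parse '--name=value' lines into a dict (assumed Kaldi config layout)."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if not line.startswith("--"):
            raise ValueError(f"unexpected config line: {raw!r}")
        name, _, value = line[2:].partition("=")
        options[name] = value
    return options


# Sample taken from the post; the "..." elisions are left untouched.
sample_conf = """\
--feature-type=fbank
--fbank-config=...fbank.conf
--ivector-extraction-config=...ivector_extractor.conf
--endpoint.silence-phones=...
"""

options = parse_kaldi_config(sample_conf)
for name, value in options.items():
    print(f"{name:30s} {value}")
```

Running the same function on the file named by `ivector-extraction-config` shows which extractor files are actually loaded at decode time.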

    I would appreciate any hints that could help me track down the problem.

    Thank you,
    Yours sincerely,
    Truong Do

     
    • Daniel Povey

      Daniel Povey - 2015-07-14

      I don't think I have ever run the setup with the fbank features--
      there is really no point, because we use MFCC without dimension
      reduction, which are just a linearly transformed version of the fbank
      features. It is possible that there is some bug somewhere. Decode
      with a higher verbose level and look for the objective-function
      changes reported for the iVectors. (You re-trained the iVector
      extractor on top of fbank features, right?) That would narrow down
      whether something is going wrong with the iVectors.

      Dan
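
The point about MFCCs being a linear transform of the fbank features can be illustrated with a small sketch. It assumes the cepstra are obtained by applying an orthonormal DCT-II to the log mel filterbank energies and that the number of cepstral coefficients equals the number of mel bins (40 here, an assumed value). It ignores cepstral liftering, which is only a per-coefficient scaling. The input values are random numbers, not real features.

```python
import math
import random


def dct_matrix(n):
    """Return the orthonormal DCT-II matrix (n x n) as a list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * k * (m + 0.5) / n)
                     for m in range(n)])
    return rows


def apply(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


num_bins = 40  # assumed: same number of mel bins and cepstral coefficients
random.seed(0)
log_fbank = [random.gauss(0.0, 1.0) for _ in range(num_bins)]  # made-up frame

D = dct_matrix(num_bins)
mfcc_like = apply(D, log_fbank)             # "MFCC" = D * log-fbank
recovered = apply(transpose(D), mfcc_like)  # D is orthogonal, so D^T inverts it

max_err = max(abs(a - b) for a, b in zip(recovered, log_fbank))
print(f"max reconstruction error: {max_err:.2e}")
```

Since the square transform is invertible, no information is lost in either direction, which is the sense in which the two feature types are equivalent as network input.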

  • Do Quoc Truong

    Do Quoc Truong - 2015-07-15

    Thank you for your help.

    "I don't think I have ever run the setup with the fbank features-- there is really no point, because we use MFCC without dimension reduction, which are just a linearly transformed version of the fbank features"

    If so, why are the results from fbank and MFCC features slightly different?
    Also, when I combined the results of those two systems, I got some improvement (based on an earlier experiment).

    "Decode with a higher verbose level"

    The objective function improvement from estimating the iVector looks correct:
    it increases as we see more frames.

    Do you think the problem is in the graph HCLG.fst?

    VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 65.8562
    VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 65.8981
    VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 66.0219
    VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 66.2964
    VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 66.6309
    VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 66.8939
    VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 67.1176
    VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 67.5152
    VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 68.459
    VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 69.3028

     
    • Daniel Povey

      Daniel Povey - 2015-07-15

      Those objective function improvements are too large; they should be
      around 10. It could indicate a mismatch in the iVector extractor
      (e.g. trained on the wrong data? mismatch in CMVN?).
      What were the objf improvements like in training? The averages should
      have been printed in the log.
      Dan
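
The check described above can be automated with a short script. This is a sketch only: it assumes each VLOG message sits on a single line in the format shown in the thread, and it takes the "around 10" figure from the reply as the expected value. The factor of 3 used to flag a run as suspicious is an arbitrary assumption, and the helper names are illustrative.

```python
import re

# Matches the message printed by the decoder at verbose level 4 (format
# copied from the log lines quoted in this thread).
PATTERN = re.compile(
    r"Objective function improvement from estimating the iVector "
    r"\(vs\. default value\) is (-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def objf_improvements(log_text):
    """Return all iVector objective-function improvements found in a log."""
    return [float(m.group(1)) for m in PATTERN.finditer(log_text)]


def summarize(values, expected=10.0, factor=3.0):
    """Summarize the values; 'expected' and 'factor' are assumed thresholds."""
    if not values:
        return None
    mean = sum(values) / len(values)
    return {
        "count": len(values),
        "mean": mean,
        "max": max(values),
        "suspicious": mean > expected * factor,
    }


# Two of the lines posted in the thread, used as sample input.
sample_log = (
    "VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) "
    "Objective function improvement from estimating the iVector (vs. default value) is 65.8562\n"
    "VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) "
    "Objective function improvement from estimating the iVector (vs. default value) is 69.3028\n"
)

values = objf_improvements(sample_log)
summary = summarize(values)
print(summary)
```

The same function can be run over the iVector extraction logs from training, so the decode-time average can be compared with the training-time one.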

  • Do Quoc Truong

    Do Quoc Truong - 2015-07-15

    Hi Dan,

    I found the mistake: I was using the wrong iVector extractor.

    Thank you so much for your advice.