Kaldi / Discussion / Help: Bad result in DNN nnet2 online decoding

Do Quoc Truong - 2015-07-14

Hi all,

I have successfully trained the DNN nnet2 online model with MFCC features,
so I did the same thing again with fbank features.
However, I got an unexpectedly bad WER: almost every sentence is recognized incorrectly.

I have checked compute_prob_valid.*.log and it looks fine. With MFCC the final
value is 0.6065 and with fbank it is 0.573.

The command for the decoding is:
online2-wav-nnet2-latgen-faster --online=true --do-endpointing=false --config=online_nnet2_decoding.conf --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=0.1 --word-symbol-table=words.txt final.mdl HCLG.fst ...

I also checked online_nnet2_decoding.conf. It was generated correctly for fbank:
--feature-type=fbank
--fbank-config=...fbank.conf
--ivector-extraction-config=...ivector_extractor.conf
--endpoint.silence-phones=...

I would appreciate it if you could give me some hints for tracking down the problem!

Thank you,
Yours sincerely,
Truong Do
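
A minimal sketch, in Python, of one way to double-check which feature type and which iVector extraction config a decoding config actually points at. The function names are illustrative and not part of Kaldi; the sample text reuses the option lines from the post above, with the elided paths left exactly as posted.

```python
def parse_kaldi_config(text):
    """Parse Kaldi-style '--name=value' option lines into a dict."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line.startswith("--"):
            continue  # skip blank lines and anything that is not an option
        name, _, value = line[2:].partition("=")
        options[name] = value
    return options


def summarize_front_end(options):
    """Keep only the options that select the features and the iVector config."""
    keys = ("feature-type", "mfcc-config", "fbank-config",
            "ivector-extraction-config")
    return {k: options[k] for k in keys if k in options}


# Option lines copied from the post; the "..." elisions are the poster's own.
sample = """--feature-type=fbank
--fbank-config=...fbank.conf
--ivector-extraction-config=...ivector_extractor.conf
--endpoint.silence-phones=...
"""

front_end = summarize_front_end(parse_kaldi_config(sample))
for name, value in front_end.items():
    print(f"{name}: {value}")
```

The same parser could be pointed at the file named by --ivector-extraction-config to list the files that config refers to, which is where a stale extractor path would show up.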

- Daniel Povey - 2015-07-14
  
  I don't think I have ever run the setup with the fbank features--
  there is really no point, because we use MFCC without dimension
  reduction, which are just a linearly transformed version of the fbank
  features. It is possible that there is some bug somewhere. Decode
  with a higher verbose level and look for the objective-function
  changes reported for the iVectors. (You re-trained the iVector
  extractor on top of fbank features, right?). That would narrow down
  whether something is going wrong with the iVectors.
  
  Dan
  

Do Quoc Truong - 2015-07-15

Thank you for your help.

> I don't think I have ever run the setup with the fbank features-- there is really no point, because we use MFCC without dimension reduction, which are just a linearly transformed version of the fbank features

If so, why are the results from the fbank and MFCC features slightly different?
Also, when I combined the results from those two systems, I got some improvement (based on my earlier experiments).

> Decode with a higher verbose level

The objective function improvement from estimating the iVector looks correct:
it increases as we see more frames.

Do you think the problem is in the graph HCLG.fst?

VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 65.8562
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 65.8981
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 66.0219
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 66.2964
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 66.6309
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 66.8939
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 67.1176
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 67.5152
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 68.459
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 69.3028

- Daniel Povey - 2015-07-15
  
  Those objective function improvements are too large; they should be
  around 10. It could indicate a mismatch in the iVector extractor
  (e.g. trained on wrong data? mismatch in cmvn?)
  What were the objf improvements like in training? The averages should
  have been printed in the log.
  Dan
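
A rough way to apply this check to a decoding log is sketched below. The regular expression follows the VLOG lines quoted in this thread; the "around 10" reference value is taken from the reply above, while the function names and the factor used to flag a run as suspicious are assumptions made for illustration.

```python
import re

# Matches the message text of the VLOG lines shown in this thread.
OBJF_RE = re.compile(
    r"Objective function improvement from estimating the iVector "
    r"\(vs\. default value\) is (-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def ivector_objf_improvements(log_text):
    """Return every iVector objective-function improvement found in the log."""
    return [float(m.group(1)) for m in OBJF_RE.finditer(log_text)]


def summarize(values, expected=10.0, factor=3.0):
    """Return (mean, suspicious).

    'expected' is the rough value mentioned in the thread; 'factor' is an
    arbitrary illustrative margin, not a Kaldi setting.
    """
    if not values:
        raise ValueError("no iVector objective-function lines found")
    mean = sum(values) / len(values)
    return mean, mean > expected * factor


# Two of the log lines posted earlier in the thread.
sample_log = (
    "VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) "
    "Objective function improvement from estimating the iVector (vs. default value) is 65.8562\n"
    "VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) "
    "Objective function improvement from estimating the iVector (vs. default value) is 69.3028\n"
)

values = ivector_objf_improvements(sample_log)
mean, suspicious = summarize(values)
print(f"{len(values)} values, mean {mean:.4f}, suspicious: {suspicious}")
```

Running the same function over the iVector extractor training logs would give the comparison asked for above, provided those logs use the same message text, which is not confirmed here.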
  

Do Quoc Truong - 2015-07-15

Hi Dan,

I found the mistake: I was using the wrong iVector extractor.

Thank you so much for your advice.
