normalization problem

  • gary

    gary - 2015-06-25

    Hi all,

    When I use Kaldi for online GMM decoding, I found that:
    audio recorded on an ASUS notebook gives low accuracy;
    audio recorded on a Lenovo notebook gives high accuracy.
    The waveforms and spectra can be seen here:
    http://i.imgur.com/Cviw03T.png

    The high-frequency part of the ASUS notebook's spectrum is attenuated.

    My questions are:
    1. How should I handle different devices that have different background noise and attenuated high frequencies?
    Is the solution to collect speech data from different devices to train the acoustic model?
    2. Does high or low amplitude affect the accuracy? Should I normalize amplitude or frequency during feature extraction,
    or does CMVN already perform that normalization?

     

    Last edit: gary 2015-06-25
    • Daniel Povey

      Daniel Povey - 2015-06-25

      My questions are:
      1. How should I handle different devices that have different background
      noise and attenuated high frequencies?
      Is the solution to collect speech data from different devices to train
      the acoustic model?

      Yes, it's probably necessary to either add data from different
      devices, or simulate it somehow.

      2. Does high or low amplitude affect the accuracy? Should I normalize
      before feature extraction?

      The amplitude affects the accuracy for online-nnet2 models (and
      online-gmm models) but not for other models. We are trying to do
      volume perturbation in online-nnet2 training to make the trained
      models more robust to varying volume in future, but the models on
      kaldi-asr.org mostly don't have this yet.
      Online normalization is hard, and it might not always interact well
      with the decoding, because the factor you multiply by will change as
      you go. Still, you might want to make sure your signals are at least
      roughly in the right range.

      Dan
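
The two ideas in this reply, volume perturbation of the training audio and keeping signals roughly in the right range, can be illustrated with a minimal NumPy sketch. This is not Kaldi code: the function names `perturb_volume` and `normalize_peak`, the gain range, and the target peak are all assumptions chosen for illustration, and the peak normalization here works on a whole utterance offline, which sidesteps the changing-factor problem of online normalization.

```python
import numpy as np

def perturb_volume(wave, rng, low=0.125, high=2.0):
    """Scale a float waveform by one random gain drawn uniformly from [low, high].

    The range is an illustrative assumption, not a value taken from the thread.
    """
    gain = rng.uniform(low, high)
    return wave * gain, gain

def normalize_peak(wave, target_peak=0.5, eps=1e-8):
    """Rescale a whole utterance so its largest absolute sample equals target_peak."""
    peak = np.max(np.abs(wave))
    return wave * (target_peak / max(peak, eps))

# Demo on a synthetic 1-second, 16 kHz tone (a stand-in for real speech).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
wave = 0.1 * np.sin(2 * np.pi * 440.0 * t)

perturbed, gain = perturb_volume(wave, rng)   # training-time augmentation
normalized = normalize_peak(perturbed)        # test-time rough level fix
```

Applying a single gain per utterance keeps the scale constant over time, so the features seen by the decoder do not drift within the utterance.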


       
  • gary

    gary - 2015-06-26

    Hi Dan,
    Thanks for your quick reply.

    Do you think the paper below can solve the attenuated-spectrum problem?
    "Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM"
    http://research.microsoft.com/apps/pubs/default.aspx?id=179159

    Thank you.

     
    • Daniel Povey

      Daniel Povey - 2015-06-26

      That paper seems to be addressing a different problem, namely how to
      make use of narrowband speech while training a wideband system, and
      they did not really try to do any data simulation. The kind of thing
      I had in mind is to put the training data through a linear filter and
      then add some kind of noise.
      Dan
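
The "linear filter, then noise" simulation suggested here might look like the following NumPy sketch. Everything specific in it is an assumption for illustration: the helper names `lowpass_taps` and `simulate_device`, the windowed-sinc low-pass filter (chosen only because the ASUS recording showed weak high frequencies), the white Gaussian noise, and the 30 dB SNR. A real device's response and noise would have to be measured or estimated.

```python
import numpy as np

def lowpass_taps(num_taps=31, cutoff=0.25):
    """Windowed-sinc low-pass FIR filter.

    cutoff is a fraction of the sampling rate (0 to 0.5). The taps are
    normalized so that the gain at 0 Hz is 1.
    """
    n = np.arange(num_taps) - (num_taps - 1) / 2
    taps = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return taps / np.sum(taps)

def simulate_device(wave, fir_taps, snr_db, rng):
    """Filter a waveform with an FIR filter, then add white noise at snr_db."""
    filtered = np.convolve(wave, fir_taps, mode="full")[: len(wave)]
    signal_power = np.mean(filtered ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=len(filtered))
    return filtered + noise

# Demo: white noise stands in for clean training audio so that the
# effect on the high-frequency band is easy to see in the spectrum.
rng = np.random.default_rng(0)
clean = rng.normal(size=16000)
taps = lowpass_taps()
degraded = simulate_device(clean, taps, snr_db=30.0, rng=rng)
```

Training on a mix of clean and degraded copies, with several different filters and noise levels, is one way to approximate data from devices you cannot record on directly.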
