Noise robustness

  • Dmytro Prylipko

    Dmytro Prylipko - 2014-05-02

    Dear Nickolay,

    Could you tell me what the state of the art in noise robustness is right now?
    Does it still make sense to include a Wiener filter in the pipeline?

    Also, the import of PNCC features was announced almost a year ago:
    http://nshmyrev.blogspot.de/2013/06/around-noise-robust-pncc-features.html

    But the only mention of it I found is in the Denoise class.
    I am not sure whether it is enabled by default or whether I must include it in the pipeline explicitly. Are you going to implement real PNCC feature extraction, or does MFCC+Denoise cover this?

    Thanks in advance.

     
  • Nickolay V. Shmyrev

    Could you tell me what the state of the art in noise robustness is right now?

    This is a complex subject. The state of the art is that you need to know the noise profile in order to cancel the noise effectively. Simpler algorithms based on spectral subtraction are also popular and give some improvement.
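
    For reference, the basic spectral subtraction rule is usually written as |X(f)|^2 = max(|Y(f)|^2 - alpha*|N(f)|^2, beta*|Y(f)|^2), where Y is the noisy spectrum, N is the noise estimate, alpha is an over-subtraction factor, and beta is a small spectral floor. This is a textbook sketch rather than the exact rule implemented in sphinxbase.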

    Does it still make sense to include a Wiener filter in the pipeline?

    No

    I am not sure whether it is enabled by default or whether I must include it in the pipeline explicitly. Are you going to implement real PNCC feature extraction, or does MFCC+Denoise cover this?

    Denoise is automatically enabled in AutoCepstrum if feat.params has -remove_noise yes, or you can add it to the processing pipeline yourself.
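
    For example, the relevant line in the model's feat.params would look like this (a minimal sketch; the remaining parameters depend on your model):

    ~~~~~~~~~~~~
    -remove_noise yes
    ~~~~~~~~~~~~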

    Are you going to implement real PNCC feature extraction, or does MFCC+Denoise cover this?

    We use denoised MFCC by default in CMUSphinx; we are not going to implement PNCC.

     


  • Dmytro Prylipko

    Dmytro Prylipko - 2014-05-05

    Could you give a hint on the explicit usage of Denoise in the Sphinx4 pipeline?
    What I understood from here:

    http://cmusphinx.sourceforge.net/doc/sphinx4/edu/cmu/sphinx/frontend/AutoCepstrum.html

    it looks like:

    StreamDataSource
    Preemphasizer
    RaisedCosineWindower
    DiscreteFourierTransform
    MelFrequencyFilterBank
    Denoise
    DiscreteCosineTransform2
    Lifter
    BatchCMN
    DeltasFeatureExtractor
    FeatureTransform

    Is it correct?

    Can I use the voxforge2 acoustic models 'as is' with the new pipeline, or must they be retrained with -remove_noise yes?

    I adapted the models using MAP; does the noise removal affect the adaptation process?

    I would be very pleased if you could provide some references on noise profiles and their use in noise cancellation. I have audio data recorded and grouped by user, so I believe I can use it for noise cancellation.

     
  • Nickolay V. Shmyrev

    Is it correct?

    Yes
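
    For reference, in a sphinx4 XML configuration that pipeline would be wired up roughly as below. This is only a sketch: the component names are arbitrary, the declarations of the standard components are omitted, and the class names should be checked against your sphinx4 version.

    ~~~~~~~~~~~~
    <component name="denoiseFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
        <propertylist name="pipeline">
            <item>streamDataSource</item>
            <item>preemphasizer</item>
            <item>windower</item>
            <item>fft</item>
            <item>melFilterBank</item>
            <item>denoise</item>
            <item>dct2</item>
            <item>lifter</item>
            <item>batchCMN</item>
            <item>deltasFeatureExtractor</item>
            <item>featureTransform</item>
        </propertylist>
    </component>
    <component name="denoise" type="edu.cmu.sphinx.frontend.denoise.Denoise"/>
    <component name="dct2" type="edu.cmu.sphinx.frontend.transform.DiscreteCosineTransform2"/>
    <component name="lifter" type="edu.cmu.sphinx.frontend.transform.Lifter"/>
    ~~~~~~~~~~~~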

    Can I use the voxforge2 acoustic models 'as is' with the new pipeline, or must they be retrained with -remove_noise yes?

    You can use the existing models, but it's better to retrain. The WER difference on clean data should be minor.
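
    If you do retrain with a recent sphinxtrain, the overall sequence is roughly the following (a sketch; 'voxforge' is an illustrative task name, and the configs still need the edits discussed below):

    ~~~~~~~~~~~~
    # set up a training directory (task name is illustrative)
    sphinxtrain -t voxforge setup
    # edit etc/sphinx_train.cfg and etc/feat.params, then run training
    sphinxtrain run
    ~~~~~~~~~~~~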

    I adapted the models using MAP; does the noise removal affect the adaptation process?

    No

    I would be very pleased if you could provide some references on noise profiles and their use in noise cancellation. I have audio data recorded and grouped by user, so I believe I can use it for noise cancellation.

    There is no such thing as a noise profile in the current implementation, though it seems a reasonable thing to have.

     
  • Dmytro Prylipko

    Dmytro Prylipko - 2014-05-23

    Well, I tried retraining the models and using the pipeline described above.
    And got an accuracy improvement of 4% absolute (74.65 -> 78.72).
    The only strange thing: I found that the recognizer does not work when using DiscreteCosineTransform2, as suggested here:

    http://cmusphinx.sourceforge.net/doc/sphinx4/edu/cmu/sphinx/frontend/AutoCepstrum.html

    I got just empty output instead of hypotheses. However, it works with DiscreteCosineTransform.

     
  • Nickolay V. Shmyrev

    And got an accuracy improvement of 4% absolute (74.65 -> 78.72).

    Ok, great

    I got just empty output instead of hypotheses. However, it works with DiscreteCosineTransform.

    You probably misconfigured something in training. The new trainer has updated properties, so you need to update etc/feat.params and sphinx_train.cfg. After that, in sphinx_train.cfg you should see the CFG_TRANSFORM configuration variable, which must be set to dct, and in the model's feat.params you should see -transform dct too.
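
    In other words, after retraining, the model's feat.params should contain lines like these (a sketch; the other values are omitted):

    ~~~~~~~~~~~~
    -transform dct
    -lifter 22
    -remove_noise yes
    ~~~~~~~~~~~~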

     
  • Dmytro Prylipko

    Dmytro Prylipko - 2014-05-26

    Yeah, -transform in feat.params is set to legacy...

    But I don't have CFG_TRANSFORM in my current config. Should I just add it?
    Maybe you have a sample sphinx_train.cfg for the VoxForge corpus that you could share with me?

     
    • Nickolay V. Shmyrev

      Just add this to your config:

      ~~~~~~~~~~~~
      $CFG_WAVFILE_SRATE = 16000.0;
      $CFG_NUM_FILT = 25; # For wideband speech it's 25; for telephone 8kHz speech a reasonable value is 15
      $CFG_LO_FILT = 130; # For telephone 8kHz speech the value is 200
      $CFG_HI_FILT = 6800; # For telephone 8kHz speech the value is 3500
      $CFG_TRANSFORM = "dct"; # Previously the legacy transform was used, but dct is more accurate
      $CFG_LIFTER = "22"; # Cepstral liftering smooths the cepstrum to improve recognition
      $CFG_VECTOR_LENGTH = 13; # 13 is usually enough
      ~~~~~~~~~~~~

      and it should be fine. See sphinxtrain/etc/sphinx_train.cfg as a template.

       
  • Nayak BS

    Nayak BS - 2014-06-12

    Hi Dmytro,

    Can you share both of your updated files, feat.params and sphinx_train.cfg? I would also like to see how they work and to optimize performance in the presence of noise.

    Thanks

     
  • Dmytro Prylipko

    Dmytro Prylipko - 2014-06-16

    Here they are, plus the feature extraction script.
    I have not tested them yet, but this should be the correct configuration.
    Please note that the latest sphinxbase and sphinxtrain must be installed.
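
    For reference, a typical sphinx_fe invocation for such a setup looks roughly like the following; the paths and the file list are illustrative, so adjust them to your data:

    ~~~~~~~~~~~~
    sphinx_fe -argfile etc/feat.params \
        -samprate 16000 \
        -c etc/train.fileids \
        -di wav -ei wav \
        -do feat -eo mfc \
        -mswav yes
    ~~~~~~~~~~~~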

     
  • Dmytro Prylipko

    Dmytro Prylipko - 2014-06-18

    So, having conducted the experiments, I found that denoising helps a lot for my task: 74.65% -> 81.38%.
    But when I adapted the voxforge models using MAP in a LOSO way, I got 72.93% accuracy... o_0
    Previously I had a substantial improvement at that stage...

    At the same time, adaptation on the test data without LOSO (the whole set of test records used both to adapt and to evaluate) gave me 97.50% accuracy, so the data itself is fine.

    I just can't figure out what it might be... Instability with respect to unseen data after adaptation?

     
  • Nickolay V. Shmyrev

    MAP in a LOSO way

    I'm not sure what you mean by "LOSO way".

    At the same time, adaptation on the test data without LOSO (the whole set of test records used both to adapt and to evaluate) gave me 97.50% accuracy, so the data itself is fine.

    That looks like an issue to me; you definitely should not get an improvement from 81% to 97%.

    But when I adapted the voxforge models using MAP in a LOSO way, I got 72.93% accuracy... o_0

    There could be many issues here, from a different language weight for the adapted model to wider beams or problems with feature extraction. It's hard to say what is going on.

    Please note that MAP adaptation of a continuous model requires quite a lot of data, at least a few hours.

     

    Last edit: Nickolay V. Shmyrev 2014-06-19
  • Dmytro Prylipko

    Dmytro Prylipko - 2014-06-19

    LOSO = leave one speaker out.

    I use MAP to adapt the models to the channel and accent (heavy Russian) rather than for speaker adaptation. So, to predict the performance on new users, I split the test data into speaker-dependent folds and perform cross-validation over those folds: one fold for testing, the others for adaptation.

    Such a configuration is referred to below as MAP_LOSO.

    In contrast, in MAP_full I use all the available data both to adapt and to test the models.

    Previously, I had something like:
    No adapt - 74.65%
    MAP_LOSO - 81.06%
    MAP_full - 94.21%

    Having introduced denoising (retrained models, new pipeline in config, new feature extraction), I got:

    No adapt - 81.38%
    MAP_LOSO - 72.93%
    MAP_full - 97.50%

    The config files are identical except for the ModelLoader location value and the frontend pipeline. I do not have much data (~1h), but so far it has helped...

    Absolutely mystical :)

     

    Last edit: Dmytro Prylipko 2014-06-19
  • Nickolay V. Shmyrev

    The config files are identical

    For a new frontend you usually need to reevaluate all other parameters (language weight, beams).
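
    These usually sit at the top of the sphinx4 XML configuration as global properties; the values below are just the common demo defaults, not a recommendation:

    ~~~~~~~~~~~~
    <property name="absoluteBeamWidth" value="-1"/>
    <property name="relativeBeamWidth" value="1E-80"/>
    <property name="wordInsertionProbability" value="1E-36"/>
    <property name="languageWeight" value="8"/>
    ~~~~~~~~~~~~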

    Absolutely mystical :)

    Usually there is a reason; however, it's hard to find it just by looking at the numbers. You can try MLLR adaptation instead of MAP, and you can also play with the tau parameter of map_adapt to control the interpolation between the adaptation data and the original models.

     
  • Dmytro Prylipko

    Dmytro Prylipko - 2014-06-19

    For a new frontend you usually need to reevaluate all other parameters (language weight, beams)

    You're right; however, I first wanted to see the effect of the new models and the pipeline alone. Further fine-tuning is still to be done.

    Your suggestion to play with tau is a good one. As I recall, I could not find where to specify that parameter :)

     
  • Dmytro Prylipko

    Dmytro Prylipko - 2014-06-19

    I tried using a fixed tau instead of the Bayes mean, and it helped!
    Now the accuracy for MAP_LOSO is around 84% for various values of tau.
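
    For anyone searching later: tau is passed on the map_adapt command line. A sketch with illustrative paths (check the exact flags in your sphinxtrain version):

    ~~~~~~~~~~~~
    map_adapt \
        -moddeffn voxforge/mdef -ts2cbfn .cont. \
        -meanfn voxforge/means -varfn voxforge/variances \
        -mixwfn voxforge/mixture_weights -tmatfn voxforge/transition_matrices \
        -accumdir bwaccumdir \
        -bayesmean no -fixedtau yes -tau 10 \
        -mapmeanfn adapted/means -mapvarfn adapted/variances \
        -mapmixwfn adapted/mixture_weights -maptmatfn adapted/transition_matrices
    ~~~~~~~~~~~~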

     
  • Dmytro Prylipko

    Dmytro Prylipko - 2014-06-23

    So, I have trained and tested the Voxforge acoustic model with denoising. The next obvious step is to share it. How can I do this? Can I upload it to SourceForge (https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/English%20Voxforge/)?

     
    • Nickolay V. Shmyrev

      Dear Dmytro,

      It's great that you created a new version of voxforge-en. Please upload it to Dropbox and post the link here, and I'll publish it in our downloads. I assume the model is updated with the latest data and has the same file structure (etc folder included).

       
  • Dmytro Prylipko

    Dmytro Prylipko - 2014-06-25

    So, what is your judgement?

     
    • Nickolay V. Shmyrev

      Sorry, I didn't have time to check the accuracy yet. I will check soon. Thank you.

       

      Last edit: Nickolay V. Shmyrev 2014-06-27
  • Dmytro Prylipko

    Dmytro Prylipko - 2014-06-26

    You are welcome to discuss and contribute:
    http://habrahabr.ru/post/227099/

     
    • Nickolay V. Shmyrev

      Hello

      Thank you for the nice post on a popular resource; it covers a few interesting topics.

      I'm looking at your model and see that you trained with -nfilt 40. This is not the optimal value; nfilt should be around 25. I would also train more senones, since the Voxforge data is big these days.

       
      • Nickolay V. Shmyrev

        There is also the issue that the accuracy of your model is lower than that of voxforge-en-0.4. This is actually a problem I encountered myself, and the reason I stopped updating the voxforge models. Somehow the quality of the voxforge data has decreased over time, so the accuracy of the model drops when you include the new data. This is a subject for research, though.

         
        • Pranav Jawale

          Pranav Jawale - 2014-06-30

          @Dmytro

          For the state of the art, have a look at the following reference:

          Li, J., Deng, L., Gong, Y., & Haeb-Umbach, R. (2014). An Overview of Noise-Robust Automatic Speech Recognition. IEEE/ACM Transactions on Audio, Speech & Language Processing, 22(4), 745-777.

          It has loads of pointers, but unfortunately no satisfactory numerical comparison among the methods.

           

          Last edit: Nickolay V. Shmyrev 2014-06-30

