
recommended audio quality

Help
skatz_teyp
2008-07-01
2012-09-22
  • skatz_teyp

    skatz_teyp - 2008-07-01

    Hi everyone,

    It seems like decoding accuracy varies with the quality of the audio (e.g. noise, background, volume). Can anyone give a good recommendation on how to maximize decoding accuracy through audio quality? What properties of an audio file lead to better decoding accuracy, and what factors affect the recognition rate?

    Thanks...
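
A practical first step is to measure the basic properties of each recording before decoding it. The sketch below is a hypothetical helper (`level_report` is not part of Sphinx) that uses only Python's standard library to report the sample rate, channel count, peak and RMS level in dBFS, and the fraction of clipped samples in a 16-bit PCM WAV file. It assumes 16-bit PCM input. Which sample rate is "right" depends on the acoustic model (for example, 16 kHz for wideband models or 8 kHz for telephone-band models), so check the model's configuration rather than treating any value here as a fixed target.

```python
import array
import math
import sys
import wave


def level_report(path: str) -> dict:
    """Summarize the format and loudness of a 16-bit PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        raise ValueError("file contains no samples")

    full_scale = 32768.0
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Treat samples at (or one step from) the 16-bit limits as clipped.
    clipped = sum(1 for s in samples if s >= 32766 or s <= -32767)
    return {
        "sample_rate": rate,
        "channels": channels,
        "peak_dbfs": 20 * math.log10(max(peak, 1) / full_scale),
        "rms_dbfs": 20 * math.log10(max(rms, 1e-9) / full_scale),
        "clipped_fraction": clipped / len(samples),
    }
```

A noticeable clipped fraction or a very low RMS level usually points to a recording-gain problem that is worth fixing before any noise processing.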

     
    • Nickolay V. Shmyrev

      Hm, you need to implement noise cancellation. There are also advanced ML-based algorithms for noise subtraction. And if you train the model yourself, you could probably use a different feature set.
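
As a much simpler alternative to the ML-based methods mentioned here, classic magnitude spectral subtraction estimates the noise spectrum from a stretch of audio assumed to contain no speech and subtracts it from every frame. The sketch below is a minimal NumPy version under these assumptions: the first `noise_frames` frames are noise-only, the noise is roughly stationary, and a 512-sample frame (about 32 ms at 16 kHz) suits the sample rate. The function name and defaults are illustrative, not a Sphinx API.

```python
import numpy as np


def spectral_subtraction(signal, frame_len=512, noise_frames=10, floor=0.02):
    """Basic magnitude spectral subtraction (not an ML denoiser).

    Assumes the first `noise_frames` frames contain only background noise.
    """
    hop = frame_len // 2
    x = np.asarray(signal, dtype=np.float64)
    # Periodic Hann window: at 50% overlap, shifted copies sum to exactly 1,
    # so plain overlap-add reconstructs the input when nothing is subtracted.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)
    # Pad so every original sample is covered by exactly two frames.
    pad = (-len(x)) % hop
    padded = np.concatenate([np.zeros(hop), x, np.zeros(pad + hop)])
    n_frames = (len(padded) - frame_len) // hop + 1
    frames = np.stack([padded[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])

    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)
    noise_mag = mag[:noise_frames].mean(axis=0)  # average noise spectrum
    # Subtract the noise estimate, but keep at least `floor` of the original
    # magnitude in each bin to limit "musical noise" artifacts.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    cleaned = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)

    out = np.zeros(len(padded))
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += cleaned[i]  # overlap-add
    return out[hop:hop + len(x)]
```

With `floor=1.0` nothing is subtracted and the input comes back unchanged, which is a convenient sanity check. Aggressive subtraction can also distort the speech itself, so it is worth comparing decoding accuracy with and without it.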

       
      • skatz_teyp

        skatz_teyp - 2008-07-01

        Ok, I'll try implementing a noise cancellation algorithm for my audio files. Are there any other audio factors that affect recognition, particularly on the speaker side?

         
        • Nickolay V. Shmyrev

          Use the same microphone that was used to collect the training data? I'm not sure it's possible to give much practical advice here. Once your speech is clean enough and matches the dialect the model expects, I think other factors become more important.

           
    • skatz_teyp

      skatz_teyp - 2008-07-02

      Oh ok! Thanks Nickolay! I guess the most important factor here is background noise in the audio. I've read some papers saying that gender also affects recognition accuracy (e.g. accuracy is higher for male speakers than for female speakers), and so do age and some other factors. Is this true for Sphinx?
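
To put a rough number on how noisy a recording is, one option is to estimate the signal-to-noise ratio from short-time frame energies, treating the quietest frames as background noise and the loudest as speech plus noise. The hypothetical helper below does this with NumPy. The 10th/90th percentile choice and the 400-sample frame (25 ms at 16 kHz) are assumptions that only make sense if the recording contains some pauses, and the result is a crude estimate, not a calibrated measurement.

```python
import numpy as np


def estimate_snr_db(signal, frame_len=400, noise_pct=10, speech_pct=90):
    """Crude SNR estimate in dB from the spread of frame energies.

    Assumes the quietest frames contain only background noise and the
    loudest frames contain speech plus that same noise.
    """
    x = np.asarray(signal, dtype=np.float64)
    n_frames = len(x) // frame_len
    if n_frames < 2:
        raise ValueError("signal too short for the chosen frame length")
    frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)  # mean power per frame
    noise_power = max(np.percentile(energy, noise_pct), 1e-12)
    speech_plus_noise = np.percentile(energy, speech_pct)
    # Remove the noise floor so the numerator approximates speech power alone.
    speech_power = max(speech_plus_noise - noise_power, 1e-12)
    return float(10.0 * np.log10(speech_power / noise_power))
```

Comparing this estimate across recordings gives a simple way to check whether accuracy drops track background noise.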

       
      • Nickolay V. Shmyrev

        > I read some papers that gender also affects recognition accuracy (e.g. male have greater accuracy than female dictators).

        Probably true, but it's a very minor difference (no more than about one percent of WER) compared to the effect of using the proper acoustic model and language model (around 10% of WER).
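
These figures are in word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. A minimal sketch of that computation follows, using a standard word-level Levenshtein alignment. The function name is illustrative and not a Sphinx tool, but something like it is enough to compare, say, two acoustic models on the same test recordings.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # deletion
                         cur[j - 1] + 1,       # insertion
                         prev[j - 1] + cost)   # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` is 0.5 (one substitution and one deletion over four reference words). Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.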

         
        • skatz_teyp

          skatz_teyp - 2008-07-03

          From what you've said, can I safely conclude that physiological differences and audio quality are not very significant compared to proper acoustic and language models in terms of recognition accuracy?

          Additionally, if that's true, would it mean that anyone, given appropriate acoustic and language models, would still get high accuracy? Would it mean that all speakers are "speech recable" (lol, speech recognizable)?

          Thanks for replying.

           

