
Different volume levels lead to very different recognition rates

2017-06-28
  • Frank Sippel

    Frank Sippel - 2017-06-28

    Hi!

    I am currently evaluating the performance of Pocketsphinx using Matlab and the Python wrapper for Pocketsphinx (https://github.com/bambocher/pocketsphinx-python). I'm stuck in a weird situation: I first normalize the signal to its maximum absolute value and then apply different scale factors. In Matlab code:

    result=result ./ max(abs(result(:)));
    result=result * normalizingfactor;
    

    "result" is the audio signal.
    Then I send it to Pocketsphinx, and afterwards the output is evaluated with the NIST scoring toolkit.
    Here are the results:
    [Plot: Correct, x-axis: normalizing factor, y-axis: percent]

    [Plot: Total Error, x-axis: normalizing factor, y-axis: percent]

    Do you have any suggestions for how to overcome this problem and get a flat curve (disregarding quantization issues)?
    Or at least for finding the best value?

    Thanks in advance!
    ~Frank

     
    • Nickolay V. Shmyrev

      The recognizer adapts to any audio level, but it does so only after the first utterance, so I suppose the test you created is not very meaningful: you simply do not give the recognizer time to adapt.
      True decoding accuracy does not depend on the volume level.

      You can read more about cepstral mean normalization on this forum.
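A minimal sketch of why mean normalization removes the volume dependence, under simplifying assumptions: it uses per-frame log energy as a stand-in for a cepstral coefficient and is not Pocketsphinx's actual front end. The helper names (`log_energies`, `mean_normalize`, `live_normalize`) and the toy signal are hypothetical. Scaling the signal by a gain shifts every log-energy feature by the same constant, which subtracting the utterance mean cancels. A recognizer that normalizes with a mean estimated from earlier audio still sees the full offset on the first utterance.

```python
import math

def log_energies(samples, frame_len=4):
    """Log energy of each non-overlapping frame (stand-in for a cepstral feature)."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        feats.append(math.log(sum(x * x for x in frame)))
    return feats

def mean_normalize(feats):
    """Subtract the mean computed over the whole utterance."""
    mean = sum(feats) / len(feats)
    return [f - mean for f in feats]

def live_normalize(feats, prior_mean):
    """Subtract a mean estimated earlier (assumption: no adaptation has happened yet)."""
    return [f - prior_mean for f in feats]

# Toy signal and the same signal scaled down by a factor of 100.
signal = [0.1, -0.4, 0.3, 0.2, 0.8, -0.6, 0.5, -0.7, 0.05, 0.1, -0.02, 0.03]
quiet = [0.01 * x for x in signal]

loud_feats = log_energies(signal)
quiet_feats = log_energies(quiet)

# Raw features differ by the constant 2 * log(100) in every frame.
offsets = [a - b for a, b in zip(loud_feats, quiet_feats)]

# With the utterance mean subtracted, the two versions become (numerically) identical.
batch_diff = max(abs(a - b) for a, b in
                 zip(mean_normalize(loud_feats), mean_normalize(quiet_feats)))

# With a mean estimated at the old level, the quiet version is still offset.
prior = sum(loud_feats) / len(loud_feats)
live_diff = max(abs(a - b) for a, b in
                zip(live_normalize(loud_feats, prior), live_normalize(quiet_feats, prior)))

print(offsets, batch_diff, live_diff)
```

Real cepstral mean normalization applies the same subtraction to each cepstral dimension; a pure gain change only affects the zeroth coefficient, which is why the simplification to log energy is enough to show the effect.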

       
      • Frank Sippel

        Frank Sippel - 2017-06-28

        Thank you very much! After putting all test sound files together into one large sound file, I'm getting the same results except (as you already said) for the first utterance :-)
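The concatenation step described above could be sketched as follows using Python's standard `wave` module. This is an illustration only, not the script used in the thread: the function name `concatenate_wavs` is hypothetical, and it assumes all inputs are uncompressed WAV files with the same channel count, sample width, and sample rate.

```python
import wave

def concatenate_wavs(input_paths, output_path):
    """Write all input WAV files back to back into one output WAV file."""
    if not input_paths:
        raise ValueError("no input files given")
    expected = None
    with wave.open(output_path, "wb") as out:
        for path in input_paths:
            with wave.open(path, "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if expected is None:
                    # The first file defines the output format.
                    expected = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError(f"{path}: format {fmt} differs from {expected}")
                # Copy all audio frames of this file to the output.
                out.writeframes(src.readframes(src.getnframes()))
```

Calling `concatenate_wavs(["utt1.wav", "utt2.wav"], "all.wav")` (file names are placeholders) would produce a single file that the recognizer can process in one session, so only the first utterance is decoded before adaptation.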

         
