How to optimize CMN to get a good live recognition from start

Help
mmende
2015-07-11
2015-07-15
  • mmende

    mmende - 2015-07-11

    Hi,

    I am currently playing around with the pocketsphinx API on live input data. The results get really good after a while, but it always takes a long warm-up phase, I guess to optimize the CMN values. To shorten this warm-up phase I tried adjusting the cmninit value, but as I understand it, CMN compensates for device-specific and environmental peculiarities, so a tuned value would only help on my own device (and I could not get any improvement even there). I also tried re-decoding the first frame of speech, so that pocketsphinx decodes it once more after tuning its CMN (as proposed here), but I guess one frame is just not enough. Furthermore, I'd like to understand the meaning of the values: what would e.g. 44,3,-6 stand for? Do you have any suggestions concerning this problem? Thanks in advance...

     
    • Nickolay V. Shmyrev

      To understand speech recognition theory it is helpful to read the textbook:

      http://www.amazon.com/Spoken-Language-Processing-Algorithm-Development/dp/0130226165

      there is a whole chapter about feature extraction.

      Our plan for robust CMN is to implement a buffer that stores the first 2 seconds of audio (not a single frame) before decoding and uses them to estimate the initial values. I wrote about this in the original node-pocketsphinx discussion.
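
      The buffering idea can be sketched roughly as follows. This is only an illustration in plain Python, not pocketsphinx code: the names BufferedCmn, estimate_initial_mean and apply_cmn are hypothetical, the rate of 100 frames per second is an assumption, and the mean is kept fixed after the warm-up, whereas a live decoder would presumably keep updating it. The frames are cepstral feature vectors; the estimated mean is a per-coefficient vector, which is presumably what a value list like 44,3,-6 would seed.

```python
from typing import List, Optional, Sequence

FRAME_RATE = 100        # assumed: 100 feature frames per second (10 ms shift)
WARMUP_SECONDS = 2.0    # buffer length mentioned in the post


def estimate_initial_mean(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average each cepstral coefficient over the buffered frames."""
    if not frames:
        raise ValueError("need at least one frame")
    sums = [0.0] * len(frames[0])
    for frame in frames:
        for i, value in enumerate(frame):
            sums[i] += value
    return [s / len(frames) for s in sums]


def apply_cmn(frames: Sequence[Sequence[float]],
              mean: Sequence[float]) -> List[List[float]]:
    """Subtract the mean vector from every frame."""
    return [[v - m for v, m in zip(frame, mean)] for frame in frames]


class BufferedCmn:
    """Hypothetical helper: hold back the first frames, then normalize
    them with a mean estimated from the whole buffer."""

    def __init__(self, warmup_frames: int = int(FRAME_RATE * WARMUP_SECONDS)):
        self.warmup_frames = warmup_frames
        self.buffer: List[List[float]] = []
        self.mean: Optional[List[float]] = None

    def push(self, frame: Sequence[float]) -> List[List[float]]:
        """Return the normalized frames that are ready for decoding."""
        if self.mean is not None:
            # Warm-up is over: normalize with the fixed initial estimate.
            return apply_cmn([frame], self.mean)
        self.buffer.append(list(frame))
        if len(self.buffer) < self.warmup_frames:
            return []  # still warming up, nothing to decode yet
        # Buffer is full: estimate the mean and release all held frames.
        self.mean = estimate_initial_mean(self.buffer)
        ready, self.buffer = apply_cmn(self.buffer, self.mean), []
        return ready
```

      The trade-off in this sketch is latency: nothing reaches the decoder until the buffer is full, but the first frames are then normalized with a mean estimated from real audio rather than a guessed starting value.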

      This feature would require quite a big rework of the pocketsphinx framework, though, so it's delayed. It is still planned for the next release.

       
      • mmende

        mmende - 2015-07-15

        OK, thanks Nickolay. That's what I expected. I guess I'll just wait for the next release then.

         
