pocketsphinx speaker dependent adaptation

Help
Ray
2012-12-05
2012-12-17
  • Ray

    Ray - 2012-12-05

    Hello,
    I was wondering whether pocketsphinx can be adapted for speaker-dependent voice recognition. It should recognize words or phrases that were spoken before. I'm not familiar with it, so can someone please tell me whether this is possible?
    Thank you

     
  • Nickolay V. Shmyrev

    Hello

    There is no such functionality in the pocketsphinx API.

    What you can do is use the sphinxbase library to extract MFC coefficients
    (MFCCs) first; see the sphinx_fe source for an example of how to do that.

    Then you can apply the dynamic time warping (DTW) algorithm to compare the
    original recording with the new one. A DTW implementation is very simple,
    only about 50 lines of code:

    http://en.wikipedia.org/wiki/Dynamic_time_warping
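
    A minimal sketch of the DTW comparison described above. The function names (`dtw_distance`, `frame_dist`) are hypothetical, and the choices made here are assumptions rather than anything pocketsphinx provides: squared Euclidean distance as the per-frame cost, no warping-window constraint, and no length normalization of the result.

```c
#include <math.h>    /* INFINITY macro only; no libm functions are called */
#include <stdlib.h>

/* Squared Euclidean distance between two frames of `dim` coefficients. */
static double frame_dist(const float *a, const float *b, int dim)
{
    double sum = 0.0;
    for (int k = 0; k < dim; k++) {
        double d = (double)a[k] - (double)b[k];
        sum += d * d;
    }
    return sum;
}

/*
 * Accumulated cost of the cheapest DTW alignment between x (n frames) and
 * y (m frames). Both are stored row-major with `dim` floats per frame.
 * Returns -1.0 on invalid input or allocation failure.
 */
double dtw_distance(const float *x, int n, const float *y, int m, int dim)
{
    if (x == NULL || y == NULL || n <= 0 || m <= 0 || dim <= 0)
        return -1.0;

    /* Only two rows of the (n+1) x (m+1) cost table are needed at a time. */
    double *prev = malloc((size_t)(m + 1) * sizeof *prev);
    double *curr = malloc((size_t)(m + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1.0;
    }

    prev[0] = 0.0;                       /* empty-vs-empty alignment */
    for (int j = 1; j <= m; j++)
        prev[j] = INFINITY;

    for (int i = 1; i <= n; i++) {
        curr[0] = INFINITY;
        for (int j = 1; j <= m; j++) {
            double best = prev[j - 1];               /* advance both */
            if (prev[j] < best) best = prev[j];      /* advance x only */
            if (curr[j - 1] < best) best = curr[j - 1]; /* advance y only */
            curr[j] = frame_dist(x + (size_t)(i - 1) * dim,
                                 y + (size_t)(j - 1) * dim, dim) + best;
        }
        double *tmp = prev;              /* current row becomes previous */
        prev = curr;
        curr = tmp;
    }

    double result = prev[m];
    free(prev);
    free(curr);
    return result;
}
```

    To recognize a phrase, one would compute this cost between the new recording and each stored template and pick the template with the lowest cost. Dividing the cost by `n + m` is a common way to make recordings of different lengths comparable.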

    There are a few libraries that implement DTW as well; you can find the
    links on the Wikipedia page.

    It would be great to see a pocketsphinx patch demonstrating a DTW
    implementation.

    See also

    https://sourceforge.net/p/cmusphinx/discussion/sphinx4/thread/c6f3f2f3/

     
    • Rajaram Pejaver

      Rajaram Pejaver - 2012-12-17

      The DTW method is remarkably effective in finding matches.
      A sample implementation is at http://pejaver.com/Temp/dtw.c
      Convert a few phrases into .mfc files using sphinx_fe and feed them to the program above.
      Works well for the same speaker.
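
      As a rough illustration of loading such files, here is a sketch of a reader. It assumes the usual Sphinx feature-file layout: a 4-byte integer holding the number of float values, followed by that many 32-bit floats. It detects the byte order by checking the header against the file size. `read_mfc` and `swap32` are hypothetical names, and `dim` (13 for sphinx_fe's default cepstra) is supplied by the caller.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/*
 * Load a feature file into a malloc'd row-major array of `dim` floats per
 * frame and store the frame count in *n_frames. Returns NULL on any error.
 * Assumes sizeof(float) == 4.
 */
float *read_mfc(const char *path, int dim, int *n_frames)
{
    if (path == NULL || dim <= 0 || n_frames == NULL)
        return NULL;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    long size;
    uint32_t count;
    if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < 4) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);
    if (fread(&count, sizeof count, 1, fp) != 1) {
        fclose(fp);
        return NULL;
    }

    /* The header must match the payload size; if not, try swapped bytes. */
    int swapped = 0;
    uint64_t payload = (uint64_t)(size - 4);
    if ((uint64_t)count * 4 != payload) {
        count = swap32(count);
        swapped = 1;
        if ((uint64_t)count * 4 != payload) {
            fclose(fp);
            return NULL;
        }
    }
    if (count == 0 || count % (uint32_t)dim != 0) {
        fclose(fp);
        return NULL;
    }

    float *data = malloc((size_t)count * sizeof *data);
    if (data == NULL || fread(data, sizeof *data, count, fp) != count) {
        free(data);
        fclose(fp);
        return NULL;
    }
    fclose(fp);

    if (swapped) {
        for (uint32_t i = 0; i < count; i++) {
            uint32_t bits;
            memcpy(&bits, &data[i], sizeof bits);
            bits = swap32(bits);
            memcpy(&data[i], &bits, sizeof bits);
        }
    }

    *n_frames = (int)(count / (uint32_t)dim);
    return data;
}
```

      The caller owns the returned buffer and should release it with `free()`.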

      Can you suggest a way to further improve the accuracy?
      I also tried matching velocity and acceleration values (delta and double-delta features, 39 columns in total).
      It took longer and the computer got quite warm, but the accuracy did not improve.
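
      For reference, a sketch of how 13 cepstral columns can be expanded to 39 with velocity and acceleration values. The windows used here are an assumption: delta is the difference between frames t+2 and t-2, and double delta is the difference between the deltas at t+1 and t-1, with indices clamped at the edges. `add_deltas` and `clamp_index` are hypothetical names, and this is not necessarily the exact computation used in the experiment above.

```c
#include <stdlib.h>

/* Clamp a frame index into the valid range [0, n-1]. */
static int clamp_index(int t, int n)
{
    if (t < 0) return 0;
    if (t >= n) return n - 1;
    return t;
}

/*
 * Expand n frames of `dim` static coefficients into n frames of 3*dim
 * values laid out as [static | delta | double delta].
 * Returns a malloc'd array, or NULL on invalid input or allocation failure.
 */
float *add_deltas(const float *c, int n, int dim)
{
    if (c == NULL || n <= 0 || dim <= 0)
        return NULL;

    int width = 3 * dim;
    float *out = malloc((size_t)n * width * sizeof *out);
    float *delta = malloc((size_t)n * dim * sizeof *delta);
    if (out == NULL || delta == NULL) {
        free(out);
        free(delta);
        return NULL;
    }

    /* Velocity: difference of static coefficients two frames apart. */
    for (int t = 0; t < n; t++) {
        const float *ahead = c + (size_t)clamp_index(t + 2, n) * dim;
        const float *behind = c + (size_t)clamp_index(t - 2, n) * dim;
        for (int k = 0; k < dim; k++)
            delta[(size_t)t * dim + k] = ahead[k] - behind[k];
    }

    /* Copy statics and deltas; acceleration is the difference of deltas. */
    for (int t = 0; t < n; t++) {
        const float *ahead = delta + (size_t)clamp_index(t + 1, n) * dim;
        const float *behind = delta + (size_t)clamp_index(t - 1, n) * dim;
        float *row = out + (size_t)t * width;
        for (int k = 0; k < dim; k++) {
            row[k] = c[(size_t)t * dim + k];
            row[dim + k] = delta[(size_t)t * dim + k];
            row[2 * dim + k] = ahead[k] - behind[k];
        }
    }

    free(delta);
    return out;
}
```

      Because the cost of DTW grows with the number of columns per frame, tripling the feature width roughly triples the work of each frame comparison, which is consistent with the longer run time reported above.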

       

      Last edit: Rajaram Pejaver 2012-12-18
  • Ray

    Ray - 2012-12-05

    Thank you for the quick reply, and for the informative thread. I'll look them up.

     
  • Rajaram Pejaver

    Rajaram Pejaver - 2012-12-13

    Ray, can you explain further what you are trying to do? I think I am doing something similar, but with a different and more complicated approach.

    I am breaking the phrase up into phonemes (using a phoneme dictionary), adding the new phrase to a dictionary, and adjusting the language model (LM). I am currently stuck on the last step.

     
  • Nickolay V. Shmyrev

    Rajaram wrote: "I am breaking up the phrase to phonemes (using a phoneme dictionary), adding the new phrase to a dictionary, and adjusting the LM. I am currently stuck in the last step."

    Rajaram, you are welcome to share more details about the problem so that we can help.

     
