Hello,
I was wondering whether it is possible to adapt pocketsphinx for speaker-dependent voice recognition: it should recognize words or phrases that were spoken before. Since I'm not familiar with it, could someone please tell me whether this is possible?
Thank you
Hello
There is no such functionality in the pocketsphinx API.
What you can do is use the sphinxbase library to extract MFCC coefficients
first; see the sphinx_fe source for an example of how to do that.
Then you can apply the dynamic time warping (DTW) algorithm to compare the
original recording to the new one. A DTW implementation is very simple,
just about 50 lines of code:
http://en.wikipedia.org/wiki/Dynamic_time_warping
There are a few libraries that implement DTW as well; you can find the
links on the Wikipedia page.
It would be great to see a pocketsphinx patch demonstrating a DTW
implementation.
See also
https://sourceforge.net/p/cmusphinx/discussion/sphinx4/thread/c6f3f2f3/
The DTW method is remarkably effective at finding matches.
A sample implementation is at http://pejaver.com/Temp/dtw.c
Convert a few phrases into .mfc files using sphinx_fe and feed them to the above program.
It works well for the same speaker.
Can you suggest a way to further improve the accuracy?
I tried matching velocity and acceleration values as well (delta and double-delta coefficients, 39 columns in total).
It took longer and the computer got quite warm, but the accuracy did not improve.
Last edit: Rajaram Pejaver 2012-12-18
Thank you for the quick reply, and for the informative thread. I'll look them up.
Ray, can you explain further what you are trying to do? I think I am doing something similar but am using a different approach. It is more complicated.
I am breaking up the phrase into phonemes (using a phoneme dictionary), adding the new phrase to a dictionary, and adjusting the LM. I am currently stuck on the last step.
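For reference, the dictionary part is just plain-text entries mapping each word to its phones; the entries below are illustrative examples using CMU-style phones, not taken from any particular dictionary file:

```
TURN   T ER N
ON     AA N
LIGHT  L AY T
```

If the set of phrases is small and fixed, one possible way around adjusting the statistical LM is a finite grammar (pocketsphinx accepts JSGF grammars), which constrains recognition to exactly those phrases.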
Rajaram, you are welcome to provide more details about where you are stuck so that we can help.