Hey,
How can I recognize phonemes using pocketsphinx and display them on the screen?
Pocketsphinx does not support this feature.
So, does that mean extracting phonemes using pocketsphinx is not possible? Is
there any other tool that could help me with this? I am building a speech
verification app for Android devices.
Make a dictionary with as many words as phonemes, where each phoneme is a
word, and a simple FSG where any word follows any other word.
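A rough sketch of generating both files, in Python. The phone list here is a
hypothetical subset, so take the real one from your acoustic model. The FSG
layout (FSG_BEGIN, NUM_STATES, START_STATE, FINAL_STATE, TRANSITION, FSG_END)
is the Sphinx text format as I understand it; check it against your version.
The function names are made up for illustration.

```python
def build_phoneme_dict(phones):
    """Return dictionary text in which every phoneme is its own 'word'."""
    return "".join(f"{p} {p}\n" for p in phones)


def build_phoneme_loop_fsg(phones, name="phoneme_loop"):
    """Return FSG text with one state that loops on every phoneme word."""
    # Equal probability for each word; an assumption, not a tuned value.
    prob = 1.0 / len(phones)
    lines = [
        f"FSG_BEGIN {name}",
        "NUM_STATES 1",
        "START_STATE 0",
        "FINAL_STATE 0",
    ]
    # A self-loop on state 0 per word means any word can follow any other.
    lines += [f"TRANSITION 0 0 {prob:.6f} {p}" for p in phones]
    lines.append("FSG_END")
    return "\n".join(lines) + "\n"


# Hypothetical subset of a phone set, for illustration only.
phones = ["AA", "AE", "IY", "K", "S", "T"]
dict_text = build_phoneme_dict(phones)
fsg_text = build_phoneme_loop_fsg(phones)
```

Write dict_text and fsg_text to files and point the decoder at them in place
of your normal dictionary and language model.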
It will perform better if you write something that makes longer words out of
"sane" combinations of 2 or 3 phonemes. That removes a lot of interword
penalties and prunes nonsensical combinations like /T/T/T/ from the search.
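One way to generate such combination words, as a sketch only. What counts as
"sane" is up to you; the rule below (no phoneme repeated back to back, and at
least one vowel per word) is purely my assumption, and the phone and vowel
lists are hypothetical subsets. Word names like K_AA_T are also just a naming
choice.

```python
from itertools import product

# Hypothetical subsets; use the phone set of your acoustic model.
PHONES = ["AA", "K", "T"]
VOWELS = {"AA"}


def is_sane(combo):
    """Assumed sanity rule: no immediate repeats and at least one vowel."""
    no_repeat = all(a != b for a, b in zip(combo, combo[1:]))
    has_vowel = any(p in VOWELS for p in combo)
    return no_repeat and has_vowel


def build_combo_entries(phones, max_len=3):
    """Return dictionary lines for sane combinations of 2..max_len phonemes."""
    entries = []
    for n in range(2, max_len + 1):
        for combo in product(phones, repeat=n):
            if is_sane(combo):
                # Word name joins phonemes with '_'; pronunciation uses spaces.
                entries.append(f"{'_'.join(combo)} {' '.join(combo)}")
    return entries


entries = build_combo_entries(PHONES)
```

Each entry becomes one dictionary line, and each word name gets its own
transition in the FSG alongside the single-phoneme words.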
You have to do this with an FSG; the JSGF bugs will kill you if you try it
with JSGF, and the
Don't be surprised if the results are highly inaccurate; phoneme recognition
is not a well-developed science. The words in the dictionary provide
accuracy-enhancing context for the phonemes. The grammar or language model
provides context for the words. Strip that context, and the result is
surprisingly poor. It is the same as extracting individual phonemes from a
wave file and trying to recognize them yourself: humans can't do this easily,
either.
So, I already have a pocketsphinx program which translates speech to text.
Once I have the files, should I just change the dictionary and add the FSG
file in the code, or do I need to alter the code as well?