Nickolay, it's probably time for me to ask.
What, specifically, are the items below from your past post?
Are these coming in later releases of pocketsphinx, or are you just hoping they will come?
Has anyone outside CMU, such as other universities or researchers, implemented these features in pocketsphinx and made them freely available?
Also, is this the cutting edge or state of the art in front-end signal processing and recognition techniques for SR in the telephony domain?
Thanks, and here is the list of your items:
Well, there are issues in both the decoder and the interface with the telephony application.
First, about the decoder: pocketsphinx is currently the most supported and most feature-rich decoder of the family, but in general it is still oriented toward embedded devices. For telephony applications you will probably need to extend it a lot. The features that are currently missing are probably:
Out-of-the-box support for multiple recognizers (probably more a freeswitch issue and a model-training issue; for example, we have no free male/female models).
Speaker clustering.
Automatic VTLN estimation from pitch (this looks simple; see the sketch after this list).
A good endpointer.
Discriminative training support in SphinxTrain (a huge task).
Good, clean support for a garbage model, to be able to filter out out-of-grammar words.
Embedded RASTA extraction and RASTA model training.
Advanced feature extraction.
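On the VTLN item above, here is a rough idea of what "estimation from pitch" could look like. This is only a sketch under assumed heuristics: the 170 Hz reference pitch, the 0.88-1.12 warp range, and the direction of the pitch-to-warp mapping are illustration-only assumptions, not anything taken from pocketsphinx or SphinxTrain. It estimates the speaker's median fundamental frequency with a crude autocorrelation and maps it onto a warp factor that could then be handed to whatever frequency-warping option the front end exposes.

import numpy as np

def estimate_pitch(frame, sr=16000, fmin=60.0, fmax=400.0):
    """Crude autocorrelation pitch estimate for one frame, in Hz (0.0 if unvoiced)."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    if hi >= len(ac):
        return 0.0
    lag = lo + int(np.argmax(ac[lo:hi]))
    # Simple voicing check: the peak must carry a decent share of the energy.
    return sr / float(lag) if ac[lag] > 0.3 * ac[0] else 0.0

def estimate_warp(signal, sr=16000, ref_pitch=170.0):
    """Map the median pitch of an utterance to a VTLN warp factor (heuristic)."""
    frame_len = int(0.03 * sr)
    pitches = [estimate_pitch(signal[i:i + frame_len], sr)
               for i in range(0, len(signal) - frame_len, frame_len)]
    voiced = [p for p in pitches if p > 0.0]
    if not voiced:
        return 1.0
    # Assumption: higher pitch roughly means a shorter vocal tract, so scale
    # relative to a reference pitch. Whether that should push the warp factor
    # above or below 1.0 depends on the front end's warping convention, so the
    # sign of this mapping is something to verify against your toolkit.
    warp = ref_pitch / float(np.median(voiced))
    return float(np.clip(warp, 0.88, 1.12))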
Another issue is dialog tracking and understanding. CMU folks are doing work on dialog systems; for example, RavenClaw/Olympus is available at
http://www.ravenclaw-olympus.org/systems_overview.html
It would be worth looking at it and trying to integrate it into freepbx. The decoder will need to support a combined language model, and you will also need a postprocessing component. Postprocessing includes disfluency removal, text normalization, and sentence boundary detection; integration with nltk would probably be useful for sense extraction. A rough sketch of that last step follows.
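For the postprocessing piece, a very rough sketch of disfluency removal plus sentence boundary detection with nltk (the filler-word list and the repeated-word regex are illustration-only assumptions, and sent_tokenize needs the punkt data downloaded once):

import re
import nltk  # requires a one-time nltk.download('punkt')

FILLERS = re.compile(r'\b(uh|um|er|ah|you know|i mean)\b', re.IGNORECASE)

def postprocess(raw_hyp):
    # Toy cleanup of a recognizer hypothesis: strip fillers, collapse
    # immediate word repeats ("to to"), then let nltk guess sentence breaks.
    text = FILLERS.sub(' ', raw_hyp)
    text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text)
    text = re.sub(r'\s+', ' ', text).strip()
    return nltk.sent_tokenize(text)

print(postprocess("uh i i want to to check my balance um please"))
# -> ['i want to check my balance please']

Real disfluency removal and text normalization are much harder than this, but it shows where such a component would sit between the decoder output and any sense-extraction step.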
If you need more details on any of the above, feel free to ask.
Hi Nickolay and Mark,
Good morning, I hope all are doing well.
For the past 6 months I have been using SPHINX-4 for Indian-accented English and have achieved 60-70% accuracy for read speech, though not for spontaneous (free) speech.
Now I am interested in working with pocketsphinx and contributing my knowledge. Can one of you tell me the procedure and the required software to install?
As of now I have:
Microsoft Visual Studio 2005
Microsoft Visual Studio 2006
Microsoft Windows Mobile
ActiveSync
Thank you in advance.
> Are these coming in later releases of pocketsphinx, or are you just hoping they will come?
I hope they will be available one day, but the time frame is long (years or so).
> Has anyone outside CMU, such as other universities or researchers, implemented these features in pocketsphinx and made them freely available?
I don't know of any other implementations.
> Also, is this the cutting edge or state of the art in front-end signal processing and recognition techniques for SR in the telephony domain?
I wouldn't say they are state of the art, just features that would be nice to have.
OK, a couple of things.
First, what are the steps I need to take, or more likely get someone else to take, to put these upgrades into pocketsphinx? This stuff is beyond my pay grade, but I can give it a try or hand an outline to someone else.
Second, what is the state of the art in ASR within telephony? You mention a number of techniques above, all of which seem to condition and normalize the audio for variation in speaker pitch, noise, volume, etc., before the signal is chopped up and pushed to the recognizer.
Thanks.
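To make the normalization idea concrete: the most basic of those techniques is cepstral mean normalization (CMN), which subtracts the per-utterance average of each cepstral coefficient so that a fixed channel (a particular telephone line, microphone, or volume level) largely cancels out. A minimal sketch on an array of already-extracted MFCC frames (the feature extraction itself is assumed to happen elsewhere):

import numpy as np

def cepstral_mean_norm(mfcc):
    # mfcc: (num_frames, num_coeffs) array of cepstral features.
    # Subtracting the utterance mean removes a stationary channel/volume
    # offset, which is roughly what a decoder's CMN stage does.
    return mfcc - mfcc.mean(axis=0, keepdims=True)

# Stand-in features: a constant offset plays the role of the channel.
frames = np.random.randn(200, 13) + 5.0
print(cepstral_mean_norm(frames).mean(axis=0))  # ~0 for every coefficient

Techniques like VTLN, RASTA, and noise-robust front ends go further than this, but they all share the same goal of removing speaker and channel variation before decoding.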
Hello, good morning!
I have trained an acoustic model using the documentation, and it trained and tested successfully.
But when I run the model with the pocketsphinx_continuous command, passing the model directory to -hmm and the dictionary to -dict, the command runs but prints nothing.
Please help.
This is a screenshot of the training process.
This is my data set.
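If it helps to narrow things down, here is a minimal Python sketch that decodes a single WAV file with a trained model. All paths are placeholders, it assumes the pocketsphinx Python bindings (pocketsphinx-python) are installed, and it assumes test.wav is 16 kHz, 16-bit mono matching the model's training data. It also passes a language model explicitly, since a missing language model or audio source is a common reason for pocketsphinx_continuous to print nothing.

from pocketsphinx.pocketsphinx import Decoder

MODEL_DIR = '/path/to/trained_model'   # placeholder: the directory given to -hmm
LM_FILE = '/path/to/your.lm'           # placeholder: language model from training
DICT_FILE = '/path/to/your.dict'       # placeholder: the file given to -dict

config = Decoder.default_config()
config.set_string('-hmm', MODEL_DIR)
config.set_string('-lm', LM_FILE)
config.set_string('-dict', DICT_FILE)
decoder = Decoder(config)

decoder.start_utt()
with open('test.wav', 'rb') as f:
    f.read(44)                         # skip the RIFF header
    while True:
        buf = f.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)
decoder.end_utt()

hyp = decoder.hyp()
print(hyp.hypstr if hyp else 'No hypothesis - check audio format and model paths')

If this prints a sensible hypothesis, the model is fine and the problem is in how pocketsphinx_continuous is being invoked; if it prints nothing, posting the full console log here would make it easier to diagnose.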