Hello Guys,
I am using pocketsphinx on Android to make an assistive application for the
deaf. The app should let deaf people receive phone calls from hearing people:
Sphinx recognises the caller's words and displays them so that the deaf user
can read them. I have built an IM application based on the pocketsphinx demo.
The app works, but I am getting low accuracy. Is there any way I can improve
it? I have left the configuration as it was in the demo and only changed the
UI and the way recognition is triggered.
Because Android's security restrictions prevent me from getting audio directly
from the phone call, I resorted to using the loudspeaker. I am testing the app
with 8 kHz, 60 dB WAV tracks from the Open Speech Repository: I play them and
measure WER with a method inside the application.
Can you help me?
Gennaro.
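The thread does not show the WER method used in the app. For comparison, WER
is conventionally computed as (S + D + I) / N, where S, D and I are the word
substitutions, deletions and insertions in a minimum word-level edit
alignment and N is the number of words in the reference transcript. Below is a
minimal Java sketch of that computation (Java because the app runs on
Android). The class `WordErrorRate` and its methods are hypothetical and not
part of pocketsphinx; text normalisation is reduced to lower-casing and
whitespace splitting, which is an assumption you may need to extend (e.g. to
strip punctuation from the reference transcripts).

```java
import java.util.Locale;

/**
 * Hypothetical helper (not a pocketsphinx API): word error rate,
 * WER = (S + D + I) / N, via a word-level Levenshtein distance.
 */
public final class WordErrorRate {

    private WordErrorRate() {}

    /** Assumed normalisation: lower-case and split on runs of whitespace. */
    static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Minimum number of word substitutions, deletions and insertions turning ref into hyp. */
    static int editDistance(String[] ref, String[] hyp) {
        int[] prev = new int[hyp.length + 1];
        int[] curr = new int[hyp.length + 1];
        for (int j = 0; j <= hyp.length; j++) {
            prev[j] = j; // empty reference: j insertions
        }
        for (int i = 1; i <= ref.length; i++) {
            curr[0] = i; // empty hypothesis: i deletions
            for (int j = 1; j <= hyp.length; j++) {
                int cost = ref[i - 1].equals(hyp[j - 1]) ? 0 : 1;
                curr[j] = Math.min(prev[j - 1] + cost,   // match or substitution
                          Math.min(prev[j] + 1,          // deletion
                                   curr[j - 1] + 1));    // insertion
            }
            // Reuse the arrays: the row just filled becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[hyp.length];
    }

    /** WER as a fraction of reference words; can exceed 1.0 with many insertions. */
    public static double wer(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        String[] hyp = tokenize(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference transcript is empty");
        }
        return (double) editDistance(ref, hyp) / ref.length;
    }

    public static void main(String[] args) {
        // Illustrative pair: one deleted word out of eight reference words.
        String ref = "the birch canoe slid on the smooth planks";
        String hyp = "the birch canoe slid on smooth planks";
        System.out.printf(Locale.ROOT, "WER = %.3f%n", wer(ref, hyp)); // prints "WER = 0.125"
    }
}
```

Summing the distances and reference lengths over all test tracks before
dividing gives a corpus-level WER, which is usually more stable than averaging
per-track rates.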
With the current state of technology, this idea is not going to work. Don't
waste your time on it.
Forget the whole deaf idea; I just need some pointers to get accuracy as high
as possible with tracks played over a loudspeaker.
What can I do?
With a loudspeaker, the audio quality is degraded significantly. One needs to
clean up the data before reasonable recognition is possible. This includes
dereverberation and environment-specific acoustic model training.