On 22/01/2012, at 4:23 AM, gumstix-users-request@... wrote:
> I am very interested in voice recognition and TTS synthesis.
> The directions of this post were likely once helpful but the information is
> completely inaccurate now.
> I was able to get most of the way through this example by changing:
> The pocketsphinx package is no longer present.
> However, I downloaded the source for sphinx-base and pocketsphinx using wget.
> I installed task-native-sdk, python and bison and was then able to follow
> the directions on the sphinx website to compile and install the two sphinx
> packages from source.
> I am using a Tobi board, so the audio input is not a microphone input but a
> digital line in. So instead of testing with a microphone I have the digital
> input from the Tobi connected to the speaker out of my PC. I know that this
> method works and that the levels are fine because I can use arecord and
> aplay to record audio sent from my PC and then play the file back through
> the Tobi speaker out. The audio is clear enough and loud enough.
> I can run the example command line application found in
> <sphinxroot>/bin/pocketsphinx_continuous, and the app says "Ready...". When I
> play a pre-recorded sound file of someone talking on my PC, the application
> running on the Overo detects that audio is playing, says it is "listening",
> and then stops after the clip stops. pocketsphinx then analyzes the words
> and spits out 100% random text, which has never once been correct.
> There is zero word recognition which usually indicates that there is a major
> problem such as no audio or the signal is too large/small. I am at a loss,
> since I can prove that the signals are clear and the volume is right. The
> fact that it knows when I am talking means it is listening to the correct
> device. I have even seen that if I play a long recorded voice message, the
> output text, while 100% wrong, is just as long as my text file.
> Next, one might think it is an error or issue with the grammar / dictionary.
> But I have tried the built-in English language model, I have downloaded the
> largest known-good English model, and I have even followed the directions for
> making my own corpus and language files. Each produces the exact same
> Perhaps Ash_Charles used magic pixie dust to get this working months ago and
> wants to share some with me?
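One way to test the "no audio or the signal is too large/small" theory with numbers rather than by ear is to inspect a file captured with arecord. Below is a minimal sketch, assuming the capture is a 16-bit PCM WAV file; the function name wav_stats and the returned keys are made up for illustration. It reports the sample rate, channel count and peak/RMS level. As far as I know pocketsphinx defaults to 16 kHz mono input, so a recording at some other rate could sound fine through aplay and still decode as nonsense, but treat that as an assumption to check, not a diagnosis.

```python
import array
import math
import sys
import wave


def wav_stats(path):
    """Return format and level information for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        width = wf.getsampwidth()
        frames = wf.readframes(wf.getnframes())

    if width != 2:
        raise ValueError("expected 16-bit samples, got %d-bit" % (8 * width))

    # Interpret the raw bytes as signed 16-bit samples (all channels interleaved).
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    peak = max((abs(s) for s in samples), default=0)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0

    return {
        "rate": rate,                    # samples per second
        "channels": channels,            # 1 = mono, 2 = stereo
        "bits": 8 * width,
        "peak": peak,                    # largest absolute sample value
        "rms": rms,                      # overall signal level
        "peak_fraction": peak / 32768.0, # near 1.0 suggests clipping
    }
```

Calling wav_stats("test.wav") (a hypothetical file name) on a short capture should show whether the rate and channel count are what the decoder is configured for, and whether the peak is sitting near zero or near full scale.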
I tried Dragon NaturallySpeaking (on a suitably grunty Windows PC) and after a half-hour training session I said (something like this; I don't recall exactly)
"Hello can you hear me"
It recognised that as (again, not exactly)
"rocket ships in the sea"
If this is the state of a premier commercial app, then I think free software speech recognition is doomed.
If anyone has a better story, please share it.