Sk8St4r - 2008-09-23

Hello! I am an Austrian university student working on a project to steer VR applications by voice commands. I am working on a Linux box that will host the "recognition daemon" and send the recognized single-word commands over a socket to the machine running the VR application. Each application will have its own very limited set of single-word commands (15-30 commands, e.g. "stop", "move", "left", "right").
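
To show what I mean by the socket part, here is a minimal sketch in Python (not Sphinx code). It assumes one lowercase ASCII word per command, terminated by a newline; `encode_command`, `send_command` and this wire format are placeholders I made up for illustration.

```python
import socket


def encode_command(word: str) -> bytes:
    """Turn a recognized word into one newline-terminated message.

    The wire format (lowercase ASCII + "\n") is an assumption for this sketch.
    """
    return (word.strip().lower() + "\n").encode("ascii")


def send_command(sock: socket.socket, word: str) -> None:
    """Send a single command over an already connected socket."""
    sock.sendall(encode_command(word))
```

The daemon would connect once to the VR machine with `socket.create_connection((host, port))` and then call `send_command` for every accepted word; host and port depend on the setup.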

So what I need is continuous live speech recognition with a confidence value for each recognized word. What I have right now (I modified sphinx3_livedecode.c to fit my needs) is a recognition daemon that gives me a partial hypothesis for every additional single-word command I speak, so I get the whole hypothesis string containing all the words I have spoken so far. There are two problems. First, it sometimes outputs two partial hypothesis strings at a time. Second, if a word has not been recognized correctly, it is still added (incorrectly) to the hypothesis string, which would steer my VR application the wrong way. What I need is single-word recognition without regard to the preceding or following commands (every command is independent of the others). Each word has to be accepted if its confidence is above a certain threshold and rejected otherwise. What I actually need is a perlbox-voice-like frontend that sends commands to another box via sockets.
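
To make the accept/reject behaviour concrete, here is a minimal sketch in Python (again not Sphinx code). It assumes the decoder can hand me one word plus a confidence score between 0 and 1 per command, which I do not know how to get from Sphinx yet; `accept_command`, `COMMANDS` and the 0.7 threshold are made up for illustration.

```python
from typing import Optional, Set

# Example vocabulary taken from the commands mentioned above.
COMMANDS: Set[str] = {"stop", "move", "left", "right"}

# Hypothetical threshold; the real value would have to be tuned.
THRESHOLD = 0.7


def accept_command(
    word: str,
    confidence: float,
    vocabulary: Set[str] = COMMANDS,
    threshold: float = THRESHOLD,
) -> Optional[str]:
    """Return the normalized command if it should be sent, else None.

    Each word is judged on its own, independent of earlier commands:
    it must be in the vocabulary and reach the confidence threshold.
    """
    normalized = word.strip().lower()
    if normalized in vocabulary and confidence >= threshold:
        return normalized
    return None
```

Only a non-`None` result would be forwarded to the VR machine; everything else is silently dropped.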

Does anyone have ideas on how I could solve these problems?

P.S. I know that I am working with partial hypotheses. I tried ending the utterance every time before sending a recognized command (in the while loop of the decoding thread in livedecode.c) and starting a new one afterwards. Result: the hypothesis string contains only one command at a time (which is what I want), but unfortunately the recognition rate/confidence has dropped drastically. Hardly any voice commands are recognized any more.

P.P.S. I have read something about runtime issues with livedecode.c (the decoding process won't last longer than a minute).
Is that true, and if so, how can I solve it?