Stefanie Tellex - 2007-05-16

Hi,

I'm running into a strange problem when I batch-recognize a large WAV file: the recognizer doesn't send out separate results for separate utterances. In one case, the file contains two sentences, "touch the red circle" and "touch the green circle", separated by about four seconds of silence (there is one lip smack three seconds into the silence, then another second of silence, then the second utterance). The hypothesis for this example is "touch the red circle and touch the green circle." Is there a parameter I can tweak to make it accept a result as final? It seems strange that it doesn't send out a result after that much silence.

In another case, I'm trying to recognize a 7-minute WAV file, and it reports the entire contents of the file as one long result rather than as many separate results.

For a while I had this problem because I was only calling the "recognize" method once after loading the file, but now I call it repeatedly until I get the same (empty) result twice. Is there a better way to tell when it has finished decoding a file?

Finally, is there code somewhere that reports word error rate for a transcribed audio file with loose time alignments (at the utterance level)? I used BatchModeRecognizer to set up test cases for a bunch of short audio files, but I'd like to test with a longer audio file containing multiple utterances, to more directly simulate what happens during live recognition with a microphone. It seems like it wouldn't be too hard to write something that does this using the accuracy tracker and checking the start and end frames on each result.
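
To make the idea concrete, here is a rough sketch of utterance-level scoring with loose time alignment. It is plain Java and does not use Sphinx-4's accuracy tracker; the class and method names (Utterance, LooseWer, wer) are hypothetical, and it assumes the start and end frames and the text of each result have already been extracted and are in time order. Each hypothesis is assigned to the reference utterance it overlaps most in frames (ties go to the earlier reference), and the word-level edit distance is then summed per reference.

```java
import java.util.Arrays;
import java.util.List;

/** An utterance with loose frame bounds. Hypothetical type, not a Sphinx-4 class. */
final class Utterance {
    final int startFrame;
    final int endFrame;
    final String text;

    Utterance(int startFrame, int endFrame, String text) {
        this.startFrame = startFrame;
        this.endFrame = endFrame;
        this.text = text;
    }
}

final class LooseWer {
    /** Word-level Levenshtein distance: substitutions + insertions + deletions. */
    static int wordErrors(String[] ref, String[] hyp) {
        int[] prev = new int[hyp.length + 1];
        for (int j = 0; j <= hyp.length; j++) prev[j] = j;
        for (int i = 1; i <= ref.length; i++) {
            int[] cur = new int[hyp.length + 1];
            cur[0] = i;
            for (int j = 1; j <= hyp.length; j++) {
                int sub = prev[j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            prev = cur;
        }
        return prev[hyp.length];
    }

    /** Lower-cases and splits on whitespace; empty text yields no words. */
    static String[] words(String text) {
        String t = text.trim().toLowerCase();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    /** Number of frames shared by the two utterances (0 if disjoint). */
    static int overlap(Utterance a, Utterance b) {
        return Math.max(0, Math.min(a.endFrame, b.endFrame)
                - Math.max(a.startFrame, b.startFrame));
    }

    /** WER = total word errors / total reference words. */
    static double wer(List<Utterance> refs, List<Utterance> hyps) {
        StringBuilder[] assigned = new StringBuilder[refs.size()];
        for (int i = 0; i < assigned.length; i++) assigned[i] = new StringBuilder();

        int errors = 0;
        for (Utterance h : hyps) {
            int best = -1;
            int bestOverlap = 0;
            for (int i = 0; i < refs.size(); i++) {
                int ov = overlap(refs.get(i), h);
                if (ov > bestOverlap) {
                    bestOverlap = ov;
                    best = i;
                }
            }
            if (best < 0) {
                // No reference overlaps this result: count its words as insertions.
                errors += words(h.text).length;
            } else {
                assigned[best].append(' ').append(h.text);
            }
        }

        int refWords = 0;
        for (int i = 0; i < refs.size(); i++) {
            String[] r = words(refs.get(i).text);
            errors += wordErrors(r, words(assigned[i].toString()));
            refWords += r.length;
        }
        return refWords == 0 ? 0.0 : (double) errors / refWords;
    }

    public static void main(String[] args) {
        // Frame numbers are made up for illustration.
        List<Utterance> refs = Arrays.asList(
                new Utterance(0, 200, "touch the red circle"),
                new Utterance(600, 800, "touch the green circle"));
        List<Utterance> merged = Arrays.asList(
                new Utterance(0, 800, "touch the red circle and touch the green circle"));
        System.out.println(wer(refs, merged)); // prints 1.125
    }
}
```

With the two-sentence example above, a single merged result is charged 5 insertions against the first reference and 4 deletions against the second, so 9 errors over 8 reference words. A plain concatenated WER would hide that segmentation failure, which is why the sketch scores per utterance.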

Thanks,

Stefanie