I'm having trouble getting Sphinx4 to handle my live audio data. I have an 8 kHz, 16-bit, signed PCM, little-endian audio stream coming from my telephone switch, and I need speech recognition on the fly. I tried to create a new frontend based on the Microphone class, with no success: the accuracy is so bad that I'm sure it's due to a sampling mismatch. When I capture audio from my phone line to a wave file, that file is decoded correctly by the BatchModeRecognizer, so what am I doing wrong?
What is the recommended way to deal with this sort of application: a live audio stream that does not come from a mic?
TIA
Zac
Zac,
I suspect this is an endianness problem. StreamDataSource tries to convert little-endian data to big-endian. In the StreamDataSource.readNextFrame() method, you will see the following code:
    if (bigEndian) {
        doubleData = DataUtil.bytesToValues(...);
    } else {
        doubleData = DataUtil.littleEndianBytesToValues(...);
    }
You should insert something like the above at the appropriate place (probably the RecordingThread.readData() method) in your custom Microphone. Let me know if it works for you.
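To illustrate what such a conversion involves, here is a minimal standalone sketch. It assumes 16-bit signed PCM, and the class and method names (PcmDecoder, toValues) are made up for this example. It is not Sphinx4's DataUtil, whose exact signatures you should check in the source.

```java
// Hypothetical helper for illustration only; not part of Sphinx4.
final class PcmDecoder {
    private PcmDecoder() {}

    /**
     * Decodes the first `length` bytes of 16-bit signed PCM data into
     * sample values, honoring the given byte order.
     */
    static double[] toValues(byte[] data, int length, boolean bigEndian) {
        if (length < 0 || length > data.length || length % 2 != 0) {
            throw new IllegalArgumentException("length must be even and within the buffer");
        }
        double[] values = new double[length / 2];
        for (int i = 0; i < values.length; i++) {
            int first = data[2 * i];
            int second = data[2 * i + 1];
            // The high byte keeps its sign; the low byte is masked to 0..255.
            int sample = bigEndian
                    ? (first << 8) | (second & 0xFF)
                    : (second << 8) | (first & 0xFF);
            values[i] = sample;
        }
        return values;
    }
}
```

With this sketch, the byte pair 0x01 0x00 decodes to 1 when read as little-endian but to 256 when read as big-endian, which is the kind of distortion a byte-order mismatch introduces into every sample.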
The Microphone class doesn't do this because Java is big-endian, so you should be able to get an audio line that is big-endian. Still, just to be sure, I'll fix Microphone so that it checks the final AudioFormat and performs big-endian conversion if necessary.
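A check of that kind could look roughly like the sketch below. It is only an assumption about how it might be done, not the actual Microphone fix: EndianCheck and toBigEndian are hypothetical names, and the sketch handles 16-bit samples only. AudioFormat.isBigEndian() and getSampleSizeInBits() are standard javax.sound.sampled calls.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper for illustration only; not the actual Microphone code.
final class EndianCheck {
    private EndianCheck() {}

    /**
     * Returns a copy of 16-bit PCM data in big-endian byte order,
     * swapping each byte pair only if the format reports little-endian.
     */
    static byte[] toBigEndian(byte[] data, AudioFormat format) {
        if (format.getSampleSizeInBits() != 16) {
            throw new IllegalArgumentException("this sketch handles 16-bit samples only");
        }
        byte[] out = data.clone();
        if (!format.isBigEndian()) {
            for (int i = 0; i + 1 < out.length; i += 2) {
                byte tmp = out[i];
                out[i] = out[i + 1];
                out[i + 1] = tmp;
            }
        }
        return out;
    }
}
```

For the stream described in the question, the format would be constructed as new AudioFormat(8000f, 16, 1, true, false), assuming a single channel, so the swap branch would run.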
In any case, if recognition is still poor once you've confirmed that the endian conversion is correct, please send me your config.xml file as well as your custom Microphone class. I will try my best to figure out what's wrong. Good luck.
philip