Hi,
I am developing Turkish speech recognition software for the Android platform. For that, I recorded approximately 10 hours of training data in 16 kHz, 16-bit mono MSWAV format. I saved the speech files as byte arrays using an AudioRecord object.
I also wrote shell scripts to train a Turkish HMM and test the system on Windows using the Sphinx toolkit. On Windows everything worked as expected and I got good recognition performance. However, when I perform the recognition on Android, the accuracy decreases slightly.
To test the recognition performance on Android, I recorded 100 speech files in the same format as the training data (16 kHz, 16-bit mono, saved as byte arrays). When I recognize the original files on Windows using the shell scripts, I obtain 97% recognition accuracy for a basic FSG recognition task. However, when I run recognition on the speech files written to the Sphinx sync folder, the accuracy drops to 94% for the same task with the same 100 test files.
I listened to the speech files in the sync folder and noticed that they are noisier than the original files. Below I attached an original file and the corresponding file created under Sphinx’s sync folder. In my code, I only convert the byte array to a short array using the following method and send the short data to the decoder’s processRaw method:
// convert byte to short
private short[] byte2short(byte[] byteD) {
    int byteArrsize = byteD.length / 2;
    short[] shorts = new short[byteArrsize];
    for (int i = 0; i < byteArrsize; i++) {
        shorts[i] = (short) (byteD[i * 2] + (byteD[(i * 2) + 1] << 8));
    }
    return shorts;
}
We think the performance might be decreasing because of the added noise. Could you please suggest a way to improve the recognition performance?
Thanks in advance.
Could you please tell me how I should convert my recorded speech from a byte array to a short array? I need this conversion because the decoder's processRaw takes a short array as input.
Here are my files:
Last edit: Sevcan Kahraman 2015-08-03
The correct expression to convert two bytes to short is:
It differs from your code in several respects: the & operator promotes the byte to int and masks it, so the byte is treated as an unsigned value within the int.
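A minimal sketch of a conversion along these lines, assuming little-endian 16-bit PCM. The class name PcmBytes and the method name toShorts are hypothetical and used only for illustration. The main method compares the masked result with the unmasked sum from the question for a low byte of 0x80, where Java's signed byte would otherwise be sign-extended and corrupt the sample:

```java
final class PcmBytes {
    // Hypothetical helper: combine little-endian byte pairs into 16-bit samples.
    static short[] toShorts(byte[] pcm) {
        int n = pcm.length / 2; // a trailing odd byte is ignored
        short[] out = new short[n];
        for (int i = 0; i < n; i++) {
            int lo = pcm[2 * i] & 0xFF;   // mask: treat the low byte as unsigned (0..255)
            int hi = pcm[2 * i + 1] << 8; // high byte keeps its sign
            out[i] = (short) (hi | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] pcm = {(byte) 0x80, 0x00}; // the sample value 128, little-endian
        System.out.println(toShorts(pcm)[0]);                  // prints 128
        System.out.println((short) (pcm[0] + (pcm[1] << 8)));  // prints -128 (unmasked sum)
    }
}
```

Any sample whose low byte is 0x80 or above is affected by the unmasked version, which would be consistent with the extra noise heard in the converted files.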
You can also use ByteBuffer:
http://stackoverflow.com/a/5626003/432021
Thank you Nickolay, ByteBuffer worked for me! Thanks a lot.
ByteBuffer.wrap(byteD).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(shorts);
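For reference, the one-liner above can be wrapped into a drop-in replacement for byte2short. This is only a sketch: the class name PcmBuffer and the method name toShorts are hypothetical, and it assumes the recorded bytes are little-endian 16-bit PCM, as the ByteOrder.LITTLE_ENDIAN setting implies.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmBuffer {
    // Hypothetical wrapper around the ByteBuffer call from this thread.
    static short[] toShorts(byte[] byteD) {
        short[] shorts = new short[byteD.length / 2]; // a trailing odd byte is ignored
        ByteBuffer.wrap(byteD)
                  .order(ByteOrder.LITTLE_ENDIAN) // low byte first
                  .asShortBuffer()
                  .get(shorts);                   // bulk copy into the short array
        return shorts;
    }

    public static void main(String[] args) {
        byte[] pcm = {(byte) 0x80, 0x00, 0x34, 0x12};
        short[] samples = toShorts(pcm);
        System.out.println(samples[0]); // prints 128
        System.out.println(samples[1]); // prints 4660 (0x1234)
    }
}
```

The resulting array can then be passed to the decoder's processRaw in place of the output of byte2short.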