CMU Sphinx / Forums / Help: pocketsphinx improving recognition accuracy

Sevcan Kahraman - 2015-08-03

Hi,
I am developing Turkish speech recognition software for the Android platform. For that, I recorded approximately 10 hours of training data in 16 kHz, 16-bit mono MS WAV format. I saved the speech files as byte arrays using an AudioRecord object.
I also wrote shell scripts to train a Turkish HMM and test the system on Windows using the Sphinx toolkit. On Windows everything was as expected and I got good recognition performance. However, when I perform the recognition on Android, there is a slight decrease in accuracy.
To test the recognition performance on Android, I recorded 100 speech files in the same format as the training data (16 kHz, 16-bit mono, saved as byte arrays). When I recognize the original files on Windows using the shell scripts, I obtain 97% recognition accuracy for a basic FSG recognition task. However, when I run recognition on the speech files under the Sphinx sync folder, the accuracy drops to 94% for the same task with the same 100 test files.
I listened to the speech files in the sync folder and noticed that they are noisier than the original files. I have attached below an original file and the corresponding file created under Sphinx's sync folder. In my code, I only convert the byte array to a short array using the following method and send the short data to the decoder's processRaw method:
// convert byte to short
private short[] byte2short(byte[] byteD) {
    int byteArrsize = byteD.length / 2;
    short[] shorts = new short[byteArrsize];
    for (int i = 0; i < byteArrsize; i++) {
        shorts[i] = (short) (byteD[i * 2] + (byteD[(i * 2) + 1] << 8));
    }
    return shorts;
}
We think the added noise might be reducing the performance. Could you please suggest a way to improve the recognition accuracy?
Thanks in advance.

Sevcan Kahraman - 2015-08-03

Here are my files:

Last edit: Sevcan Kahraman 2015-08-03

FileInSyncFolderAfterProcessRaw.raw

originalFileByte.wav

shortDataBeforeSendingDecoder.wav

Sevcan Kahraman - 2015-08-03

Could you please tell me how I should convert my recorded speech from a byte array to a short array? I need this conversion because the decoder's processRaw takes a short array as input.

- Nickolay V. Shmyrev - 2015-08-03
  
  The correct expression to convert two bytes to short is:
  
  short val=(short)((lo & 0xff) | ((hi & 0xff) << 8));
  
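  It differs from your code in several respects: the & 0xff mask promotes each byte to int and keeps only its low 8 bits, so the byte is treated as unsigned instead of being sign-extended, and the two halves are combined with | rather than +.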
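
To make the difference concrete, here is a minimal sketch that puts the two expressions side by side. The class and method names (PcmBytes, naive, masked) are hypothetical and used only for illustration; the two expressions themselves are taken from the posts above. Because Java bytes are signed, the unmasked version comes out 256 too low whenever the low byte is 0x80 or above, which may well be the source of the noise heard in the sync folder files.

```java
/** Hypothetical helper comparing the two byte-pair conversions discussed above. */
final class PcmBytes {

    // Expression from the question: lo is sign-extended to int before the addition,
    // so a low byte of 0x80..0xff contributes a negative value.
    static short naive(byte lo, byte hi) {
        return (short) (lo + (hi << 8));
    }

    // Expression from the reply: each byte is masked to its unsigned 0..255 value
    // before the two halves are combined.
    static short masked(byte lo, byte hi) {
        return (short) ((lo & 0xff) | ((hi & 0xff) << 8));
    }

    public static void main(String[] args) {
        byte lo = (byte) 0xff; // stored as -1 in a signed Java byte
        byte hi = (byte) 0x01;
        // The little-endian sample 0x01ff should decode to 511.
        System.out.println("naive:  " + naive(lo, hi));
        System.out.println("masked: " + masked(lo, hi));
    }
}
```

With the sample bytes 0xff, 0x01 the masked version yields 511, while the unmasked version yields 255.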
  

Nickolay V. Shmyrev - 2015-08-03

You can also use ByteBuffer:

http://stackoverflow.com/a/5626003/432021

Sevcan Kahraman - 2015-08-03

Thank you Nickolay, ByteBuffer worked for me! Thanks a lot.
ByteBuffer.wrap(byteD).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(shorts);
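
For anyone landing here later, a rough sketch of how that one-liner could replace the original byte2short method is given below. The class name PcmBuffers and the method name toShorts are hypothetical; the ByteBuffer call chain is the one quoted above, and the sketch assumes the recorded data is 16-bit little-endian PCM.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical wrapper around the ByteBuffer conversion quoted above. */
final class PcmBuffers {

    // Converts 16-bit little-endian PCM bytes to samples.
    // A trailing odd byte, if any, is ignored.
    static short[] toShorts(byte[] byteD) {
        short[] shorts = new short[byteD.length / 2];
        ByteBuffer.wrap(byteD)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .get(shorts);
        return shorts;
    }

    public static void main(String[] args) {
        byte[] pcm = {(byte) 0xff, (byte) 0x01, (byte) 0x00, (byte) 0x80};
        for (short s : toShorts(pcm)) {
            System.out.println(s);
        }
    }
}
```

The resulting short array is what gets passed on to the decoder's processRaw; that call is left out here because its full signature is not shown in this thread.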
