
pocketsphinx improving recognition accuracy

Help
2015-08-03
  • Sevcan Kahraman

    Sevcan Kahraman - 2015-08-03

    Hi,
    I am trying to develop Turkish speech recognition software for the Android platform. For that, I recorded approximately 10 hours of training data in 16 kHz, 16-bit, mono MS WAV format. I saved the speech as byte arrays using an AudioRecord object.
    I also wrote shell scripts to train a Turkish HMM and test the system on Windows using the Sphinx toolkit. On Windows everything worked as expected and I got good recognition performance. However, when I perform the recognition on Android, I see a slight decrease in accuracy.
    To test the recognition performance on Android, I recorded 100 speech files in the same format as the training data (16 kHz, 16-bit, mono, saved as byte arrays). When I recognize the original files on Windows using the shell scripts, I obtain 97% recognition accuracy for a basic FSG recognition task. However, when I run recognition on the speech files found under the Sphinx sync folder, the accuracy drops to 94% for the same task with the same 100 test files.
    I listened to the speech files in the sync folder and noticed that they are noisier than the original files. Below I have attached an original file and the corresponding file created under Sphinx's sync folder. In my code, I only convert the byte array to a short array using the following method and pass the short data to the decoder's processRaw method:
    // convert byte to short
    private short[] byte2short(byte[] byteD) {
        int byteArrsize = byteD.length / 2;
        short[] shorts = new short[byteArrsize];
        for (int i = 0; i < byteArrsize; i++) {
            shorts[i] = (short) (byteD[i * 2] + (byteD[(i * 2) + 1] << 8));
        }
        return shorts;
    }
    We think that the performance might be decreasing due to the added noise. Could you please suggest a way to improve the recognition performance?
    Thanks in advance.

     
  • Sevcan Kahraman

    Sevcan Kahraman - 2015-08-03

    Could you please tell me how I should convert my recorded speech from a byte array to a short array? I need this conversion because the decoder's processRaw takes a short array as input.

     
    • Nickolay V. Shmyrev

      The correct expression to convert two bytes to short is:

           short val=(short)((lo & 0xff) | ((hi & 0xff) << 8));
      

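      It differs from your code in several respects. The & 0xff mask promotes each byte to int and clears the sign-extended upper bits, so the byte is treated as an unsigned value within the int. The two halves are then combined with | instead of +.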
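
To make the difference concrete, here is a minimal sketch that runs both expressions on one little-endian sample. The class and method names (ByteToShortDemo, naive, masked) are made up for illustration and are not part of pocketsphinx. Because Java bytes are signed, a low byte of 0x80 or above is sign-extended to a negative int, so the unmasked sum comes out 256 too low for that sample, which is consistent with the extra noise heard in the converted audio.

```java
public class ByteToShortDemo {
    // Expression from the question: the low byte is sign-extended before the addition.
    static short naive(byte lo, byte hi) {
        return (short) (lo + (hi << 8));
    }

    // Expression from the reply: each byte is masked to an unsigned value first.
    static short masked(byte lo, byte hi) {
        return (short) ((lo & 0xff) | ((hi & 0xff) << 8));
    }

    public static void main(String[] args) {
        // Little-endian bytes of the 16-bit sample 0x0180 (decimal 384).
        byte lo = (byte) 0x80;
        byte hi = (byte) 0x01;
        System.out.println(naive(lo, hi));  // prints 128 (off by 256)
        System.out.println(masked(lo, hi)); // prints 384
    }
}
```

Samples whose low byte is below 0x80 are converted identically by both versions, which is why the damaged audio is still intelligible and only sounds noisier.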

       
  • Nickolay V. Shmyrev

    You can also use ByteBuffer:

    http://stackoverflow.com/a/5626003/432021

     
  • Sevcan Kahraman

    Sevcan Kahraman - 2015-08-03

    Thank you Nickolay, ByteBuffer worked for me! Thanks a lot.
    ByteBuffer.wrap(byteD).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(shorts);
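
For completeness, the one-liner above can be wrapped into a drop-in replacement for the original byte2short method. This is only a sketch: the class name PcmConverter and the method name toShorts are hypothetical, and it assumes the byte array holds 16-bit little-endian PCM, as described in the question. A trailing odd byte, if any, is ignored.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PcmConverter {
    // Converts 16-bit little-endian PCM bytes into an array of samples.
    static short[] toShorts(byte[] pcm) {
        short[] shorts = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm)
                .order(ByteOrder.LITTLE_ENDIAN) // low byte comes first in each sample
                .asShortBuffer()
                .get(shorts);                   // bulk copy into the short array
        return shorts;
    }
}
```

The resulting array can then be passed to the decoder's processRaw method in place of the output of byte2short.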

     
