We are batch processing a database of audio files. We have set up Sphinx with a trigram SLM file, and the config files are based on an4_words_trigram. When we run Sphinx, it returns extremely poor and what appear to be somewhat random results, even when we use an SLM that contains only the sentence we are testing.
The audio files are PCM, 16-bit signed, big-endian, mono, 16 kHz.
Any suggestions on what could be causing these poor results?
We suspect that it has something to do with how the recogniser is handling the audio.
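One way to rule out a format mismatch is to ask Java Sound what it thinks the file contains. The following is only a minimal sketch: the class name `AudioFormatCheck` and the helper `describeMismatch` are hypothetical, and it assumes the files carry a header (WAV, AIFF, AU) that `javax.sound.sampled` can parse. Raw headerless PCM cannot be inspected this way.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

public class AudioFormatCheck {

    /**
     * Compares a format against 16-bit signed PCM, mono, 16 kHz and the
     * expected byte order. Returns an empty string when everything matches.
     */
    public static String describeMismatch(AudioFormat f, boolean expectBigEndian) {
        StringBuilder sb = new StringBuilder();
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())) {
            sb.append("encoding=").append(f.getEncoding()).append(' ');
        }
        if (f.getSampleSizeInBits() != 16) {
            sb.append("sampleSizeInBits=").append(f.getSampleSizeInBits()).append(' ');
        }
        if (f.getChannels() != 1) {
            sb.append("channels=").append(f.getChannels()).append(' ');
        }
        if (f.getSampleRate() != 16000f) {
            sb.append("sampleRate=").append(f.getSampleRate()).append(' ');
        }
        if (f.isBigEndian() != expectBigEndian) {
            sb.append("bigEndian=").append(f.isBigEndian()).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: AudioFormatCheck <audio-file> <expectBigEndian: true|false>");
            return;
        }
        boolean expectBigEndian = Boolean.parseBoolean(args[1]);
        try (AudioInputStream in = AudioSystem.getAudioInputStream(new File(args[0]))) {
            AudioFormat format = in.getFormat();
            System.out.println("Reported format: " + format);
            String mismatch = describeMismatch(format, expectBigEndian);
            System.out.println(mismatch.isEmpty() ? "Format matches." : "Mismatch: " + mismatch);
        }
    }
}
```

If the reported byte order disagrees with the `bigEndianData` setting in the config, the recognizer would be decoding byte-swapped samples, which would look like noise.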
OK, I have solved the audio problem. Since I am using Windows, where the native format for audio files is little-endian, trying to edit and batch process big-endian audio files was causing problems. So I converted all the audio files to little-endian and changed the Sphinx4 config file so it would pick up little-endian data (using: <property name="bigEndianData" value="false"/>). I then ran the test again with the same SLM containing only the one sentence we were testing. The results were not great, but it seemed to be functioning: the word accuracy rate was around 85%.
I then retested with the .arpa file I intend to use, and got a word accuracy rate of less than 10%! I got around 80% word accuracy using the same .arpa file with another recognizer called Sonic. Surely Sphinx4 can't be that bad, so I must be setting up something incorrectly in my config file.
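For anyone converting files by hand rather than with an audio editor, the byte-order change for 16-bit samples amounts to swapping each pair of bytes. This is a rough sketch only: the class `SwapEndian16` and method `swap16` are made-up names, and it assumes the array holds raw sample data with no file header (a header must not be swapped).

```java
import java.util.Arrays;

public class SwapEndian16 {

    /** Returns a copy of the input with the two bytes of every 16-bit sample swapped. */
    public static byte[] swap16(byte[] samples) {
        if (samples.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM data must have an even number of bytes");
        }
        byte[] out = new byte[samples.length];
        for (int i = 0; i < samples.length; i += 2) {
            out[i] = samples[i + 1];
            out[i + 1] = samples[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Two big-endian samples: 0x0102 and 0xFFFE.
        byte[] bigEndian = {0x01, 0x02, (byte) 0xFF, (byte) 0xFE};
        byte[] littleEndian = swap16(bigEndian);
        System.out.println(Arrays.toString(littleEndian)); // prints [2, 1, -2, -1]
    }
}
```

Applying the swap twice returns the original data, which is a quick sanity check that nothing else was altered.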
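To make the accuracy figures comparable between recognizers, it helps to score both with the same code. The sketch below assumes the common definition, word accuracy = 1 - (substitutions + deletions + insertions) / reference words, computed with a word-level edit distance. That definition and the names `WordAccuracy` and `wordAccuracy` are assumptions here, not necessarily how the figures above were produced.

```java
public class WordAccuracy {

    /** Word accuracy of a hypothesis against a reference transcript (case-insensitive). */
    public static double wordAccuracy(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        String[] hyp = tokenize(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        // d[i][j] = minimum edits to turn the first i reference words into the first j hypothesis words
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= hyp.length; j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int cost = ref[i - 1].equals(hyp[j - 1]) ? 0 : 1;
                int substitution = d[i - 1][j - 1] + cost;
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitution, Math.min(deletion, insertion));
            }
        }
        int errors = d[ref.length][hyp.length];
        return 1.0 - (double) errors / ref.length;
    }

    private static String[] tokenize(String s) {
        String trimmed = s.trim().toLowerCase();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // One substitution in four reference words.
        System.out.println(wordAccuracy("one two three four", "one too three four")); // prints 0.75
    }
}
```

Note that with this definition the value can go negative when the hypothesis contains many inserted words.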
Any ideas?
Cheers!
Big-endian? Usually data is little-endian, and I suppose that is the default.
I have converted some of the audio files to little-endian using GoldWave, and I am still getting the same poor results.
Any more suggestions?