I ran Sphinx4 on the 8 kHz WAV files downloaded with the demo, as well as on WAV
files downloaded from VoxForge, and the WER is 10% with an n-gram language model.
But when I recorded a WAV file from an Asterisk server (it goes through VoIP, so
the sampling frequency is 8 kHz) and had Sphinx4 transcribe it, the WER was over
80%. Is that because the 8 kHz acoustic model is only good for microphone
recordings and not for files recorded from a VoIP phone system?
Or did I do something wrong, or do I need to do some conversion on the recorded file?
Please advise.
Thanks.
Most likely your audio files are in the wrong format. They might be stereo
instead of mono, big-endian instead of little-endian, or mu-law instead of PCM.
Check everything again carefully.
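
One way to check those properties is to read the WAV header directly. The sketch below is only an illustration, not part of Sphinx4: the function names describe_wav and format_problems are made up for this example, and the 16-bit sample size is an assumption about what the 8 kHz model expects, so adjust it if your setup differs.

```python
import struct

# Format tags defined by the RIFF/WAVE container
FORMAT_TAGS = {1: "pcm", 3: "ieee-float", 6: "a-law", 7: "mu-law"}


def describe_wav(path):
    """Read the fmt chunk of a WAV file and return its encoding details."""
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff not in (b"RIFF", b"RIFX") or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        # RIFX marks a big-endian container; plain RIFF is little-endian
        big_endian = riff == b"RIFX"
        endian = ">" if big_endian else "<"
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no fmt chunk found")
            chunk_id, chunk_size = struct.unpack(endian + "4sI", header)
            if chunk_id == b"fmt ":
                fmt = f.read(chunk_size)
                if len(fmt) < 16:
                    raise ValueError("fmt chunk is too short")
                tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                    endian + "HHIIHH", fmt[:16]
                )
                return {
                    "encoding": FORMAT_TAGS.get(tag, "unknown (tag %d)" % tag),
                    "channels": channels,
                    "sample_rate": rate,
                    "bits_per_sample": bits,
                    "big_endian": big_endian,
                }
            # Skip other chunks; chunks are padded to an even length
            f.seek(chunk_size + (chunk_size & 1), 1)


def format_problems(info, expected_rate=8000, expected_bits=16):
    """List mismatches against mono little-endian PCM (16-bit is an assumption)."""
    problems = []
    if info["encoding"] != "pcm":
        problems.append("encoding is %s, not pcm" % info["encoding"])
    if info["channels"] != 1:
        problems.append("%d channels, not mono" % info["channels"])
    if info["bits_per_sample"] != expected_bits:
        problems.append("%d bits per sample" % info["bits_per_sample"])
    if info["big_endian"]:
        problems.append("big-endian container")
    if info["sample_rate"] != expected_rate:
        problems.append("sample rate is %d Hz" % info["sample_rate"])
    return problems
```

Calling format_problems(describe_wav("recording.wav")) on the Asterisk recording (the file name here is a placeholder) returns an empty list if the header matches, or a list of the mismatches to fix before decoding.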