Hi All,
I have 8 kHz, 16-bit audio files that I am feeding to the
sphinx3_livepretend sample code. When I use the acoustic model "communicator
narrowband (8kHz) telephone speech", the recognition results are very poor. But
when I use "HUB4 (broadcast news) acoustic models - for wideband (16kHz)
speech", the recognition is very good.
http://www.speech.cs.cmu.edu/sphinx/models/
My questions are as follows:
a. For telephony speech which model is the best?
b. How did the recognition work when I used the wrong acoustic models?
Communicator
It's hard to answer this question without the actual audio you were trying to
decode.