What data are used for the acoustic models?

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others


Forum: Speech Recognition Theory

Creator: Brian Delaney

Created: 2001-06-18

Updated: 2012-09-22

Brian Delaney - 2001-06-18

I am doing some experiments to estimate the computational complexity of certain speech recognition tasks using Sphinx 2.

I am currently trying to achieve decent recognition performance in terms of WER, but I am having some difficulty. I am using trigram language models (5k-word to 20k-word) trained on WSJ data.

When I test the recognizer on a small set of utterances (40-50) from the WSJ audio data, recognition performance is poor: WER is typically between 90% and 110% or more.

I suspect that the problem might be due to a mismatch between the acoustic models and the test data.
Does anyone know what acoustic data was used to train the models that are distributed with Sphinx II?
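
A note on the WER figures above: a word error rate over 100% is possible because insertions count as errors while the denominator is only the number of reference words. The following is a minimal sketch of the usual definition, (substitutions + deletions + insertions) / reference words, computed with a word-level edit distance. The function name `word_error_rate` and the example sentences are made up for illustration; this is not part of Sphinx or of any scoring tool mentioned in this thread.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration only; real scoring tools also
    normalize text before aligning, which this sketch does not do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = cur[j - 1] + 1   # extra word in hypothesis
            cur[j] = min(substitution, deletion, insertion)
        prev = cur

    return prev[-1] / len(ref)


# Two reference words, five hypothesis words: one substitution plus
# three insertions gives 4 errors / 2 reference words.
print(word_error_rate("stocks fell", "the stock market fell today"))  # 2.0, i.e. 200% WER
```

So a WER above 100% usually means the decoder is emitting many extra words, not only misrecognizing the ones that are there.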

- Kevin A. Lenzo - 2001-07-10
  
  Interesting. The models are from Hub 4 (broadcast news). We should dust off some better models! Still, that sort of error is much more than I'd expect. Can you contact me at lenzo@cs.cmu.edu about it? We should get that performance up, or there's a problem.
  