
What data are used for the acoustic models?

2001-06-18
2012-09-22
  • Brian Delaney

    Brian Delaney - 2001-06-18

    I am doing some experiments to estimate the computational complexity of certain speech recognition tasks using Sphinx 2.

    I am currently trying to achieve decent recognition performance in terms of WER, but I am having some difficulty.  I am using trigram language models (5k to 20k words) trained on WSJ data.

    When I test the recognizer on a small set of utterances (40-50) from the WSJ audio data, I get poor recognition performance.  WER is typically between 90% and 110% or more.
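
    For readers wondering how WER can go above 100%: under the usual definition, insertions count as errors but the denominator is only the number of reference words. Here is a minimal sketch of that definition in Python. It is not the scoring tool used with Sphinx, and the function name `word_error_rate` is just an illustrative choice.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Two reference words, five wrong hypothesis words:
    # 2 substitutions + 3 insertions over 2 reference words.
    print(word_error_rate("the cat", "a dog ran far away"))
```

    A hypothesis that is longer than the reference and mostly wrong pushes the ratio past 1.0, so a WER above 100% points to many inserted words on top of substitutions.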

    I suspect that the problem might be due to a mismatch in the acoustic models and the test data.
    Does anyone know what acoustic data were used to train the models that are distributed with Sphinx II?

     
    • Kevin A. Lenzo

      Kevin A. Lenzo - 2001-07-10

      Interesting.  The models are from Hub 4 (broadcast news).  We should dust off some better models!  Still, that sort of error rate is much higher than I'd expect.  Can you contact me at lenzo@cs.cmu.edu about it?  We should get that performance up, or there's a problem.

       
