
Relationship between Acoustic/Language Model

Anonymous
2010-05-15
2012-09-22
  • Anonymous

    Anonymous - 2010-05-15

    After doing a fair amount of reading, it is my understanding there are 3
    components to Sphinx recognition:

    • dictionary: defines each possible word as a group of sounds (phonemes)
    • language model: groups together up to 3 words at a time to define
      probabilities for sentence recognition
    • acoustic model: maps the waveform sound to the phonemes

    If these definitions are correct, then the acoustic model is independent of
    the dictionary & language model. In other words, if you have a complete
    acoustic model, you should be able to use this one model with any
    dictionary/language model. Therefore, what is the best acoustic model to use
    for US English speech (assuming no heavy dialect)?

    Second, does increasing the size of the acoustic model degrade recognition
    performance? In my testing, it is the size of the language model that
    matters, not the size of the acoustic model. I've used very large acoustic
    models and very large dictionaries with a small language model, and
    recognition is faster than with a large language model. This leads me to my
    question: if I need to recognize a limited set of 50 English words, and all
    numbers between 0 and 1,000,000, what is the best combination of open source
    HMM, LM, and dictionary to achieve the best and fastest recognition? The
    problem is that, of the predefined set of ~50 English words, some are product
    names that probably haven't been recorded or trained in any acoustic model.
    If the acoustic models contain the phonemes that occur in these custom words,
    will the words still get recognized, or do I need to create an adapted
    acoustic model that is trained on those additional words?

    Thank You,
    Eric

     
  • Nickolay V. Shmyrev

    Therefore, what is the best acoustic model to use for US English speech
    (assuming no heavy dialect)?

    Acoustic models are trained for specific recording conditions. A model trained
    to recognize broadcast speech is not suitable for telephone speech. There is
    no such thing as a single best model.

    Second, does increasing the size of the acoustic model degrade recognition
    performance?

    The size of the model and recognition accuracy are not related. Size does,
    obviously, affect recognition speed.

    This leads me to my question: if I need to recognize a limited set of 50
    English words, and all numbers between 0 and 1,000,000, what is the best
    combination of open source HMM, LM, and dictionary to achieve the best and
    fastest recognition?

    It depends on the type of speech: telephone, microphone, or far-field
    recording.
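    On the vocabulary side of this question, the range 0 to 1,000,000 needs only
    a small set of distinct words, which keeps both the dictionary and the
    language model small. The following is a minimal sketch, assuming numbers are
    spoken in the plain US style ("one hundred twenty three thousand") with no
    "and", "oh", or "a hundred" variants; the function names are made up for
    illustration and are not part of Sphinx.

```python
# Words needed to speak any integer from 0 to 1,000,000 (assumed speaking style:
# "one hundred twenty three", no "and", no "oh", no "a hundred").
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def below_thousand(n):
    """Return the spoken words for 0 <= n <= 999 as a list."""
    words = []
    if n >= 100:
        words += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10])
        n %= 10
        if n:
            words.append(ONES[n])
    elif n > 0 or not words:
        # Covers 1..19, and a bare 0 when nothing has been emitted yet.
        words.append(ONES[n])
    return words


def number_to_words(n):
    """Return the spoken words for 0 <= n <= 1,000,000 as a list."""
    if not 0 <= n <= 1_000_000:
        raise ValueError("n must be between 0 and 1,000,000")
    if n == 1_000_000:
        return ["one", "million"]
    if n < 1000:
        return below_thousand(n)
    thousands, rest = divmod(n, 1000)
    words = below_thousand(thousands) + ["thousand"]
    if rest:
        words += below_thousand(rest)
    return words


# Every distinct word that can appear in any number in the range.
NUMBER_VOCABULARY = set(ONES) | set(TENS[2:]) | {"hundred", "thousand", "million"}

print(" ".join(number_to_words(123456)))
print(len(NUMBER_VOCABULARY))  # 31
```

    Under that assumption the number part of the task adds 31 words to the ~50
    custom words, so the sentences generated this way could serve as training
    text for a small language model or as the basis for a grammar.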

    The problem is that, of the predefined set of ~50 English words, some are
    product names that probably haven't been recorded or trained in any acoustic
    model. If the acoustic models contain the phonemes that occur in these custom
    words, will the words still get recognized, or do I need to create an adapted
    acoustic model that is trained on those additional words?

    There is no problem here. Most acoustic models are generic enough to let you
    recognize any word that is transcribed in the dictionary, because they model
    phonemes rather than whole words.
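    In practice this means a custom word only needs a dictionary entry whose
    phonemes all belong to the phone set of the acoustic model. The sketch below
    illustrates that check. It assumes a dictionary in the CMU Pronouncing
    Dictionary style (a word followed by its phones on one line) and assumes the
    model uses the 39-phone CMUdict set; a real model ships its own phone list,
    which should be used instead. "ZENTRIX" is a hypothetical product name with a
    guessed pronunciation, and "BADWORD" with the phone "QX" is deliberately
    invalid.

```python
# The 39 ARPAbet phones of the CMU Pronouncing Dictionary (stress marks omitted).
# Assumption: the acoustic model in use was trained on this phone set.
CMU_PHONES = set(
    "AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M N NG "
    "OW OY P R S SH T TH UH UW V W Y Z ZH".split()
)


def parse_dictionary(text):
    """Parse 'WORD PH1 PH2 ...' lines into a {word: [phones]} mapping."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            entries[parts[0]] = parts[1:]
    return entries


def unknown_phones(entries, phone_set):
    """Return {word: [phones]} for words using phones outside phone_set."""
    problems = {}
    for word, phones in entries.items():
        missing = [p for p in phones if p not in phone_set]
        if missing:
            problems[word] = missing
    return problems


# Illustrative entries only; the pronunciations are not taken from any real
# dictionary file.
SAMPLE_DICT = """
HELLO HH AH L OW
ZENTRIX Z EH N T R IH K S
BADWORD B AE QX
"""

entries = parse_dictionary(SAMPLE_DICT)
problems = unknown_phones(entries, CMU_PHONES)
print(problems)  # {'BADWORD': ['QX']}
```

    A word that passes this check can be decoded without retraining; adaptation
    is a separate step aimed at matching the speaker or recording conditions.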

     
