
Sphinx Language Model --> Acoustic Mo...

Anonymous
2010-05-15
2012-09-22
  • Anonymous

    Anonymous - 2010-05-15

    Hello,
    I have a fairly simple question that I cannot find the answer to. Does the
    acoustic model HAVE to contain the exact spoken utterances that the language
    model dictionary defines? In other words, if I need the word
    commodity
    in my language model (so that it can be recognized), does the associated
    acoustic model need to have been recorded with that word somewhere in it, OR
    will the speech recognition system recognize 'commodity' based on its
    similarity to other words in the acoustic model that sound like 'commodity'?

    Other questions I have are:
    - What matters most for speech recognition speed: acoustic model size,
      language model size, or both? In other words, if I want the fastest
      recognition time possible, should my language model contain just the words
      I need, or should I use a much larger predefined (open source) language
      model? The same goes for the acoustic model: would I get faster
      recognition with an acoustic model that only had the words I need?
    - One can use lmtool to generate a smaller language model dictionary/grammar,
      but what tools would I use to develop my own acoustic models (based on the
      limited vocabulary I need)? Does it make sense to create and train my own
      acoustic model in order to get higher accuracy for the limited vocabulary
      I need (i.e. commodity trading terms)?

    Thank You,
    Eric

     
  • Nickolay V. Shmyrev

    Does your Acoustic model HAVE to contain the exact spoken utterances that
    your language model dictionary defines?

    The acoustic model describes how phones are pronounced; it is unrelated to
    words. It is incorrect to say that an "acoustic model contains spoken
    utterances".

    does the associated acoustic model need to be recorded with that word
    somewhere in it

    no
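
    To illustrate the answer above: a word is looked up in the pronunciation
    dictionary, and the acoustic model only has to cover the phones that come
    out of that lookup. The sketch below is a toy check, not part of any Sphinx
    tool. The phone subset, the dictionary entries, and both function names are
    assumptions made up for illustration.

```python
# Hypothetical subset of the phone set covered by an acoustic model.
ACOUSTIC_MODEL_PHONES = {"AA", "AH", "D", "IY", "K", "M", "T", "SIL"}

# Hypothetical pronunciation dictionary entries (word -> phone sequence).
DICTIONARY = {
    "commodity": ["K", "AH", "M", "AA", "D", "AH", "T", "IY"],
    "market": ["M", "AA", "R", "K", "AH", "T"],
}


def missing_phones(word, dictionary, model_phones):
    """Return the phones of `word` that the acoustic model does not cover."""
    return sorted(set(dictionary[word]) - set(model_phones))


def is_recognizable(word, dictionary, model_phones):
    """A word is usable if it has a pronunciation and every phone is modeled."""
    if word not in dictionary:
        return False
    return not missing_phones(word, dictionary, model_phones)


if __name__ == "__main__":
    for word in DICTIONARY:
        print(word, is_recognizable(word, DICTIONARY, ACOUSTIC_MODEL_PHONES),
              missing_phones(word, DICTIONARY, ACOUSTIC_MODEL_PHONES))
```

    Nothing in this check depends on whether 'commodity' was ever spoken in
    the training recordings; only its phones matter.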

    what matters most in terms of speech recognition speed -- acoustic model
    size, language model size, or both?

    both

    I want the fastest recognition time possible, should my language model just
    contain the words I need, or should I use a much larger pre-defined (open
    source) language model?

    If the language model is small, recognition is faster.
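
    A rough way to see why: at every word boundary the decoder has to consider
    each word the language model allows. The toy count below ignores pruning,
    n-gram probabilities, and everything else a real decoder does, so it is
    only an upper bound on word sequences and not a timing model. The function
    name and the vocabulary sizes (50 and 60000) are assumptions chosen for
    illustration.

```python
def unpruned_hypotheses(vocabulary_size, utterance_length):
    """Count all word sequences of a given length over a vocabulary."""
    return vocabulary_size ** utterance_length


if __name__ == "__main__":
    small = unpruned_hypotheses(50, 3)      # hypothetical task-specific vocabulary
    large = unpruned_hypotheses(60000, 3)   # hypothetical general-purpose vocabulary
    print("small vocabulary:", small)
    print("large vocabulary:", large)
    print("ratio:", large // small)
```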

    Same goes for the acoustic model; would I get faster recognition with an
    acoustic model that only had the words I need?

    The acoustic model doesn't contain any words; see above.

    what tools would I use to develop my own acoustic models (based on the
    limited vocabulary I need)?

    You don't need to develop any acoustic models, but if you still want to, you
    can use SphinxTrain, distributed here

    does it make sense to create my own acoustic model and train it myself in
    order to get higher accuracy for the limited vocabulary I need (i.e. commodity
    trading terms)?

    It sometimes makes sense, but you are better off avoiding it.

    P.S. Avoid crossposting to multiple forums (VoxForge, here, and elsewhere).

     
  • vkumar

    vkumar - 2010-08-19

    Hello,
    When training an acoustic model, it is suggested that we train it on a
    large number of utterances recorded in different environments, by different
    speakers, and so on. It makes sense that this introduces variability (the
    model adapts).
    Suppose I took 40 words spoken in 80 utterance files, by a man (40
    utterances) and by a child (40 utterances), so that each word is spoken
    twice, once by the man and once by the child. The phonetic dictionary uses
    the 40-phone ARPAbet US English set, the same one used by CMU Sphinx, for
    the pronunciations.
    My questions are:
    - Does the acoustic model train on words by breaking them down phonetically?
    - What are the properties of a good acoustic model?
    - Do I need to train the acoustic model separately for each phoneme, like
      AA, AE, EH, IH, ...?
    - In the experiment above, is the MFCC calculated by taking one word
      utterance (file) at a time, separately? And what about the means,
      variances, mixture_weights, and transition_matrices?
    - What do the transition matrices contain?
    - Is it a kind of signature kept for identifying a word or phoneme?
    - Can the HUB4 acoustic model and language model, available for download at
      CMU.SourceForge, be used for general-purpose speech recognition? For
      example, can it recognize my utterances in batch mode, recorded in a
      standard home environment on my laptop?

     
  • Nickolay V. Shmyrev

    Does the acoustic model train on words by breaking them down phonetically?

    No

    What are the properties of a good acoustic model?

    It recognizes speech with good accuracy

    Do I need to train the acoustic model separately for each phoneme, like AA,
    AE, EH, IH, ...?

    No

    In the experiment above, is the MFCC calculated by taking one word
    utterance (file) at a time, separately? And what about the means,
    variances, mixture_weights, and transition_matrices?

    What about them?

    What do the transition matrices contain?

    Transition matrices hold the transition probabilities between HMM states.
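
    As a small illustration of that statement, here is a made-up transition
    matrix for a 3-state left-to-right HMM with an extra exit column. Row i
    holds the probabilities of moving from state i to each state j, so every
    row sums to 1. The numbers, the matrix layout, and the function names are
    assumptions for illustration and are not taken from any trained Sphinx
    model.

```python
# Hypothetical 3-state left-to-right HMM for one phone.
# Columns 0-2 are the emitting states; column 3 is the exit state.
TRANSITIONS = [
    [0.6, 0.4, 0.0, 0.0],  # state 0: stay, or move on to state 1
    [0.0, 0.7, 0.3, 0.0],  # state 1: stay, or move on to state 2
    [0.0, 0.0, 0.5, 0.5],  # state 2: stay, or leave the phone
]


def rows_are_distributions(matrix, tol=1e-9):
    """Check that every row is non-negative and sums to 1."""
    return all(
        all(p >= 0.0 for p in row) and abs(sum(row) - 1.0) < tol
        for row in matrix
    )


def path_probability(matrix, states):
    """Multiply the transition probabilities along a state sequence."""
    prob = 1.0
    for src, dst in zip(states, states[1:]):
        prob *= matrix[src][dst]
    return prob


if __name__ == "__main__":
    print(rows_are_distributions(TRANSITIONS))
    # Stay once in state 0, then pass through states 1 and 2 and exit.
    print(path_probability(TRANSITIONS, [0, 0, 1, 2, 3]))
```

    The zeros below the diagonal are what make the model left-to-right: a
    state can repeat or advance, but never go back.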

    Is it a kind of signature kept for identifying a word or phoneme?

    No

    Can the HUB4 acoustic model and language model, available for download at
    CMU.SourceForge, be used for general-purpose speech recognition? For
    example, can it recognize my utterances in batch mode, recorded in a
    standard home environment on my laptop?

    Yes

     
