HMM-ANN hybrid

2011-05-31
2012-09-22
  • hindiasradmin

    hindiasradmin - 2011-05-31

    I was reading this paper: "Spoken term detection based on the most probable
    phoneme sequence", Gosztolya, G.; Toth, L.; SAMI 2011.
    On page 4, the paper states: "The acoustic models may be further refined by
    using more sophisticated machine learning techniques. One possibility is to
    apply Artificial Neural Nets (ANNs) to estimate the local probability values
    instead of Gaussian curves. The resulting construct is called the HMM/ANN
    hybrid. Thanks to the advantages of ANNs, a 1-state monophone hybrid model can
    produce just as good an accuracy score as a standard 3-state triphone HMM."

    Is this true? As far as I know, ANN-based speech recognition systems are
    outdated now. I have seen nobody using them. I was told that HMMs are the best
    performers. And if the above statement is true, then why does nobody use
    ANN-based ASR or build ANN-based toolkits nowadays?

     
  • Nickolay V. Shmyrev

    I have seen nobody using them. I was told that HMMs are the best performers.

    That's not true. If you search the latest conference proceedings, you'll see a
    lot of papers about MLP features, which are basically ANNs. Many decoders use
    them, for example RWTH-ASR or Kaldi. Moreover, the recently introduced deep
    belief networks, which are multi-layer ANNs, are known to provide the best
    phonetic accuracy.

    The issue with ANNs is how to adapt the model to the speaker, but there are
    some solutions for that. Anyway, as a phonetic classifier this approach is
    quite successful.
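
    To make the hybrid idea from the quoted passage concrete, here is a minimal
    sketch of the usual way ANN outputs are plugged into an HMM: the network gives
    phone posteriors P(phone | frame), and dividing by the phone priors yields
    scaled likelihoods that can stand in for the Gaussian emission scores. The
    three-phone setup, the numbers, and the function names are assumptions made
    for illustration; this is not code from the paper or from any toolkit.

```python
import math

def softmax(logits):
    """Turn raw network outputs into posteriors that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """log P(phone | frame) - log P(phone), one score per phone.

    By Bayes' rule this equals log p(frame | phone) up to a term that does
    not depend on the phone, so it can be used as an HMM emission score.
    """
    return [
        math.log(max(p, floor)) - math.log(max(q, floor))
        for p, q in zip(posteriors, priors)
    ]

# Hypothetical values: ANN outputs for one frame and phone priors
# (relative phone frequencies in the training alignment).
ann_outputs = [2.0, 0.5, -1.0]
phone_priors = [0.5, 0.3, 0.2]

posteriors = softmax(ann_outputs)
emission_scores = scaled_log_likelihoods(posteriors, phone_priors)
```

    The decoder itself stays a normal HMM decoder; only the source of the
    per-frame emission scores changes.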

     
  • Vassil Panayotov

    Just a note on the terminology ...
    In the phonetic recognition field, a distinction is made between phonetic
    'classification', where the phone boundaries are known, and 'recognition',
    where they are not.
    See for example: http://groups.google.com/group/phnrec/msg/356dce67789f2c08
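
    A rough sketch of that difference, assuming per-frame phone posteriors are
    already available (e.g. from an ANN). With known boundaries you just score
    each segment; without them you have to search over all segmentations, here
    with a small Viterbi pass. The phone set, the posterior values, and the
    `switch_penalty` parameter are all made up for illustration, and raw
    posteriors are used as scores to keep the example short.

```python
import math

PHONES = ["a", "s", "t"]  # hypothetical phone inventory

def classify_segment(frames, start, end):
    """Classification: boundaries are given, pick the best phone for frames[start:end]."""
    scores = [
        sum(math.log(f[k]) for f in frames[start:end])
        for k in range(len(PHONES))
    ]
    return PHONES[scores.index(max(scores))]

def recognize(frames, switch_penalty=1.0):
    """Recognition: boundaries are unknown, find the best label path by Viterbi."""
    n = len(PHONES)
    best = [math.log(p) for p in frames[0]]
    back = []  # back[t][k] = best previous phone index for phone k at frame t+1
    for f in frames[1:]:
        new_best, ptr = [], []
        for k in range(n):
            # Staying in the same phone is free, switching costs a fixed penalty.
            cands = [best[j] - (0.0 if j == k else switch_penalty) for j in range(n)]
            j_best = cands.index(max(cands))
            new_best.append(cands[j_best] + math.log(f[k]))
            ptr.append(j_best)
        best = new_best
        back.append(ptr)
    # Trace back the best path from the last frame.
    k = best.index(max(best))
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    # Merge runs of the same label into a phone sequence.
    seq = []
    for k in path:
        if not seq or seq[-1] != PHONES[k]:
            seq.append(PHONES[k])
    return seq

# Hypothetical posteriors for five frames (columns follow PHONES).
frames = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
]

print(classify_segment(frames, 0, 2))  # segment boundaries supplied by hand
print(recognize(frames))               # boundaries found by the search
```

    Classification accuracy is usually reported on hand-segmented data, so it is
    not directly comparable with recognition accuracy on the same corpus.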

     
  • Rmkf

    Rmkf - 2011-06-03

    But why don't they use ANN classifiers on an articulatory (log area) vector
    instead of, or better, in addition to the acoustic one? Most phones have 1 or
    2 main articulation points (constrictions), which is enough to distinguish
    them from the others. Moreover, it works well even in cases with
    missed/coupled formants!

     
