
How is the search graph created when decoding phones using triphone acoustic models?

dovark
2013-04-03
  • dovark

    dovark - 2013-04-03

    Hi,

    I'm curious about the following question.

    We want to do phone recognition. We have N triphone HMMs available (N could be on the order of 50^3), and we create a unigram phone language model (say, over 50 phones).

    During search, should the language model (theoretically) be expanded into 50^3 possible paths? Otherwise the triphone models would not be utilized.

    If so, is this actually done in practice (say, in Sphinx/HTK)?
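    To put numbers on the question, here is a minimal sketch. It assumes a bare phone loop with no silence or utterance-boundary contexts, in which every node of the search graph is a triphone (left, centre, right). The helper names `expand_phone_loop` and `expanded_size` are hypothetical. With 50 phones the graph has one node per triphone (50^3 = 125,000) and 50^4 arcs, and that is exactly what lets every triphone model take part in the search.

```python
from itertools import product


def expand_phone_loop(phones):
    """Fully context-expand a unigram phone loop (toy sketch).

    Each graph node is a triphone (left, centre, right). Because a unigram
    phone LM lets any phone follow any other, node (l, c, r) gets an arc to
    (c, r, nxt) for every phone nxt. Silence and utterance boundaries are
    ignored for simplicity.
    """
    return {
        (l, c, r): [(c, r, nxt) for nxt in phones]
        for (l, c, r) in product(phones, repeat=3)
    }


def expanded_size(num_phones):
    """Node and arc counts of the same graph, computed without building it."""
    return num_phones ** 3, num_phones ** 4


toy = expand_phone_loop(["a", "b", "c"])
print(len(toy))                            # 27 triphone nodes
print(sum(len(v) for v in toy.values()))   # 81 arcs
print(expanded_size(50))                   # (125000, 6250000)
```

    Real decoders tie states across triphones and prune the search, so not every node is active or scored in every frame.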

     

    Last edit: dovark 2013-04-03
  • The Grand Janitor

    Woo... Juicy question. ;) (Juicy?)

    Short answer:

    HTK: yes. There is a flag in HVite that allows full context expansion of single-phone words, so in phoneme recognition you will find that HVite slows down tremendously.

    Btw, if I remember correctly, HTK also has an optional flag for silence expansion. For obvious reasons, you might not want to expand silence.

    Sphinx: in the allphone mode of Sphinx3, full triphone expansion was also done. Unfortunately, I don't know Sphinx4/PocketSphinx well enough to answer for them; I will leave that to other experts.

    Arthur

     
  • dovark

    dovark - 2013-04-03

    Thanks Arthur.

    I think the same problem also arises with a word LM. Since the last phone of a word can be followed by any of K possible next phones, would the acoustic scores of all K paths be computed and stored separately?
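    Under full cross-word expansion, each word end is duplicated into one HMM copy per possible right context, and each copy carries its own path score. Below is a minimal sketch of that bookkeeping. It assumes a toy lexicon stored as a Python dict and ignores silence. The function names and the lexicon are hypothetical.

```python
def word_end_copies(lexicon):
    """Word-final triphone copies needed under full cross-word expansion.

    lexicon maps word -> list of phones. The right context of a word's last
    phone is the first phone of whichever word follows, so each word end is
    duplicated once per distinct word-initial phone in the vocabulary.
    Single-phone words would also need left-context expansion; this sketch
    marks their left context as None and does not expand it.
    """
    initial_phones = sorted({pron[0] for pron in lexicon.values()})
    copies = {}
    for word, pron in lexicon.items():
        left = pron[-2] if len(pron) > 1 else None
        copies[word] = [(left, pron[-1], right) for right in initial_phones]
    return copies


def exit_copy(copies, word, next_first_phone):
    """Only the copy whose right context matches the next word may exit into it."""
    return next(t for t in copies[word] if t[2] == next_first_phone)


lexicon = {"cat": ["k", "ae", "t"], "dog": ["d", "ao", "g"], "tea": ["t", "iy"]}
copies = word_end_copies(lexicon)
print(copies["cat"])                  # [('ae', 't', 'd'), ('ae', 't', 'k'), ('ae', 't', 't')]
print(exit_copy(copies, "cat", "d"))  # ('ae', 't', 'd')
```

    With K distinct word-initial phones, every word end costs K copies. Approximations such as multiplexed or composite triphones are designed to avoid that cost.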

     
  • The Grand Janitor

    Yes. In HTK, again, you can fully expand it. Across other recognizers you will find dozens of different implementations. A full account is beyond the scope of this reply, so I will just give you some examples and throw out some jargon without getting into detail:

    • In Sphinx3 mode=flat, the left context is fully expanded, whereas the right context is approximated by multiplexed triphones.
    • In Sphinx3 mode=tree, composite triphones are used.
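    The composite-triphone idea can be sketched roughly as follows. This uses our own simplifying assumptions and is not Sphinx3's code. The K right-context variants of a word-final phone are replaced by a single HMM, and at each state the variants' scores are combined, here by taking the maximum. The function name and the scores are hypothetical.

```python
def composite_state_scores(variant_scores):
    """Collapse several right-context variants into one composite HMM.

    variant_scores maps a right-context phone to the per-state acoustic
    log-likelihoods (one frame) of the triphone with that right context.
    The composite model keeps a single HMM instance and, at each state,
    uses the best score over all variants (max is an assumption here;
    other combinations are possible).
    """
    return [max(state_scores) for state_scores in zip(*variant_scores.values())]


# Hypothetical log-likelihoods for a 3-state word-final "t" with right contexts d, k, t.
frame = {
    "d": [-5.0, -7.0, -4.0],
    "k": [-6.0, -3.5, -8.0],
    "t": [-4.5, -9.0, -6.0],
}
print(composite_state_scores(frame))  # [-4.5, -3.5, -4.0]
```

    The search then tracks one HMM instance per word end instead of K, at the price of an approximate acoustic score for the right context.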

    Of course, many sophisticated recognizers also use a two-stage paradigm: first generate a lattice, then do the full triphone expansion on the lattice in the second stage.
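    The second pass can afford full expansion because a lattice restricts which words can be neighbours. The sketch below collects, for each lattice edge, only the cross-word contexts that the lattice actually contains. It assumes the lattice is stored as a list of (from_node, to_node, word) edges and the lexicon is a dict. All names, including the boundary phone `sil`, are hypothetical.

```python
from collections import defaultdict


def lattice_contexts(edges, lexicon, boundary="sil"):
    """Cross-word contexts that actually occur in a word lattice.

    edges: list of (from_node, to_node, word). For each edge, the left
    contexts are the last phones of words entering its start node and the
    right contexts are the first phones of words leaving its end node.
    Lattice start/end use a hypothetical boundary phone.
    """
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for i, (src, dst, _) in enumerate(edges):
        outgoing[src].append(i)
        incoming[dst].append(i)

    contexts = {}
    for i, (src, dst, _) in enumerate(edges):
        left = {lexicon[edges[j][2]][-1] for j in incoming[src]} or {boundary}
        right = {lexicon[edges[j][2]][0] for j in outgoing[dst]} or {boundary}
        contexts[i] = (left, right)
    return contexts


lexicon = {"the": ["dh", "ah"], "cat": ["k", "ae", "t"],
           "cap": ["k", "ae", "p"], "sat": ["s", "ae", "t"]}
edges = [(0, 1, "the"), (1, 2, "cat"), (1, 2, "cap"), (2, 3, "sat")]
for i, (left, right) in lattice_contexts(edges, lexicon).items():
    print(edges[i][2], sorted(left), sorted(right))
```

    In this toy lattice, `sat` needs only two left-context variants (`t` and `p`) and every other edge needs just one. Without the lattice, each would need one variant per phone in the inventory.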

    Arthur

     
