I have been trying to understand the use of HMMs in acoustic modelling. From what I have read, for large-vocabulary speech recognition, triphone HMMs are used, with senones as the states of the HMM.
Is my understanding correct?
If yes, here is my further question.
Given that an HMM can model multiple state transitions between its 3 senones, can one HMM be used to represent multiple triphones? I imagine that the multiple possible state transitions could result in multiple triphones being represented by the same HMM.
Yes
Yes, it is possible. For example, triphones like B-A-S and B-A-Z might be represented by the same senone sequence, since their acoustic properties are similar.
Transitions are shared among triphones of the same base phone, so if two triphones are represented by the same senone sequence, the transitions between their states are the same as well.
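To make this concrete, here is a minimal sketch, assuming a 3-state left-to-right HMM per triphone. The senone IDs, transition probabilities and per-frame senone scores are all invented, and `TRIPHONE_SENONES`, `TMAT` and `forward_log_likelihood` are hypothetical names, not part of any Sphinx API. Because B-A-S and B-A-Z map to the same senones and use the transition matrix of their common base phone A, the forward algorithm gives them exactly the same score; B-A-T, which differs in two senones, scores differently.

```python
import math

# Hypothetical tying table: triphone (left, base, right) -> senone IDs of its
# 3 emitting states. All IDs are invented for illustration.
TRIPHONE_SENONES = {
    ("B", "A", "S"): (101, 205, 310),
    ("B", "A", "Z"): (101, 205, 310),  # similar right context: same senones
    ("B", "A", "T"): (101, 206, 311),  # shares only the first senone
}

# Hypothetical transition matrix per base phone, shared by all its triphones.
# Row i: probabilities of going from state i to states 0, 1, 2 and to the exit.
TMAT = {
    "A": [
        [0.6, 0.4, 0.0, 0.0],
        [0.0, 0.7, 0.3, 0.0],
        [0.0, 0.0, 0.6, 0.4],
    ],
}


def safe_log(p):
    return math.log(p) if p > 0 else -math.inf


def log_add(a, b):
    """Return log(exp(a) + exp(b)) without underflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def forward_log_likelihood(triphone, frame_scores):
    """Log P(frames | triphone HMM) computed with the forward algorithm.

    frame_scores[t][s] is a (hypothetical) log-likelihood of frame t under senone s.
    """
    senones = TRIPHONE_SENONES[triphone]
    tmat = TMAT[triphone[1]]  # transitions are looked up by base phone
    n = len(senones)
    # Start in the first emitting state.
    alpha = [-math.inf] * n
    alpha[0] = frame_scores[0][senones[0]]
    for scores in frame_scores[1:]:
        new_alpha = []
        for j in range(n):
            acc = -math.inf
            for i in range(n):
                acc = log_add(acc, alpha[i] + safe_log(tmat[i][j]))
            new_alpha.append(acc + scores[senones[j]])
        alpha = new_alpha
    # Leave the model through the exit transition.
    total = -math.inf
    for i in range(n):
        total = log_add(total, alpha[i] + safe_log(tmat[i][n]))
    return total


# Invented per-frame senone log-likelihoods for a 4-frame segment.
frames = [
    {101: -1.0, 205: -3.0, 206: -3.5, 310: -5.0, 311: -4.0},
    {101: -1.2, 205: -1.5, 206: -2.5, 310: -4.0, 311: -3.0},
    {101: -3.0, 205: -1.1, 206: -1.8, 310: -2.0, 311: -2.2},
    {101: -4.0, 205: -2.5, 206: -3.0, 310: -0.9, 311: -1.5},
]

bas = forward_log_likelihood(("B", "A", "S"), frames)
baz = forward_log_likelihood(("B", "A", "Z"), frames)
bat = forward_log_likelihood(("B", "A", "T"), frames)
print(bas == baz)  # True: same senones and same base-phone transitions
print(bas == bat)  # False: different senones, different score
```

The self-loops in `TMAT` are what allow many state paths through one triphone HMM; all of those paths still belong to the same model.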
Hello Nickolay, thanks for clarifying.
I have a further question.
If the same senone sequence from an HMM can be used to model multiple triphones, can different senone sequences from the same HMM also be used to model multiple triphones?
For example, if the senone sequence S1 --> S2 --> S3 of an HMM is used to model Triphone1,
can S1 --> S1 --> S2 from the same HMM be used to model Triphone2?
If yes, can you share an example?
Yes
You can find many examples in the mdef file of any acoustic model.
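As a rough illustration of what to look for in an mdef file, the sketch below parses a few rows, assuming the plain-text mdef column layout (base phone, left context, right context, word position, attribute, transition-matrix ID, senone IDs, terminating `N`). The rows and IDs are invented, and `parse_mdef_rows`, `identical_models` and `partially_shared` are hypothetical helpers, not Sphinx tools. It reports both kinds of sharing discussed in this thread: triphones with the same senone sequence, and triphones that share only some of their senones.

```python
from collections import defaultdict
from itertools import combinations

# Invented rows in the assumed text mdef layout:
# base  left  right  position  attribute  tmat  senone IDs...  N
MDEF_TEXT = """\
#base lft  rt p attrib tmat      ... state id's ...
A   B   S   i   n/a   1   101   205   310   N
A   B   Z   i   n/a   1   101   205   310   N
A   B   T   i   n/a   1   101   206   311   N
A   K   S   i   n/a   1   102   205   310   N
"""


def parse_mdef_rows(text):
    """Map 'left-base-right' names to (tmat ID, senone ID tuple)."""
    models = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments and anything not shaped like a phone row.
        if not fields or fields[0].startswith("#") or fields[-1] != "N":
            continue
        base, left, right, _pos, _attrib, tmat = fields[:6]
        senones = tuple(int(s) for s in fields[6:-1])
        models[f"{left}-{base}-{right}"] = (int(tmat), senones)
    return models


def identical_models(models):
    """Groups of triphones sharing the whole senone sequence and tmat."""
    groups = defaultdict(list)
    for name, model in models.items():
        groups[model].append(name)
    return [names for names in groups.values() if len(names) > 1]


def partially_shared(models):
    """Pairs of triphones sharing some, but not all, senones at the same state."""
    pairs = []
    for (a, (_, sa)), (b, (_, sb)) in combinations(models.items(), 2):
        shared = sum(x == y for x, y in zip(sa, sb))
        if 0 < shared < len(sa):
            pairs.append((a, b, shared))
    return pairs


models = parse_mdef_rows(MDEF_TEXT)
print(identical_models(models))  # [['B-A-S', 'B-A-Z']]
for a, b, shared in partially_shared(models):
    print(f"{a} and {b} share {shared} of 3 senones")
```

In this made-up data, all four triphones of base phone A use the same tmat ID, matching the point above that transitions are shared per base phone.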
Sure. Happens all the time.
The point is, we're actually performing a search over a graph of
words, and the words are composed of triphones. So having identical
models for two triphones doesn't really matter that much since the
word models will be different anyway. Sometimes the word models too
will be identical (e.g. two words with the same pronunciation), in
which case we depend on the structure of the word graph (which is
given by the language model) to disambiguate.
-Bhiksha
On Thu, Jul 31, 2014 at 12:28 AM, Vamsi vamsiev@users.sf.net wrote:
--
Bhiksha Raj
Carnegie Mellon University
Pittsburgh, PA, USA
Tel: 412 268 9826
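Bhiksha's point can be sketched in a few lines. The lexicon, the tying table and the bigram log-probabilities are invented; word boundaries are padded with `SIL` instead of real cross-word context to keep it short; and `word_model` and `best_word` are hypothetical helpers. BUS and BUZZ share their middle triphone model but still end up with different word models, while the homophones PAIR and PEAR get identical word models, so only the language-model score can choose between them.

```python
# Invented pronunciations; PAIR and PEAR are homophones.
LEXICON = {
    "BUS": ["B", "AH", "S"],
    "BUZZ": ["B", "AH", "Z"],
    "PAIR": ["P", "EH", "R"],
    "PEAR": ["P", "EH", "R"],
}

# Hypothetical tying: B-AH-S and B-AH-Z are mapped to one shared model.
TIED = {
    ("B", "AH", "S"): "B-AH-(S|Z)",
    ("B", "AH", "Z"): "B-AH-(S|Z)",
}

# Invented bigram log-probabilities log P(word | previous word).
BIGRAM_LOGPROB = {
    ("RIPE", "PEAR"): -1.0,
    ("RIPE", "PAIR"): -4.0,
    ("A", "PAIR"): -2.0,
    ("A", "PEAR"): -2.5,
}
UNSEEN = -10.0  # made-up floor for bigrams not in the table


def triphones(phones, boundary="SIL"):
    """Expand a phone string into (left, base, right) triphones."""
    padded = [boundary] + phones + [boundary]
    return [tuple(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)]


def word_model(word):
    """Sequence of triphone model names that makes up the word's HMM."""
    return tuple(TIED.get(t, "-".join(t)) for t in triphones(LEXICON[word]))


def best_word(candidates, previous_word, acoustic_logp):
    """Pick the candidate with the best acoustic + language-model score."""
    return max(
        candidates,
        key=lambda w: acoustic_logp[word_model(w)]
        + BIGRAM_LOGPROB.get((previous_word, w), UNSEEN),
    )


print(word_model("BUS"))   # ('SIL-B-AH', 'B-AH-(S|Z)', 'AH-S-SIL')
print(word_model("BUZZ"))  # ('SIL-B-AH', 'B-AH-(S|Z)', 'AH-Z-SIL')
print(word_model("PAIR") == word_model("PEAR"))  # True

# Homophones share one word model, hence one acoustic score.
acoustic = {word_model("PAIR"): -50.0}
print(best_word(["PAIR", "PEAR"], "RIPE", acoustic))  # PEAR
print(best_word(["PAIR", "PEAR"], "A", acoustic))     # PAIR
```

In a real decoder the acoustic scores come from searching the word graph rather than from a lookup table, but the division of labour is the same: acoustics separate words whose models differ, and the language model separates words whose models do not.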
Nickolay/Bhiksha, thank you! That helped.