I can always find new triphones in the final .mdef file that have never been
seen in the training data (untied.mdef). The number of triphones is always
higher in the final mdef file. I need to restrict the triphone models to only
those that have been seen in the training data; for the unseen triphones, the
decoder should back off to the CI models. My questions: why does the trainer
propose new triphones, and is it possible to avoid that?
It collects all possible triphones from your dictionary. If you configure it
to use too many triphones, it will pick some additional ones.
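To make that concrete, here is a rough Python sketch of the idea (not
SphinxTrain's actual code; the toy dictionary and helper names are made up).
It enumerates every within-word triphone plus every triphone that can arise at
the boundary between two adjacent words:

    from itertools import product

    # Toy dictionary: word -> phone sequence (invented for illustration)
    dictionary = {
        "cat": ["K", "AE", "T"],
        "dog": ["D", "AO", "G"],
    }

    def within_word_triphones(phones):
        """Triphones whose left/right context lies inside one word."""
        for i in range(1, len(phones) - 1):
            yield (phones[i - 1], phones[i], phones[i + 1])

    def cross_word_triphones(left, right):
        """Triphones spanning the boundary between two adjacent words."""
        lw, rw = dictionary[left], dictionary[right]
        if len(lw) >= 2:
            # word-final phone of the left word, right context from the next word
            yield (lw[-2], lw[-1], rw[0])
        if len(rw) >= 2:
            # word-initial phone of the right word, left context from the previous word
            yield (lw[-1], rw[0], rw[1])

    all_triphones = set()
    for phones in dictionary.values():
        all_triphones.update(within_word_triphones(phones))
    # Any word may follow any other, so every ordered word pair contributes
    for w1, w2 in product(dictionary, repeat=2):
        all_triphones.update(cross_word_triphones(w1, w2))

    print(len(all_triphones), "possible triphones")

The cross-word pass is exactly why the final list contains triphones that
never appear inside any single dictionary word, and often never appear in the
training data either.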
The triphones added to the final mdef file do not exist in any single word in
the dictionary either. Does the trainer consider combinations of any two
words in the dictionary?
Use fewer tied states in the training configuration.
Fewer tied states doesn't have any effect on the number of triphones; it only
affects the number of shared states!
Does the trainer consider combinations of any two words in the dictionary?
Yes, it should consider cross-word triphones. You can see that in the position
field. However, I think it does the right thing by assigning tied states with
the decision tree rather than falling back to CI models. That seems more
reasonable.
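If you want to check that on your own model, a rough Python sketch like this
should work, assuming the usual Sphinx-3 mdef column layout (base, left
context, right context, position, attribute, tmat, state ids). Word-boundary
triphones, whose contexts come from neighboring words, show up under the b, e,
and s positions:

    from collections import Counter

    def triphone_positions(mdef_path):
        """Tally triphones in a Sphinx-3 style mdef by their position field."""
        counts = Counter()
        with open(mdef_path) as f:
            for line in f:
                fields = line.split()
                # Skip comments, the version line, and parameter-count lines
                if not fields or fields[0].startswith("#") or len(fields) < 4:
                    continue
                base, lft, rt, pos = fields[:4]
                if lft != "-":   # CI phone entries use '-' for both contexts
                    counts[pos] += 1
        return counts

    # e.g. triphone_positions("final.mdef")
    # -> Counter({'i': ..., 'b': ..., 'e': ..., 's': ...})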
Fewer tied states doesn't have any effect on the number of triphones; it only
affects the number of shared states!
Right, I was confused initially. I'd like to note that the trainer doesn't
"train new triphones". It trains only the tied-state parameters that have been
seen in the data.
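A toy illustration of how tree-based tying handles an unseen triphone (the
questions and senone ids below are completely invented): the decision tree for
each state of each base phone routes any context, seen or unseen, to some
leaf, and the leaf's senone is what actually gets trained.

    # Toy decision tree for the middle state of base phone AE.
    VOWELS = {"AA", "AE", "IY", "UW"}

    def senone_for(left, right):
        if left in VOWELS:                        # Q1: is the left context a vowel?
            return 201 if right in VOWELS else 202
        return 203 if right in VOWELS else 204    # Q2: is the right context a vowel?

    # A triphone never observed in training still lands in some leaf,
    # so it shares that leaf's senone with the seen triphones routed there:
    print(senone_for("Z", "UW"))   # unseen (Z, AE, UW) -> senone 203
    print(senone_for("S", "IY"))   # seen   (S, AE, IY) -> senone 203, shared

So the extra entries in the final mdef do not add trained parameters; they
just record which existing senones those contexts map to.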