how to get likelihood score for each phoneme given alignment
Dear All,
I have used steps/align_fmllr to get a phone alignment, and now I want the likelihood score for each phone in that alignment. For example, if the utterance is "apple", the alignment should be "a p l". However, the alignment file only contains transition IDs and the corresponding phonemes; there are no likelihood scores for these phones. Any suggestions would be appreciated!
Thanks
There is a way to get the frame-by-frame acoustic scores, but it is
slightly complicated. You could convert the alignment and the
associated text into a linear lattice using the program
linear-to-nbest (you would have to pass the empty string '' in place
of the graph and acoustic score archives because you don't have them),
and then use gmm-rescore-lattice with the model, that lattice, and the
same features you aligned with to get a suitably rescored lattice;
then you could maybe write a script to extract the frame-by-frame
acoustic scores from the lattice.
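A minimal sketch of that last extraction step, in Python, under several assumptions that are not stated in this thread: that the rescored lattice has been written in non-compact text form (for example with lattice-copy and --write-compact=false), so that each arc line reads "src dst transition-id word graph-cost,acoustic-cost"; that arcs of a linear lattice appear in time order; and that phone durations come from something like ali-to-phones with --write-lengths=true. The function names parse_frame_costs and per_phone_loglikes and the toy numbers are made up for illustration. Lattice costs are negated log-likelihoods, so the sketch flips the sign when summing.

```python
from typing import Iterable, List, Tuple


def parse_frame_costs(lattice_lines: Iterable[str]) -> List[float]:
    """Collect one acoustic cost per frame from a text-form linear lattice.

    ASSUMPTION: arc lines look like
        <src> <dst> <transition-id> <word> <graph-cost>,<acoustic-cost>
    and a missing weight field means both costs are zero.
    """
    costs = []
    for line in lattice_lines:
        fields = line.split()
        if len(fields) not in (4, 5):
            continue  # utterance-id line, final-state line, or blank line
        if fields[2] == "0":
            continue  # epsilon input label: this arc consumes no frame
        if len(fields) == 5:
            _graph, acoustic = fields[4].split(",")
            costs.append(float(acoustic))
        else:
            costs.append(0.0)
    return costs


def per_phone_loglikes(
    frame_costs: List[float], segments: List[Tuple[str, int]]
) -> List[Tuple[str, float]]:
    """Sum frame scores over each (phone, num_frames) segment.

    Returns log-likelihoods, i.e. the negated sum of the acoustic costs.
    """
    total_frames = sum(n for _, n in segments)
    if total_frames != len(frame_costs):
        raise ValueError(
            f"segments cover {total_frames} frames, lattice has {len(frame_costs)}"
        )
    result = []
    start = 0
    for phone, n in segments:
        result.append((phone, -sum(frame_costs[start:start + n])))
        start += n
    return result


# Toy data, invented for illustration only.
toy_lattice = """utt1
0 1 10 5 0,1.5
1 2 12 0 0,2.0
2 3 30 0 0,0.5
3 4 31 0 0,1.0
4
""".splitlines()
toy_segments = [("a", 2), ("p", 2)]

for phone, loglike in per_phone_loglikes(parse_frame_costs(toy_lattice), toy_segments):
    print(phone, loglike)
```

If an acoustic scale was applied when the lattice was rescored or written, the costs are scaled accordingly, so the sums would need dividing by that scale to recover unscaled log-likelihoods.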
The reason this isn't made easier is that there is very rarely a
legitimate need to do something with these frame-by-frame scores.
People who are unfamiliar with speech recognition sometimes assume
that they contain some useful information and might be somehow
correlated with the confidence of the system in the classification of
each frame, but that is not the case.
Dan
On Thu, Jun 25, 2015 at 11:22 PM, Lee speechspeech@users.sf.net wrote:
OK, I got it, thanks