I was trying to interpret the acoustic scores output by both sphinx3_decode and
sphinx3_align, and I am seeing some differences between the behavior of these
two decoders. Can anyone please explain why?
I tried to recognize one wave file and got the following output from
sphinx3_decode - the phoneme sequence "h ay n a m ey n' da e s" and the file
test_free_decode.hypseg
(http://home.iitb.ac.in/~pranavj/temp/test_free_decode.hypseg).
The hypseg file contains all the phone segmentations with their respective
acoustic scores and time frames.
Now, when I give this same recognized phoneme sequence as input to
sphinx3_align, so as to get back the time alignment and the corresponding
acoustic score for each phoneme, I get the output in the file test.phseg
(http://home.iitb.ac.in/~pranavj/temp/test.phseg).
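To compare the two outputs phone by phone I use something like the minimal
sketch below. It assumes both segmentations have already been reduced to
phseg-style "SFrm EFrm SegAScr Phone" columns (the hypseg file has a different
layout in practice, so its per-phone start frame / end frame / acoustic score
triples would have to be extracted first); the file names are hypothetical and
only for illustration.

# Minimal sketch: line up two phone segmentations and show the differences.
# Assumption: each input line looks like "SFrm EFrm SegAScr Phone".
def read_segs(path):
    segs = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            # skip header and summary lines that do not start with a frame number
            if len(parts) < 4 or not parts[0].isdigit():
                continue
            sfrm, efrm, ascr = int(parts[0]), int(parts[1]), int(parts[2])
            segs.append((parts[3], sfrm, efrm, ascr))
    return segs

def compare(decode_path, align_path):
    for (p1, s1, e1, a1), (p2, s2, e2, a2) in zip(read_segs(decode_path),
                                                  read_segs(align_path)):
        print(f"{p1:>4}/{p2:<4} frames {s1}-{e1} vs {s2}-{e2} "
              f"ascore {a1} vs {a2} (diff {a1 - a2:+d})")

# hypothetical file names, assuming both files use the simple column layout above
compare("decode_segs.txt", "align_segs.txt")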
According to my understanding, the acoustic scores and time alignment produced
by both types of decoders should be the same, but that is not what happens:
there are slight differences between the scores. Can someone please explain
this difference, or am I missing some point in my understanding?
Thanks.
(asked by a friend)
Hello
sphinx3_decode applies various language model weights and insertion
probabilities, so it is no wonder that the time alignment is slightly
different. Even a single weight can shift the optimal segmentation by a frame
or two. So the times are different.
If the times are different, the acoustic scores are also different.
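To make that last point concrete, here is a small sketch with made-up frame
scores (not real Sphinx output): move a phone boundary by a single frame and
the per-phone acoustic sums change, even though every frame-level score is
identical.

# Made-up frame-level acoustic scores (log domain), for illustration only.
frame_scores = [-120, -95, -80, -200, -210, -150, -90]

def segment_sums(boundaries):
    # sum the frame scores inside each (start, end) segment, end exclusive
    return [sum(frame_scores[s:e]) for s, e in boundaries]

# the decoder puts the phone boundary at frame 3, the aligner at frame 4
print(segment_sums([(0, 3), (3, 7)]))   # [-295, -650]
print(segment_sums([(0, 4), (4, 7)]))   # [-495, -450]
# the per-phone acoustic scores differ even though the frame scores are the same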