After recognition, it seems the Sphinx recognizer only provides word-level start-frame and end-frame information, as shown below:
FV:Track15735_3294-1> WORD SFrm EFrm AScr(UnNorm) LMScore AScr+LScr AScale
fv:Track15735_3294-1> <sil> 0 50 78 -72912 1529608 1935275
fv:Track15735_3294-1> MENU 51 129 63 -50958 -634465 373939
fv:Track15735_3294-1> <sil> 130 179 90 -72912 3037752 3245717
FV:Track15735_3294-1> TOTAL 4129677 -196782
How can I get phone-level start-frame and end-frame information?
Thanks :)
Keeping track of phone segmentation information during search would slow down the decoder considerably, so you can't get it from the recognizer itself. What you have to do instead is run a second pass of forced alignment (using sphinx3_align) on the utterance with the recognized transcript, and ask it to output a phoneme segmentation (-phsegdir flag).
At some unspecified point in the future we might add the ability to do this to the decoder, because it is useful for a lot of things including minimum-phone-error training.
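In case it helps with preparing that second pass, here is a rough Python sketch that pulls the recognized words and their frame ranges out of log lines like the ones quoted in the question and builds a transcript line for the alignment step. A few things here are assumptions rather than anything confirmed in this thread: the helper names (parse_word_segments, transcript_line) are made up for illustration, the "WORDS (utterance-id)" transcript layout is the one I believe the Sphinx alignment tools expect, and the 100 frames-per-second rate is the usual default, so check both against your own setup.

```python
import re

# Assumption: 100 frames per second (10 ms frame shift), the usual default.
FRAMES_PER_SECOND = 100

# Matches per-word lines such as:
#   fv:Track15735_3294-1> MENU 51 129 63 -50958 -634465 373939
# The header and TOTAL lines start with upper-case "FV:" and do not match.
WORD_LINE = re.compile(
    r"^fv:(?P<utt>\S+)>\s+(?P<word>\S+)\s+(?P<sfrm>\d+)\s+(?P<efrm>\d+)\s"
)


def parse_word_segments(log_text):
    """Return (utterance_id, [(word, start_frame, end_frame), ...])."""
    utt_id = None
    segments = []
    for line in log_text.splitlines():
        m = WORD_LINE.match(line.strip())
        if m is None:
            continue  # skip header, TOTAL, and unrelated lines
        utt_id = m.group("utt")
        segments.append(
            (m.group("word"), int(m.group("sfrm")), int(m.group("efrm")))
        )
    return utt_id, segments


def transcript_line(utt_id, segments):
    """Build a 'WORDS (utt_id)' line, dropping fillers such as <sil>.

    The 'WORDS (utt_id)' layout is an assumption about the transcript
    format; adjust it if your alignment setup expects something else.
    """
    words = [
        w for w, _, _ in segments
        if not (w.startswith("<") and w.endswith(">"))
    ]
    return "{} ({})".format(" ".join(words), utt_id)


if __name__ == "__main__":
    sample = "\n".join([
        "FV:Track15735_3294-1> WORD SFrm EFrm AScr(UnNorm) LMScore AScr+LScr AScale",
        "fv:Track15735_3294-1> <sil> 0 50 78 -72912 1529608 1935275",
        "fv:Track15735_3294-1> MENU 51 129 63 -50958 -634465 373939",
        "fv:Track15735_3294-1> <sil> 130 179 90 -72912 3037752 3245717",
        "FV:Track15735_3294-1> TOTAL 4129677 -196782",
    ])
    utt, segs = parse_word_segments(sample)
    print(transcript_line(utt, segs))
    for word, sfrm, efrm in segs:
        # Convert frame indices to seconds under the assumed frame rate.
        print(word, sfrm / FRAMES_PER_SECOND, efrm / FRAMES_PER_SECOND)
```

The transcript line it prints is what you would feed to the alignment pass; the phone-level frames themselves still have to come from the files sphinx3_align writes under -phsegdir.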
Thanks :)
Very clear answer!