
Phone-level time information after recognition

CAO.Z.H
2008-01-24
2012-09-22
  • CAO.Z.H

    CAO.Z.H - 2008-01-24

    After recognition, it seems the Sphinx recognizer only provides word-level start-frame and end-frame information, as shown below:
    FV:Track15735_3294-1> WORD SFrm EFrm AScr(UnNorm) LMScore AScr+LScr AScale
    fv:Track15735_3294-1> <sil> 0 50 78 -72912 1529608 1935275
    fv:Track15735_3294-1> MENU 51 129 63 -50958 -634465 373939
    fv:Track15735_3294-1> <sil> 130 179 90 -72912 3037752 3245717
    FV:Track15735_3294-1> TOTAL 4129677 -196782
    How can I get phone-level start-frame and end-frame information?

    Thanks:)

     
    • David Huggins-Daines

      Hi,

      Keeping track of phone segmentation information during search would slow down the decoder considerably, so you can't get it from the recognizer itself. What you have to do is run a second pass of forced alignment (using sphinx3_align) on the utterance, using the recognized transcript as input, and ask it to output a phoneme segmentation (-phsegdir flag).

      At some unspecified point in the future we might add the ability to do this to the decoder, because it is useful for a lot of things including minimum-phone-error training.
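
The second pass described above needs the recognized words as a transcript. Here is a minimal Python sketch of that preparation step, assuming the word-level log format quoted in the question and the usual Sphinx transcript convention of `WORDS (utterance-id)`. The helper names, the file paths, and the `model_args` list are hypothetical; apart from `-phsegdir`, which comes from the reply, the `-ctl` and `-insent` options should be checked against the help output of your own sphinx3_align build. The frame-to-seconds conversion assumes the default rate of 100 frames per second.

```python
import re

# Word-level lines as printed by the decoder in the question above.
LOG = """\
FV:Track15735_3294-1> WORD SFrm EFrm AScr(UnNorm) LMScore AScr+LScr AScale
fv:Track15735_3294-1> <sil> 0 50 78 -72912 1529608 1935275
fv:Track15735_3294-1> MENU 51 129 63 -50958 -634465 373939
fv:Track15735_3294-1> <sil> 130 179 90 -72912 3037752 3245717
FV:Track15735_3294-1> TOTAL 4129677 -196782
"""

# Data lines start with lowercase "fv:"; the header and TOTAL lines use "FV:".
WORD_LINE = re.compile(r"\s*fv:([^>\s]+)>\s+(\S+)\s+(\d+)\s+(\d+)\s")


def parse_word_segments(log_text):
    """Return (utt_id, [(word, start_frame, end_frame), ...])."""
    utt_id, segments = None, []
    for line in log_text.splitlines():
        m = WORD_LINE.match(line)
        if m:
            utt_id = m.group(1)
            segments.append((m.group(2), int(m.group(3)), int(m.group(4))))
    return utt_id, segments


def transcript_line(utt_id, segments):
    """Build one transcript line, dropping fillers such as <sil>."""
    words = [w for w, _, _ in segments if not w.startswith("<")]
    return "{} ({})".format(" ".join(words), utt_id)


def frames_to_seconds(frame, frame_rate=100):
    """Convert a frame index to seconds (assumes 100 frames/s by default)."""
    return frame / frame_rate


def align_command(model_args, ctl_path, transcript_path, phseg_dir):
    """Assemble (but do not run) a sphinx3_align command line.

    model_args is a hypothetical list holding whatever acoustic model and
    dictionary options your installation requires.
    """
    return ["sphinx3_align", *model_args,
            "-ctl", ctl_path,
            "-insent", transcript_path,
            "-phsegdir", phseg_dir]


if __name__ == "__main__":
    utt_id, segments = parse_word_segments(LOG)
    print(transcript_line(utt_id, segments))
    print(" ".join(align_command([], "utts.ctl", "utts.transcription", "phseg")))
```

Write the transcript line to the file passed as the input transcript, run the assembled command (for example with `subprocess.run`), and the per-phone start and end frames should appear in the directory given to `-phsegdir`.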

       
      • CAO.Z.H

        CAO.Z.H - 2008-01-25

        Thanks :)
        Very clear answer!

         
