
alignment

Anonymous
2000-12-19
2012-09-22
  • Anonymous

    Anonymous - 2000-12-19

    hi, i am trying to use sphinx2-align to do phone alignment
    on a set of audio sentences i've recorded.  if i am not
    mistaken, the output of the aligner seems to be in units of
    "frames". so in the example below, the middle SIL lasts
    from frames 478 to 511...

    my question: is it at all possible to get output
    that is based on msec rather than frames? if so, how?
    if not, is there a simple way to convert the frame
    numbers in the output to timing in msec? (e.g., 256 samples
    per frame, etc.)

    thanks in advance
    --tony

             Phone   Beg   End  Acoustic Score
               SIL     0   197       -42977523
               SIL   198   466       -47576466
        L(SIL,IY)b   467   470        -1560823
           IY(L,F)   471   473         -812794
        F(IY,SIL)e   474   477        -1207767
               SIL   478   511        -6958443
        M(SIL,EY)b   512   517        -1698059
           EY(M,T)   518   520        -1356272
        T(EY,SIL)e   521   524         -943199
               SIL   525  1211      -149509886
               SIL  1212  1363        -3405662

     
    • Eric Herrmann

      Eric Herrmann - 2000-12-21

      Each frame number corresponds to a fixed number of ms. I am not positive, but from observing the results I think it is 100 samples per frame, so at 16 kHz that's 6.25 ms per frame. See if that jibes with your observations.

       
      • Kevin A. Lenzo

        Kevin A. Lenzo - 2000-12-21

        Yes, it was giving results at 0.00625 seconds per increment.  Actually the frames are about 410 samples, padded to 512, at 16 kHz, but they overlap, which gives that increment.

        However, that was a bug.  I just checked in uttproc.c with a fix that makes it the (proper) 0.01 seconds (ten milliseconds) per increment.  This makes things faster and more accurate :)

        kevin
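
For anyone wanting the conversion spelled out, here is a minimal Python sketch of the arithmetic discussed above. It is not part of sphinx2-align. The names `parse_alignment` and `frames_to_ms` are made up for illustration, the row pattern is only a guess based on the sample output posted in this thread, and treating the End frame as inclusive is an assumption. The default of 10 ms per frame follows the fix described above; pass 6.25 for output produced before the fix.

```python
import re

# Assumption: 10 ms per frame, as described for the fixed uttproc.c.
# Output produced before the fix would use 6.25 ms per frame instead.
FRAME_SHIFT_MS = 10.0

# Hypothetical pattern for one row of aligner output:
# phone label, begin frame, end frame, acoustic score.
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s*$")


def parse_alignment(text):
    """Return (phone, beg_frame, end_frame, score) tuples; skip non-data lines."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            phone, beg, end, score = m.groups()
            rows.append((phone, int(beg), int(end), int(score)))
    return rows


def frames_to_ms(beg, end, frame_shift_ms=FRAME_SHIFT_MS):
    """Convert a frame span to (start_ms, end_ms).

    Assumes the end frame is inclusive, so the segment ends where
    frame end + 1 would begin.
    """
    return beg * frame_shift_ms, (end + 1) * frame_shift_ms


if __name__ == "__main__":
    sample = """
         Phone  Beg  End Acoustic Score
    F(IY,SIL)e  474  477     -1207767
           SIL  478  511     -6958443
    M(SIL,EY)b  512  517     -1698059
    """
    for phone, beg, end, _score in parse_alignment(sample):
        start_ms, end_ms = frames_to_ms(beg, end)
        print(f"{phone:>12} {start_ms:8.1f} {end_ms:8.1f}")
```

With the 10 ms default, the middle SIL (frames 478 to 511) maps to 4780 ms through 5120 ms. In samples at 16 kHz, a 10 ms increment is 160 samples and a 6.25 ms increment is 100 samples.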

         
    • Anonymous

      Anonymous - 2002-03-05

      Can I ask a stupid question? No, don't answer that, I'm going to ask it anyway.

      I too have a set of voice recordings and have created a transcript file from them. The stupid question is: do I need to force-align these files before I use SphinxTrain?

      I ask because the CI training is throwing up so many errors. Would force-alignment help?

       
    • Joe Beauchamp

      Joe Beauchamp - 2002-06-13

      Well, you need a model to force-align.  Then, according to the docs, you should iteratively force-align.  I'm not sure that force-alignment helps consistently.  I've had results get worse -- seems like you just take your chances and cross your fingers.

      So, the first time, there is no model to use so you can't force-align...

       
