
Acoustic scores in sphinx3 aligner

Help
2011-11-04
2012-09-22
  • Anuj Tewari

    Anuj Tewari - 2011-11-04

    For the sphinx3 aligner output, I was wondering why the acoustic scores for
    some phones are sometimes positive. If these are log-likelihoods in log base
    1.0001, shouldn't they all be negative numbers? Is there scaling happening?
    If so, is there a way I can obtain the real scores?

    SFrm  EFrm  SegAScr  Phone
       0     2   -54898  SIL
       3     5  -219021  SIL
       6    12  -307350  M SIL IY b
      13    32   131837  IY M SIL e
      33    44   345816  SIL
      45    68   176492  SIL
      69   117   126858  SIL
    Total score: 199734

     
  • Nickolay V. Shmyrev

    Acoustic scores are densities, not probabilities. They are not necessarily
    less than 1, so their logarithms can be positive.

    Sphinx3 aligner output is unscaled.
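
    To illustrate the point about densities, here is a minimal Python sketch
    (not sphinx3 code). It assumes a one-dimensional Gaussian with a
    hypothetical variance of 0.01 and the log base 1.0001 mentioned in the
    question; the function names are made up for this example. It also shows
    how a segment score from the table could be converted to natural-log units
    per frame, assuming SFrm and EFrm are inclusive frame indices.

```python
import math

LOGBASE = 1.0001  # log base quoted in the question (assumed to apply to SegAScr)


def gaussian_density(x, mean, var):
    """Value of a 1-D Gaussian probability density at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def to_logbase(value, base=LOGBASE):
    """Logarithm of a positive value in the given base."""
    return math.log(value) / math.log(base)


def segment_nats_per_frame(seg_score, sfrm, efrm, base=LOGBASE):
    """Convert a segment score in log base `base` to natural-log units per frame.

    Assumes sfrm and efrm are inclusive, as the table in the question suggests.
    """
    nframes = efrm - sfrm + 1
    return seg_score * math.log(base) / nframes


# A narrow Gaussian has a peak density above 1, so its log is positive.
peak = gaussian_density(0.0, mean=0.0, var=0.01)
print("peak density:", peak)
print("log score in base 1.0001:", to_logbase(peak))

# Segment "IY M SIL e" from the table: frames 13..32, SegAScr 131837.
print("nats per frame:", segment_nats_per_frame(131837, 13, 32))
```

    For the IY segment above this gives roughly 0.66 natural-log units per
    frame, which is an ordinary value for a density and does not by itself
    indicate scaling.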

     
  • Anuj Tewari

    Anuj Tewari - 2011-11-04

    Thanks! Is there a way to obtain likelihood probabilities for phones in a
    word, using the aligner?

     
  • Anuj Tewari

    Anuj Tewari - 2011-11-04

    I am trying to see if I can rate the phonetic breakup of the pronunciation of
    a word, using the aligner. For example, if the user says PEAK (P IY K), I
    would want to determine the quality of the individual phones (context-
    dependent) and then give feedback on pronunciation.

     
  • Nickolay V. Shmyrev

    No, the aligner doesn't print that. The aligner is for alignment, not for
    phone evaluation.

     
  • Anuj Tewari

    Anuj Tewari - 2011-11-04

    I see. Thanks again! Is there documentation on exactly what the aligner scores
    represent then? I could find resources for the decoder, but it is not clear
    what the aligner output means. Any pointers would be nice.

     
  • Anuj Tewari

    Anuj Tewari - 2011-11-07

    Any help on this would be appreciated. Is there a detailed description of
    the acoustic scores for Sphinx3 somewhere?

     
  • Pranav Jawale

    Pranav Jawale - 2012-03-20

    Hello nshymrev,

    "Sphinx3 aligner output is unscaled."
    

    I'm upscaling the sphinx3_align scores using the .bsenscr files produced by
    sphinx3_decode: using the word segmentation info in .wdseg, I add the
    corresponding scores from .bsenscr.

    1. Is this procedure correct?

    2. If it's correct, why would the same word's score as given by
    sphinx3_decode differ from the one obtained by the above method?
    Internally, sphinx3_decode does this upscaling itself and reports the
    score.

    I'm getting different scores even when the word boundaries in sphinx3_decode
    and sphinx3_align are exactly the same!

    Could it be because phone segmentation assumed by sphinx3_decode is different
    than that assumed by sphinx3_align?

    Thanks.
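
    The procedure described above can be sketched roughly as follows. This is
    only an illustration under stated assumptions: the per-frame scores are
    taken to be already loaded into a Python list indexed by frame number (the
    .bsenscr file format is not described in this thread, so no parsing is
    attempted), the frame range is inclusive, and the per-frame values are
    simply added, as the post says. The function names are hypothetical.

```python
def sum_over_segment(per_frame_scores, sfrm, efrm):
    """Sum per-frame scores over the inclusive frame range [sfrm, efrm]."""
    if sfrm < 0 or efrm >= len(per_frame_scores) or sfrm > efrm:
        raise ValueError("frame range outside the available scores")
    return sum(per_frame_scores[sfrm:efrm + 1])


def rescale_segment(segment_score, per_frame_scores, sfrm, efrm):
    """Add the per-frame scores for a segment to its score.

    The sign convention (adding rather than subtracting) follows the wording
    of the post and is an assumption, not something confirmed for sphinx3.
    """
    return segment_score + sum_over_segment(per_frame_scores, sfrm, efrm)


# Hypothetical numbers, only to show the arithmetic.
frame_scores = [10, 20, 30, 40]
print(rescale_segment(-100, frame_scores, 1, 2))
```

    With this formulation, two tools would agree on a word's score only if
    they use the same frame range and the same per-frame values for that word.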

     
  • Nickolay V. Shmyrev

    Hi Pranav

    In your place I would disable scaling in s3 altogether in the sources and go
    to sleep in a good mood.

     
