Acoustic score

Jigar
2013-04-05
2013-04-09
  • Jigar

    Jigar - 2013-04-05

    Can someone please post the formula for computing the acoustic score?
    I mean the score that appears in the output of a forced alignment, as below:

         SFrm  EFrm   SegAScr Phone
            0     2   -125782 SIL
            3    10    -37688 a SIL k b
           11    17    -89629 k a o i
           18    27    -34527 o k l i
           28    36    -41078 l o aa i
           37    50    -26881 aa l SIL e
           51    53    -61455 SIL
    

    Total score: -417040

     
  • Jigar

    Jigar - 2013-04-06

    Adding to the above question: how is variance flooring done when aligning?

     
  • dovark

    dovark - 2013-04-07

    Hello Jigar,

    Please create a separate topic for each question. As for your first question, what did you do to get that output? Which command did you run?

    I am asking because, depending on the command-line parameters, the scores may be normalized by the best senone score in each frame.

     
  • Jigar

    Jigar - 2013-04-08

    I ran the following command:

    sphinx3_align
    -hmm forceAlignsphinx3_noCMN_s1000_g16.cd_cont_1000/
    -dict marathiAgmark1500.dic
    -fdict 850spkr.filler
    -ctl docs/fileid.txt
    -insent phone.insent
    -cepdir features/
    -phsegdir phonesegdir/
    -phlabdir phonelabdir/
    -stsegdir statesegdir/
    -wdsegdir aligndir/
    -outsent phone.outsent
    -cmn none
    -unit_area no
    -round_filters no

     
  • dovark

    dovark - 2013-04-09

    OK. I think that in sphinx3_align all scores are normalized by the best senone score in each frame.

    As you might have noticed, the state-level scores add up to give the phone-level scores, and the phone-level scores in turn add up to give the word-level scores.
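
    The same additivity can be checked one level up on the output posted at the top of this thread: the SegAScr values of the segments sum to the reported total. A minimal check, using only the numbers from that post:

```python
# SegAScr column from the forced-alignment output posted above
seg_scores = [-125782, -37688, -89629, -34527, -41078, -26881, -61455]

total = sum(seg_scores)
print(total)  # -417040, the "Total score" line of that output
```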

    As far as I understand, the state-level score in each frame comes from the GMM likelihood and the transition matrix probability.

    stateScore = log(GMM probability) + log(transition matrix probability) - score_of_best_senone

    Here log is the logarithm in the base given by the -logbase parameter (default 1.0003).

    Formula for GMM probability can be found in any speech textbook.
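
    A minimal sketch of the formula above, assuming diagonal-covariance GMMs and plain floating-point arithmetic. This is not the sphinx3_align source; the function names (to_logbase, gmm_log_likelihood, state_scores) and the data layout are hypothetical and only illustrate the idea of scoring in base 1.0003 and subtracting the best senone score in the frame.

```python
import math

LOGBASE = 1.0003  # default of the -logbase parameter mentioned above


def to_logbase(p, base=LOGBASE):
    """Convert a linear probability to a log score in the given base."""
    return math.log(p) / math.log(base)


def gmm_log_likelihood(x, weights, means, variances):
    """Natural-log likelihood of feature vector x under a diagonal GMM."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp_logs.append(ll)
    # log-sum-exp over mixture components, for numerical stability
    m = max(comp_logs)
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))


def state_scores(x, senones, trans_probs, base=LOGBASE):
    """Score one frame for every senone (hypothetical layout).

    senones:     list of (weights, means, variances) tuples
    trans_probs: linear transition probability paired with each senone
    Returns log(GMM) + log(transition) - best senone score, in `base`.
    """
    ln_base = math.log(base)
    acoustic = [gmm_log_likelihood(x, *s) / ln_base for s in senones]
    best = max(acoustic)  # best senone score in this frame
    return [a + to_logbase(t, base) - best
            for a, t in zip(acoustic, trans_probs)]
```

    With this normalization the best senone in a frame contributes only its transition term, and every other senone gets a more negative score. The small base is why the reported scores are such large negative integers.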

     

    Last edit: dovark 2013-04-09
  • Jigar

    Jigar - 2013-04-09

    I am writing code for forced alignment.
    I compute the forward scores as log(a) + log(alpha(t-1)) + log(b) and use the Viterbi algorithm to get the state segmentation. However, there are some problems with the alignment.
    I observed that when computing the log-likelihood (log(b)), some scaling of the features has to be done, as mentioned in
    http://www.speech.cs.cmu.edu/sphinxman/FAQ.html#18

    Can you please elaborate on this procedure?
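
    For reference, the recursion described above can be sketched as follows. This is a generic log-domain Viterbi, not the sphinx3_align code: it takes the maximum over predecessor states (rather than summing, as a forward pass would), and it assumes the path must start in the first state and end in the last one. The names viterbi_align, log_b and log_a are hypothetical.

```python
import math

NEG_INF = -math.inf


def viterbi_align(log_b, log_a):
    """Log-domain Viterbi state alignment.

    log_b[t][j]: log-likelihood of frame t in state j
    log_a[i][j]: log transition probability from state i to state j
    Returns (state index per frame, score of the best path).
    """
    n_frames, n_states = len(log_b), len(log_b[0])
    delta = [[NEG_INF] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    delta[0][0] = log_b[0][0]  # assumption: alignment starts in state 0

    for t in range(1, n_frames):
        for j in range(n_states):
            best_i, best = 0, NEG_INF
            for i in range(n_states):
                s = delta[t - 1][i] + log_a[i][j]
                if s > best:
                    best_i, best = i, s
            delta[t][j] = best + log_b[t][j]
            back[t][j] = best_i

    # Backtrace from the last state (assumption: alignment ends there)
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, delta[n_frames - 1][n_states - 1]
```

    Runs of identical state indices in the returned path give the start and end frames of each state segment.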

     
  • dovark

    dovark - 2013-04-09

    I'm not sure that "Hypothesis Combination" is relevant to your problem. It is a post-decoding stage, used when you want to combine more than one possible hypothesis.

    Perhaps others can say more about where that hypothesis-combination code is and what it does.

     
