
Viterbi implementation of pocketsphinx

SSt
2016-05-03
2016-05-11
  • SSt

    SSt - 2016-05-03

    Hi,
    I was just reading through the source of the Viterbi implementation (hmm.c in libpocketsphinx) and was a little confused when I got to the code for 3-state HMMs in "hmm_vit_eval_3st_lr":

    If I am interpreting the code correctly, the score update for a state "i" is done by taking the maximum over the predecessors (which in our case are "i" itself and "i-1") of the sum: predecessor score + transition probability + senone score of the predecessor.

    But shouldn't the Viterbi update actually take the maximum over predecessor score + transition probability only, and then add the senone score of our current state "i" afterwards, instead of including it in the maximization?

    What am I missing here? Some clarification would be much appreciated.

     
    • Nickolay V. Shmyrev

      From the code

      :::c++
          s0 = hmm_in_score(hmm) + nonmpx_senscr(0);
          s0 = s0 + hmm_tprob_3st(0, 0);
          .....
          if (s0 WORSE_THAN WORST_SCORE) s0 = WORST_SCORE;
          hmm_in_score(hmm) = s0;
      

      it is a proper sum of the transition score and the acoustic score for the current state. I'm not sure where you got the idea of a "predecessor state".

      Perhaps you can explain more if this is still unclear.

       
  • SSt

    SSt - 2016-05-10

    Well, my point is this:
    Let's consider the update for s2 and take the case where we cannot skip states (i.e. t2 is never better than t0 or t1).

    Then, at the beginning, the following helper variables are created as sums of the score and the senone score:

    :::c++
        s2 = hmm_score(hmm, 2) + nonmpx_senscr(2);
        s1 = hmm_score(hmm, 1) + nonmpx_senscr(1);
    

    Then we come to the update phase:

    :::c++
        /* All transitions into state 2 (state 0 is always active) */
        t0 = s2 + hmm_tprob_3st(2, 2);
        t1 = s1 + hmm_tprob_3st(1, 2);
        if (hmm_tprob_3st(0, 2) BETTER_THAN TMAT_WORST_SCORE)
            t2 = s0 + hmm_tprob_3st(0, 2);
        if (t0 BETTER_THAN t1) {
            if (t2 BETTER_THAN t0) {
                s2 = t2;
                hmm_history(hmm, 2)  = hmm_in_history(hmm);
            } else
                s2 = t0;
        } else {
            if (t2 BETTER_THAN t1) {
                s2 = t2;
                hmm_history(hmm, 2)  = hmm_in_history(hmm);
            } else {
                s2 = t1;
                hmm_history(hmm, 2)  = hmm_history(hmm, 1);
            }
        }
        if (s2 WORSE_THAN WORST_SCORE) s2 = WORST_SCORE;
        if (s2 BETTER_THAN bestScore) bestScore = s2;
        hmm_score(hmm, 2) = s2;
    

    Now we see that hmm_score for state 2 gets assigned the value s2, which in turn was created as the maximum of t0 and t1.
    These values were in turn calculated from the helper variables plus their transition probabilities, giving the following update formula (after substituting the definitions from the code):

    hmm_score(2) = s2
    = max[ t0 ; t1 ]
    = max[ s2 + tprob(2,2) ; s1 + tprob(1,2) ]
    = max[ hmm_score(2) + senscore(2) + tprob(2,2) ; hmm_score(1) + senscore(1) + tprob(1,2) ]

    And this is the point I meant: the senone scores are taken into the maximization, which is not the case in the true Viterbi update formula.

     
    • Nickolay V. Shmyrev

      the senon scores are taken into the maximization process which
      is not the case in the true viterbi update formula.

      This is how it should be. I am not sure what you mean by "Viterbi update formula". Here https://en.wikipedia.org/wiki/Viterbi_algorithm you can see that the formula for V_{t,k} contains a term P(y_t | k), which is the score of the acoustic observation, i.e. the senone score.

       
  • SSt

    SSt - 2016-05-11

    Exactly, and you can also see that this term P(y_t | k) does not depend on the previous state x over which the maximum is taken; it only depends on the current state k.

    But in the formula derived from the code we use different senone scores that depend on the previous state, namely senscore(1) and senscore(2).
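    For comparison, the two update rules can be written side by side (standard notation from the Wikipedia article, with V_{t,k} the best path score for state k at time t and a_{x,k} the transition probability):

    :::latex
        % Textbook Viterbi: the emission term depends only on the current state k
        V_{t,k} = P(y_t \mid k)\,\max_{x}\big(a_{x,k}\,V_{t-1,x}\big)

        % Scheme in the code: the emission term depends on the source state x
        V_{t,k} = \max_{x}\big(P(y_t \mid x)\,a_{x,k}\,V_{t-1,x}\big)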

     
    • Nickolay V. Shmyrev

      Thanks, I see now

      It is not quite clear why it was implemented this way. Overall, there could be many variants: you could have distributions on arcs, on outgoing states, or on incoming states. The exact choice is not well justified.

      I asked the same question on the mailing list

      https://sourceforge.net/p/cmusphinx/mailman/message/35080544/

       
      • Nickolay V. Shmyrev

        Ok, I understood the reasons it seems.

        Basically, in speech recognition you have an implementation issue: you have many, many HMMs which you need to connect in different ways, and it is more efficient to assign the HMM distributions to graph edges, not graph nodes; in the nodes you store the tokens. This transform is equivalent to a standard HMM, but it decouples the distributions from the nodes, so you can glue HMMs together at the nodes without any recomputation of scores. This is what happens in pocketsphinx: you have 4 nodes and 3 arcs between them. The last node is called non-emitting and is used for transitions between phones. You can simply pass the non-emitting node's outgoing score through as the incoming score of the next HMM.

        With a distribution assigned to every state, as in the conventional HMM model, the Viterbi step between phones would require recomputation.

        You have the same thing in the WFST framework: there you also evaluate the distributions on the arcs of the WFST, not on the nodes, and in the nodes you keep the score and perform the maximization.

         
        • Nickolay V. Shmyrev

          Ok, to put it more simply: we need non-emitting states to connect models together. But because the non-emitting state is the last state, we can't use the "target state" acoustic score scheme, since the non-emitting state is also a target; we have to use the "source state" acoustic score scheme instead. This is not the HMM as described in basic tutorials, but it works.

           
