Hi,
I was just reading through the Viterbi source file (hmm.c in libpocketsphinx) and got a little confused when I reached the implementation for 3-state HMMs in "hmm_vit_eval_3st_lr":
If I am interpreting the code correctly, the score update for a state "i" is done by taking the maximum, over the predecessors (which in our case are "i" itself and "i-1"), of the sum predecessor score + transition probability + senone score of the predecessor.
But shouldn't the Viterbi update only take the maximum over predecessor score + transition probability, and then add the senone score of the current state "i", instead of taking it into the maximization?
What am I missing here? It would be nice to get some clarification.
From the code it is a proper sum of transition score + acoustic score for the current state. I'm not sure where you got the idea about the "predecessor state".
You can explain more if this is still unclear.
Well, my point is this:
Let's consider the update for s2 and take the case where we cannot skip states (i.e. t2 is never better than t0 or t1).
Then, at the beginning, the helper variables are created as sums of the score + senone score;
then we come to the update phase:
Now we see that the hmm_score for state 2 gets assigned the value s2, which in turn was created as the maximum of t0 and t1.
These values were in turn calculated from the helper variables plus the transition probability, which gives the following update formula (by substituting in the definitions from the code):
hmm_score(2) = s2
             = max[ t0 ; t1 ]
             = max[ s2 + tprob(2,2) ; s1 + tprob(1,2) ]
             = max[ hmm_score(2) + senscore(2) + tprob(2,2) ; hmm_score(1) + senscore(1) + tprob(1,2) ].
And this is the point I meant: the senone scores are taken into the maximization, which is not the case in the textbook Viterbi update formula.
This is how it should be. I am not sure what you mean by "viterbi update formula". Here https://en.wikipedia.org/wiki/Viterbi_algorithm you can see in the formula for V_{t,k} there is a term P(y_t | k), which is the score of the acoustic observation, i.e. the senone score.

Exactly, and you can also see that this term P(y_t | k) does not depend on the previous state x in the formula over which the maximum is taken; it only depends on the state k.
But in the formula in the code we use different senone scores that depend on the previous state x, namely senscore(1) and senscore(2).
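For reference, the two groupings being discussed can be written side by side (log domain; the second line is my paraphrase of what the code computes, not a formula taken from the source):

standard "target-state" update (as on Wikipedia, emission term outside the max):
    V(t,k) = senscore_t(k) + max_x [ V(t-1,x) + tprob(x,k) ]

what hmm_vit_eval_3st_lr effectively computes ("source-state" scheme, the predecessor's senone score inside the max):
    V(t,k) = max_x [ V(t-1,x) + senscore_t(x) + tprob(x,k) ]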
Thanks, I see now
It is not quite clear why it was implemented this way. Overall, there could be many variants: you could have distributions on arcs, on outgoing states, or on incoming states. The exact choice is not strongly justified.
I asked the same question on the mailing list
https://sourceforge.net/p/cmusphinx/mailman/message/35080544/
Ok, it seems I understand the reasons now.
Basically, in speech recognition you have implementation issues: there are many, many HMMs which you need to connect in different ways, and it is more efficient to assign the HMM distributions to graph edges rather than graph nodes, and to store the tokens in the nodes. This transform is equivalent to the standard HMM, but it disconnects the distributions from the nodes, so you can glue HMMs together at the nodes without any recomputation of scores. This is what happens in pocketsphinx: you have 4 nodes and 3 arcs between them. The last node is called non-emitting and is useful for the transition between phones, since you can simply pass the non-emitting node's outgoing score on as the incoming score of the next HMM.
With a distribution assigned to every state, as in the conventional HMM model, the Viterbi step between phones would require recalculations.
You have the same thing in the WFST framework: there you also evaluate distributions on the arcs of the WFST, not on the nodes, and in the nodes you keep the score and perform the maximization.
Ok, to put it more simply: we need non-emitting states to connect models together. But because the non-emitting state is the last state, we can't use the "target state" acoustic score scheme, since the non-emitting state is also a target; we have to use the "source state" acoustic score scheme instead. This is not the HMM as described in basic tutorials, but it works.