Regd. traceback based n-best algorithm

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others

This project can now be found here.

Regd. traceback based n-best algorithm

Forum: Speech Recognition Theory

Creator: dovark

Created: 2013-03-19

Updated: 2013-03-20

dovark - 2013-03-19

Hi all,

I was reading section related to NBest list in "Spoken Language Processing" 1st ed by Huang et al. I have a question about Fig 13.11, "Deficiency in traceback-based n-best algorithm". Can someone please tell me - What is Ph1, ph2, ph3 on the Y axis? Do the path lines represent score magnitudes?

As I understand, their argument is if a word Wk appears in two best paths with the same end times, one of those paths will get pruned out. What is the significance of these two Wks having different start times? What difference does it make whether they have same starting time or not?

[I don't have access to the springer book chapter [40] referenced there, hence asking here]

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Nickolay V. Shmyrev - 2013-03-19

Can someone please tell me - What is Ph1, ph2, ph3 on the Y axis?

This figure displays alignment between HMM states and frames. On the Y axis you have states, so Ph1 is a sequence of states corresponding to the first word w_i with start on the top, Ph2 is the sequence of states corresponding to path w_j and Ph3 corresponds to states of word w_k.

Do the path lines represent score magnitudes?

No, scores are not displayed here, they are just described in text.

What is the significance of these two Wks having different start times? What difference does it make whether they have same starting time or not?

If both first path Ph1 and second path Ph2 transition to word Wk in the same frame(time) we would just record them both in traceback table. We only store traceback when we transition to a new word, that's the issue.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

dovark - 2013-03-19

Thanks for explanation! There is one more thing though - quoting from the book -

"The algorithm runs just like the normal time-synchronous Viterbi algorithm for all within-word transitions. However for each time frame "t", and each word-ending state, the algorithm stores all the different words that can end at current time "t" and their corresponding scores in a traceback list."

Based on this I have drawn a HMM state Vs time figure showing a possible situation. Please find it here http://goo.gl/CRa0V

In this figure, each word has only 2 states. Broken line represents the path which gets deleted due to lower score than the other competing path.

On the RHS, I have written traceback list entries. It seems right according to the quoted line above, but it is in contradiction with conclusion in prev. discussion (i.e. there can't be two paths for word_K).

Please let me know where am I going wrong.

Thanks.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Nickolay V. Shmyrev - 2013-03-19

Based on this I have drawn a HMM state Vs time figure showing a possible situation. Please find it here http://goo.gl/CRa0V

On your picture you need t = 5 when path through word J will reach the final state. Otherwise your picture is not complete.

On the RHS, I have written traceback list entries. It seems right according to the quoted line above,

Once you will have t = 5 and start to trace back from the final state you will only be able to recover path through the word I, not the path through the word J.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

dovark - 2013-03-20

Thanks. I see it now! Just to summarize, I have uploaded updated image here http://goo.gl/c3bgP

Out of the 2 pink paths only one is preserved in the final nbest list.

In order to remove this drawback they introduced word-dependent nbest list approach. Here there are no "broken paths". Even if a state has multiple incoming paths, all of them (< k) are preserved hence finally we get two nbest paths instead of one. Although this will mean more storage requirement.

Authors say about this modified word-dependent algo that "However, it is no longer admissible because of the word-dependent approximation".

What do they mean by "no longer admissible"? I don't see any loss of information as compared to previous approach (i.e. traceback nbest). Only thing we are sacrificing is memory (during decoding).

Last edit: dovark 2013-03-20

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Nickolay V. Shmyrev - 2013-03-20

Thanks. I see it now! Just to summarize, I have uploaded updated image here http://goo.gl/c3bgP

Sorry, this document is not open for everybody.

What do they mean by "no longer admissible"?

Since this search is an approximation, it's still possible to find out the case where second-best path will be pruned. For that you need to consider word sequencies longer than k words. However, for such a long sequencies the chances that second-best path will be pruned are very small.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

dovark - 2013-03-20

Sorry, here's the link which is shared with all https://docs.google.com/file/d/0BzNdoGke8tyRQVVmQlNoTFJ3d1U/edit?usp=sharing

Ok. I got this. Thanks again.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Log in to post a comment.