Hi, I have been studying the KWS code in pocketsphinx.
I found the function "kws_search_trans" which, as I understand it, does the inter-HMM Viterbi step: it transfers the out score and out history of the last state of the previous phone to the in score and in history of the first state of the current phone.
However in "kws_search_trans" it seems like only three types of inter-HMM transitions are considered:
1. phone loop to phone loop
2. phone loop to keyphrase
3. intra-keyphrase
I didn't see any transitions coming out of the keyphrase, i.e. keyphrase to keyphrase and keyphrase to phone loop, which in my opinion should occur when the keyphrase is spotted.
Can you explain this for me, Nickolay?
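For readers following along, the three transition types listed above can be sketched roughly as below. This is a toy model under stated assumptions, not the pocketsphinx source: `toy_hmm_t`, `toy_enter`, `toy_trans`, `TOY_WORST_SCORE` and the `loop_penalty` parameter are hypothetical names chosen for illustration, and history pointers are left out. The point it shows is that the out score of the last keyphrase HMM is never read by the transition step.

```c
#include <stddef.h>

/* Hypothetical sentinel for "no path has reached this HMM yet". */
#define TOY_WORST_SCORE (-1000000000)

/* Toy HMM: only the entry and exit scores matter for this sketch. */
typedef struct {
    int in_score;   /* best score entering the first state */
    int out_score;  /* best score leaving the last state   */
} toy_hmm_t;

/* Viterbi-style entry: keep the incoming score only if it is better. */
static void toy_enter(toy_hmm_t *hmm, int score)
{
    if (score > hmm->in_score)
        hmm->in_score = score;
}

/* Inter-HMM transitions for one frame (illustrative only). */
static void toy_trans(toy_hmm_t *loop, size_t n_loop,
                      toy_hmm_t *key, size_t n_key,
                      int loop_penalty)
{
    int best_loop_out = TOY_WORST_SCORE;
    size_t i;

    /* Best exit score among all phone-loop HMMs. */
    for (i = 0; i < n_loop; i++)
        if (loop[i].out_score > best_loop_out)
            best_loop_out = loop[i].out_score;

    /* 1. phone loop to phone loop */
    for (i = 0; i < n_loop; i++)
        toy_enter(&loop[i], best_loop_out + loop_penalty);

    /* 2. phone loop to keyphrase (first HMM of the keyphrase) */
    if (n_key > 0)
        toy_enter(&key[0], best_loop_out);

    /* 3. intra-keyphrase: HMM i-1 feeds HMM i */
    for (i = 1; i < n_key; i++)
        toy_enter(&key[i], key[i - 1].out_score);

    /* Note: key[n_key - 1].out_score is never propagated anywhere. */
}
```

In this sketch, even when the last keyphrase HMM has the best exit score of all, the phone loop is still re-entered only from the best phone-loop exit.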
> which in my opinion should occur when the keyphrase is spotted.
No, they aren't necessary, at least in our approach. We do not care about anything that happens after the word; we just compare the phone loop score with the word score and make a decision based on that. We do not consider the case of multiple keyphrases either.
Thanks for your reply, but I don't quite understand.
Pocketsphinx uses the path score to determine whether a keyphrase is spotted or not frame by frame, so it's more like a decoding process used for KWS purposes.
A spotted keyphrase means its score is larger than that of any phone loop path, so it should go forward to the next state. Why is it reasonable to discard the keyphrase path and pass the phone loop path forward in its place?
Furthermore, can you explain the difference between retaining the keyphrase path and not retaining it?
Thanks.
> Pocketsphinx uses the path score to determine whether a keyphrase is spotted or not frame by frame
Yes.
> and should go forward to the next state
No, not necessarily. We compare exit scores across several frames to get the max, and we do not go to the next state. It is kind of a non-standard approach, but I think it is good enough.
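A rough sketch of that decision rule, for illustration only: per frame, take the keyphrase exit score minus the best phone-loop exit score, keep frames where that margin reaches a threshold, and report the frame with the largest margin. The names `toy_detection_t` and `toy_spot`, the fixed-size score arrays and the threshold parameter are assumptions made for this example and are not taken from the pocketsphinx code.

```c
#include <stddef.h>

/* Hypothetical result record for the sketch. */
typedef struct {
    int found;   /* 1 if any frame passed the threshold   */
    int frame;   /* frame index with the largest margin   */
    int margin;  /* keyphrase exit minus phone-loop best  */
} toy_detection_t;

/*
 * key_exit[f]  : exit score of the last keyphrase HMM at frame f
 * loop_best[f] : best phone-loop exit score at frame f
 * Scores are log-domain integers, so "larger" means "more likely".
 */
static toy_detection_t toy_spot(const int *key_exit, const int *loop_best,
                                size_t n_frames, int threshold)
{
    toy_detection_t best = {0, -1, 0};
    size_t f;

    for (f = 0; f < n_frames; f++) {
        int margin = key_exit[f] - loop_best[f];

        if (margin < threshold)
            continue;               /* keyphrase does not win here */

        /* Keep the frame where the keyphrase wins by the most. */
        if (!best.found || margin > best.margin) {
            best.found = 1;
            best.frame = (int)f;
            best.margin = margin;
        }
    }
    return best;
}
```

Nothing in this rule needs the keyphrase path to be carried into a following state; the exit score is only read and compared.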
> Furthermore, can you explain the difference between retaining the keyphrase path and not retaining it?
I do not get this question.
Okay I got it.
My last question was about the difference between:
1. Letting the spotted keyphrase go forward to the next state
2. Not going to the next state
I guess the second approach was chosen because of some experimental results?
We didn't do any detailed experiments on that; we didn't have many resources for it. I chose the second approach because it seemed to me much simpler in the code than the first one.