Hi,
I would like to use Pocketsphinx for keyword spotting in a human-robot interaction as part of my master's thesis. After doing some research on related work and on the different methods that can be applied to keyword spotting, I am very interested in the method Pocketsphinx uses for its KWS function. I could only find a short paragraph in one paper saying that Pocketsphinx first runs LVCSR and afterwards does a text-based search for the keywords. Is that correct? Could anyone tell me some more details about the method? Or is there an official reference that explains how Pocketsphinx's keyword spotting works?
I would also be interested in how the confidences one can get for the keywords are computed.
I hope someone knows something about it!
No, Pocketsphinx keyword spotting does not use LVCSR.
Pocketsphinx uses HMM keyword spotting, also called acoustic keyword spotting. The original citation should probably be:
Rose and Paul (1992), "A Hidden Markov Model Based Keyword Recognition System"
https://sci-hub.tw/https://ieeexplore.ieee.org/document/115555/
You can probably find a more compact description in:
Igor Szoke, Petr Schwarz, Pavel Matejka, Lukas Burget, Michal Fapso, Martin Karafiat, Jan Cernocky, "Comparison of Keyword Spotting Approaches for Informal Continuous Speech"
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.544.3130
Confidence in HMM keyword spotting is the difference between the word path score and the garbage path score.
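If you want to see this end to end, here is a minimal sketch of the kws mode using the SWIG-based Python bindings (pip install pocketsphinx). The model directory, dictionary, keyword list, and audio file names below are placeholders you would replace with your own:

    # Minimal keyword-spotting sketch for the pocketsphinx Python bindings.
    # All file paths are placeholders; substitute your own acoustic model,
    # dictionary, keyword list, and 16 kHz 16-bit mono PCM audio.
    from pocketsphinx import Decoder

    config = Decoder.default_config()
    config.set_string('-hmm', 'model/en-us')          # acoustic model directory
    config.set_string('-dict', 'model/cmudict.dict')  # pronunciation dictionary
    config.set_string('-kws', 'keywords.list')        # keyphrases, one per line
    decoder = Decoder(config)

    decoder.start_utt()
    with open('utterance.raw', 'rb') as f:
        while True:
            buf = f.read(1024)
            if not buf:
                break
            decoder.process_raw(buf, False, False)
    decoder.end_utt()

    # Each detected keyphrase shows up as a segment; seg.prob holds a
    # log-domain confidence score for the detection.
    for seg in decoder.seg():
        print(seg.word, seg.start_frame, seg.end_frame, seg.prob)

The keywords.list file has one keyphrase per line followed by its detection threshold, e.g. hello /1e-20/.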
Thank you very much, Nickolay!
Is the confidence difference you mentioned the cumulative log likelihood ratio?
Exactly!
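Written out (the notation here is mine, not copied from the papers), the confidence for a keyword w hypothesized between frames t_s and t_e is:

    \[
      \mathrm{conf}(w) \;=\; \sum_{t=t_s}^{t_e}
        \Bigl( \log p\bigl(o_t \mid \text{keyword path for } w\bigr)
             - \log p\bigl(o_t \mid \text{garbage path}\bigr) \Bigr)
    \]

i.e. the per-frame log-likelihood ratio between the keyword HMM path and the garbage (filler) path, accumulated over the duration of the hypothesized keyword; the keyword is accepted when this sum exceeds its threshold.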
Thank you!
Another question concerning the reference you mentioned (Rose & Paul, 1992): they mention a background model in addition to the keyword and filler models. I think the filler models are equivalent to the garbage models you mentioned. You said that the confidence is the likelihood ratio between the keyword path score and the garbage path score. But in Rose & Paul the likelihood ratio is computed between the keyword-and-filler model and the background model (see, e.g., Figure 3). This is also the case in Szöke, 2005, the second reference you mentioned (see Figure 1). I found a good explanation of this background model in chapter 3.4.3 of this reference: https://pdfs.semanticscholar.org/a6e1/5bdd38110a0e650c3465c7e8fbb48e3cbd12.pdf.
According to this work, the background model serves as an additional check of whether a keyword that scores higher than the filler models really is a keyword, and the likelihood ratio score is used for this.
Now I am a bit confused: you said the likelihood ratio is computed between the keyword and filler models, but the references say it is computed between the keyword and the background model. What am I missing here? And does Pocketsphinx also use this background model?
The garbage model is the same as the filler model and the same as the background model in Rose and Paul: a model of the alternative decoding.
After re-reading it, I got it now. Thank you!
Chapter 2 of this paper also helped a lot (if anyone else is confused like I was): http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.551.3676&rep=rep1&type=pdf
Another question for you, Nickolay, about the confidence values I asked about already: is there a possibility to get the confidences for all keywords in the keyword list for an utterance? I'm thinking of a program that returns the probabilities (or confidences) that a keyword was uttered, but for all prespecified keywords (like 'hello' - 0.8, 'house' - 0.1, 'yes' - 0.1, if these are the 3 keywords in my keyword list). I hope there is a way! Right now, I only get the confidence for each spotted keyword that is part of the hypothesis.
(Maybe this question should go in a new thread? I am not that experienced with this kind of forum.)
If you set the thresholds large enough, you should get confidence scores for all the keyphrases in the list.
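For completeness, a sketch of what that looks like with the Python bindings, reusing the decoder setup from the sketch above. The thresholds in this keywords.list are deliberately permissive placeholder values so that every keyphrase yields a detection whose score can be read out; real thresholds would need tuning:

    hello /1e-50/
    house /1e-50/
    yes /1e-50/

    # Assuming 'decoder' was configured with '-kws keywords.list' and the
    # utterance was processed as in the earlier sketch, collect one
    # log-domain score per detected keyphrase (keeping the best score if
    # a keyphrase was spotted more than once).
    scores = {}
    for seg in decoder.seg():
        scores[seg.word] = max(scores.get(seg.word, float('-inf')), seg.prob)
    print(scores)

Note that these are log-domain scores, not normalized probabilities that sum to one, so mapping them to a distribution like the 0.8/0.1/0.1 example would be an extra step on your side.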