
Pocketsphinx method for keyword spotting

Help
2018-06-26 – 2018-07-19
  • Susanne Trick

    Susanne Trick - 2018-06-26

    Hi,
    I would like to use Pocketsphinx for keyword spotting in a human-robot interaction scenario as part of my master's thesis. After doing some research on related work and on the different methods that can be applied to keyword spotting, I am very interested in the method Pocketsphinx uses for its KWS function. I could only find a short paragraph in one paper stating that Pocketsphinx first runs LVCSR and afterwards does a text-based search for the keywords. Is that correct? Could anyone tell me some more details about the method? Or is there an official reference that explains how Pocketsphinx's keyword spotting works?
    I would also be interested in how the confidences that one can get for the keywords are computed.
    I hope someone knows more about this!

     
    • Nickolay V. Shmyrev

      short paragraph in one paper that told that Pocketsphinx first uses LVCSR and afterwards does a text-based search for the keywords

      No, pocketsphinx keyword spotting does not use LVCSR.

      Could anyone tell me some more details about the method maybe?

      Pocketsphinx uses HMM-based keyword spotting, also called acoustic keyword spotting. The original citation is probably:

      "A hidden Markov model based keyword recognition system" by Rose and Paul, 1990
      https://sci-hub.tw/https://ieeexplore.ieee.org/document/115555/

      You can probably find a more compact description at
      Comparison of Keyword Spotting Approaches for Informal Continuous Speech
      Igor Szoke, Petr Schwarz, Pavel Matejka, Lukas Burget, Michal Fapso, Martin Karafiat, Jan Cernocky
      http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.544.3130
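
      To make the method concrete, here is a toy sketch of that kind of decoding: a left-to-right keyword HMM run in parallel with a one-state garbage/filler model, with a detection whenever the keyword path outscores the garbage path by a threshold. The 5-state topology and the emission functions are illustrative stand-ins, not pocketsphinx internals.

          import math

          def spot_keyword(frames, kw_loglik, garbage_loglik, threshold):
              """Toy Viterbi pass over a parallel keyword/garbage network.

              frames            -- iterable of acoustic feature vectors
              kw_loglik(s, x)   -- log-likelihood of frame x under keyword state s
              garbage_loglik(x) -- log-likelihood of frame x under the filler model
              Returns the frame indices where a keyword hypothesis ends.
              """
              n_states = 5                      # illustrative left-to-right keyword HMM
              kw = [-math.inf] * n_states       # Viterbi scores of the keyword states
              garbage = 0.0                     # score of the parallel garbage path
              hits = []
              for t, x in enumerate(frames):
                  new_kw = [-math.inf] * n_states
                  # the keyword may start at any frame, entered from the garbage path
                  new_kw[0] = max(kw[0], garbage) + kw_loglik(0, x)
                  for s in range(1, n_states):  # self-loop or advance by one state
                      new_kw[s] = max(kw[s], kw[s - 1]) + kw_loglik(s, x)
                  kw = new_kw
                  garbage += garbage_loglik(x)  # filler path absorbs non-keyword speech
                  # fire when the keyword end state beats the garbage path
                  if kw[-1] - garbage > threshold:
                      hits.append(t)
              return hits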

      I would also be interested in how the confidences that one can get for the keywords are computed.

      Confidence in HMM keyword spotting is the difference between the keyword path score and the garbage path score.
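
      Concretely, it is the cumulative log-likelihood ratio over the frames a detection spans. A minimal sketch with illustrative names (not the pocketsphinx API):

          def kws_confidence(keyword_frame_logliks, garbage_frame_logliks):
              # cumulative log-likelihood ratio: keyword path score minus
              # garbage/filler path score, summed over the detected span
              return sum(keyword_frame_logliks) - sum(garbage_frame_logliks)

      A score near zero means the keyword model explains the audio no better than the garbage model; a large positive score means a confident detection.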

       
  • Susanne Trick

    Susanne Trick - 2018-06-29

    Thank you very much, Nickolay!
    Is the confidence difference you mentioned the cumulative log likelihood ratio?

     
    • Nickolay V. Shmyrev

      Exactly!

       
  • Susanne Trick

    Susanne Trick - 2018-07-02

    Thank you!
    Another question concerning the reference you mentioned (Rose & Paul, 1990): in addition to the keyword and filler models, they mention a background model. I think the filler models are equivalent to the garbage models you mentioned. You said that the confidence is the likelihood ratio between the keyword path score and the garbage path score. But in Rose & Paul the likelihood ratio is computed between the keyword/filler network and the background model (e.g. see Figure 3). This is also mentioned in Szöke, 2005, the 2nd reference you gave (see Figure 1). I found a good explanation of this background model in chapter 3.4.3 of this reference: https://pdfs.semanticscholar.org/a6e1/5bdd38110a0e650c3465c7e8fbb48e3cbd12.pdf.
    According to this work, the background model serves as an additional check of whether a keyword that scores higher than the filler models really is a keyword, and the likelihood ratio score is used for this.
    Now I am a bit confused: you said the likelihood ratio is computed between the keyword and filler models, while the references say it is computed between the keyword and background model. What am I missing here? And does Pocketsphinx also use this background model?

     
    • Nickolay V. Shmyrev

      you said the likelihood ratio is computed between the keyword and filler models, while the references say it is computed between the keyword and background model

      The garbage model, the filler model, and the background model in Rose & Paul are all the same thing: a model for the alternative, non-keyword decoding. This is what they write:

      In order to account for these variabilities, a parallel “background” network of filler models is included as shown in Figure 3.

       
  • Susanne Trick

    Susanne Trick - 2018-07-07

    After re-reading it, I got it now. Thank you!
    Chapter 2 of this paper also helped a lot (if anyone else is confused like I was): http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.551.3676&rep=rep1&type=pdf

    Another question for you, Nickolay, about the confidence values I asked about already: is there a way to get the confidences for all keywords in the keyword list for an utterance? I'm thinking of a program that returns the probabilities (or confidences) that a keyword was uttered, for all prespecified keywords (like 'hello' - 0.8, 'house' - 0.1, 'yes' - 0.1, if these are the 3 keywords in my keyword list). I hope there is a way! Right now I only get the confidence for each spotted keyword that is part of the hypothesis.
    (Maybe this question should be in a new thread? I am not that experienced with this kind of forum.)

     
    • Nickolay V. Shmyrev

      If you set thresholds large enough you should get confidence scores for all the keyphrases in the list.
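
      For example, a minimal sketch assuming the 5prealpha Python bindings (the model paths, the raw audio file, and the threshold values are placeholders, and thresholds need tuning per keyphrase). The keyword list file has one keyphrase and one generous threshold per line, e.g.:

          hello /1e-50/
          house /1e-50/
          yes /1e-50/

      and each detection can then be read back with its score:

          from pocketsphinx.pocketsphinx import Decoder

          config = Decoder.default_config()
          config.set_string('-hmm', '/path/to/acoustic-model')
          config.set_string('-dict', '/path/to/pronunciation.dict')
          config.set_string('-kws', 'keyphrases.list')
          decoder = Decoder(config)

          decoder.start_utt()
          with open('utterance.raw', 'rb') as f:   # 16 kHz, 16-bit mono PCM
              decoder.process_raw(f.read(), False, True)
          decoder.end_utt()

          # every keyphrase whose score passed its threshold shows up as a
          # segment; seg.prob carries the (log-domain) confidence score
          if decoder.hyp() is not None:
              for seg in decoder.seg():
                  print(seg.word, seg.prob, seg.start_frame, seg.end_frame)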

       

