I have some audio clips obtained by decompressing and splitting an audio-video stream.
These audio clips can have many variations in accent, language, ambient sound, etc.
I need to spot multiple keywords in these clips, and I also wish to locate the time at which each keyword was uttered.
Can this be accomplished using pocketsphinx?
Regards,
PK-Singh
Yes, you can do it with pocketsphinx. There is a kws search mode:
pocketsphinx -infile file.wav -kws keyphrase.file
The keyphrase file contains key phrases, one per line. To access this API you need to check out pocketsphinx from Subversion.
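As a quick sketch of the workflow described above (the phrases below are made-up examples; only the one-phrase-per-line layout comes from this thread):

```shell
# Write a keyphrase file: one key phrase per line (phrases are examples)
cat > keyphrase.file <<'EOF'
oh mighty computer
open the door
EOF

# Then point pocketsphinx at it, if a trunk build is available
# (the `|| true` keeps the sketch from aborting when file.wav is absent)
if command -v pocketsphinx_continuous >/dev/null 2>&1; then
    pocketsphinx_continuous -infile file.wav -kws keyphrase.file || true
else
    echo "pocketsphinx_continuous not installed; skipping the run"
fi
```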
Last edit: Nickolay V. Shmyrev 2014-05-12
Hi,
Thank you so much for your quick response. I'm using pocketsphinx under Windows 7. I tried the above-mentioned command but it didn't work. Here are the parameters I set; please let me know what I'm doing wrong.
Command: pocketsphinx_continuous.exe -argfile argFile.txt -infile .\test_files\my_audio.wav
argFile:
-dict ./model/lm/cmu07a.dic
-hmm ./model/hmm/en_broadcastnews_16k_ptm256_5000
-lm ./model/lm/en-70k.dmp
-kws ./model/keyphrase.file
-kws_threshold 0.95
| To access this API you need to check out pocketsphinx from Subversion.
Could you please specify the version with kws which should be checked out? I tried the latest from the trunk, but it appears to be too sensitive to background noise (the earlier version did not have this problem).
The latest one.
Provide the information needed to reproduce your problem: audio file, keyword, and so on.
I actually tried it in continuous mode first and compared the latest version with 0.8. With the same background, the older version reacted only to rather loud speech, while the newer one responded to almost every minor sound, which happens every couple of seconds. Is there any way to control the sensitivity in the latest version?
I'm not sure what you mean by that; this thread is about keyword spotting and keyword activation, which is not available in 0.8.
Sorry, I'll try to describe my problem more properly now.
I'm testing the latest version of pocketsphinx from the trunk, with the common Russian language model (voxforge-ru-0.2). I have a sample audio file of a child speaking "как тебя зовут?" ("what is your name?"). Ordinary processing in continuous mode gives me "как тебя золотой" (a misrecognition). But when I switch to keyword search and try to find any part of this phrase, I always get a 'null' answer (with any kws threshold). The audio file and keyfile are attached.
I succeeded in detecting keywords in your audio file with -kws_threshold 1e-50.
Right, voxforge-ru is not a very accurate model, and it is not expected to be accurate for children's voices. Another factor in accuracy is parameter estimation. Your utterance is very short and you are trying to detect the keyword from the very beginning, so there is not enough time for the recognizer to estimate the channel parameters. If you make the utterance longer, or just add
-cmninit 12.85,0,0.07,-0.27,-0.30,0.07,-0.37,-0.13,-0.16,-0.16,-0.25,-0.10,-0.20
to the command-line arguments, you can lower the threshold to 1e-25, and it will recognize "как тебя зовут" properly with the LM.
Last edit: Nickolay V. Shmyrev 2014-05-28
This is good news, but I am very new to ASR and have some doubts.
Will the default acoustic model be able to handle different accents and different noise conditions?
My application needs to spot words in English being spoken in mixed-language speech. For example, a person is speaking in Arabic but in between speaks some keywords/phrases in English. Can this also be handled with the existing acoustic model?
For the above scenario, do I have to use a language model, or can the goal be achieved with an FSG as well?
| Will the default acoustic model be able to handle different accents and different noise conditions?
Yes.
| Can this also be handled with the existing acoustic model?
Yes, but it's better to train a new model.
| Do I have to use a language model, or can the goal be achieved with an FSG?
You need neither an LM nor an FSG; keyword spotting is a third search mode. It looks for certain words from a list.
Is it possible to time-stamp the occurrence of the keyword in an utterance?
I have seen some discussions of keyword spotting based on grammar constructs. Is this third search mode better in terms of accuracy (false acceptance/rejection), etc.?
I have given the following command line for keyword spotting; the audio input is a microphone.
I have observed a lot of false rejections. There is only one word in the keyphrase file,
OSCAR, and its pronunciation in the dictionary file is
OSCAR AO S K ER
pocketsphinx_continuous.exe -hmm hub4wsj_sc_8k -kws keyphrase.file -dict keyphrase.dic -samprate 16000
Do I need to tune some other parameters? What is the significance of the kws_threshold parameter?
Provide the data to reproduce your problem.
The threshold controls the false-alarm rate; however, it's better to provide the data first, as there might be other issues.
Hi Nickolay,
Following is the link to the database on which I am trying out kws search:
https://www.dropbox.com/s/orn9fa2v65lz008/debug_kws.zip
I have used the following command line:
pocketsphinx_continuous -hmm hub4wsj_sc_8k -samprate 16000 -kws keyphrase.file -dict kws.dic -infile oscar.wav (similarly for the other wav files).
Is the above command line correct?
keyphrase.file contains the following keyphrases: OSCAR, CHARLIE, FOXTROT, DELTA, DECEMBER.
kws.dic is the dictionary file. I am using the hub4wsj_sc_8k model as shipped with the pocketsphinx release, and the latest svn version of pocketsphinx.
It contains seven audio files:
oscar.wav -- 5/6 utterances of OSCAR
delta.wav -- 5/6 utterances of DELTA
december.wav -- 5/6 utterances of DECEMBER
charlie.wav -- 5/6 utterances of CHARLIE
foxtrot.wav -- 5/6 utterances of FOXTROT
oscar_december_charlie.wav -- a mix of OSCAR, DECEMBER and CHARLIE
delta_foxtrot_oscar_december_charlie.wav -- a mix of DELTA, FOXTROT, OSCAR, DECEMBER and CHARLIE
My observation is that keyword search mode works well for the words DECEMBER and FOXTROT, with zero recognition for the other words in the database.
Well, that's about it. You can:
1) Use the more accurate models en-us or en-us-semi.
2) Use longer words; words of 4 syllables are recommended for activation.
3) Tune the threshold for every word. The per-word threshold can be specified in the keyword list file. For example, for CHARLIE a more reasonable threshold is 1e-10.
Hi Nickolay,
Thanks for the quick response.
Could you please specify the format of the keyword list file? For example, should I write the threshold value as
CHARLIE 1e-10
on one single line?
Is there any rule of thumb for judging a suitable value for kws_threshold?
The threshold is separated by /:
Ideally the threshold should be computed on a test set.
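One way to do that on a test set, sketched under the assumption that the file and model names from earlier in this thread (charlie.wav, kws.dic, hub4wsj_sc_8k) are in place: sweep candidate thresholds, count detections at each, and compare the counts against the known number of utterances in the file to pick the best detection/false-alarm trade-off.

```shell
# Sweep candidate kws thresholds for one keyword and count raw detections.
# File and model names come from earlier in this thread; substitute your own
# test set. Without pocketsphinx installed, the counts simply come out as 0.
printf 'CHARLIE\n' > keyphrase.file
for t in 1e-10 1e-20 1e-30 1e-40; do
    hits=$(pocketsphinx_continuous -hmm hub4wsj_sc_8k -dict kws.dic \
               -kws keyphrase.file -kws_threshold "$t" \
               -infile charlie.wav 2>/dev/null | grep -c CHARLIE || true)
    echo "threshold=$t detections=$hits"
done
```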
Hi Nickolay,
When I specify the thresholds in the keyword list file as suggested earlier in the thread:
CHARLIE / 1e-10
OSCAR / 1e-10
I get the following error:
ERROR: "kws_search.c", line 158: The word '/' is missing in the dictionary
try:
CHARLIE /1e-10/
OSCAR /1e-10/
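Putting the corrected slash format together with the earlier per-word suggestion, a full keyword list might look like this (the 1e-10 values are the ones suggested above; the 1e-30 values are made-up placeholders to be tuned on your own recordings):

```shell
# Keyword list file: one keyword per line, optional per-word threshold
# between slashes. Only CHARLIE's and OSCAR's thresholds come from this
# thread; the others are illustrative placeholders.
cat > keyphrase.file <<'EOF'
CHARLIE /1e-10/
OSCAR /1e-10/
DECEMBER /1e-30/
FOXTROT /1e-30/
EOF
```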
Please specify how to create the keyword list file. Is it a text file? What else is required for this? Please specify in detail.
Also, I have created a language model of some specific words. The recognizer sends data in onPartialResult but null in the onResult method.
http://cmusphinx.sourceforge.net/wiki/tutoriallm
Hello sir, I have created one grammar file for specific commands like "start bike", "stop bike", "run engine", "lock bike", etc. These keywords get recognized and returned in the onResult method.
The problem is that it sometimes sends previously issued commands.
For example, I first say "lock bike", "unlock bike", then "start bike", then "run engine", and it sends me "stop bike" or "lock bike" again. When my bike is started it may send "stop bike", which I did not say now; it is actually a previously issued command. How can I avoid this? Sir, please help me.
You should create a new message (not a comment in an old thread).
You should also provide your directory files and the commands you are trying to run;
it is impossible otherwise to understand what you are doing and to help you.