I have some audio clips obtained by decompressing and splitting an audio-video stream.
These audio clips can have many variations in accent, language, ambient sound, etc.
I need to spot multiple keywords in these clips, and I also wish to locate the time at which each keyword was uttered.
Can this be accomplished using pocketsphinx?
Regards,
PK-Singh
Yes, you can do it with pocketsphinx. There is a kws search mode:
pocketsphinx -infile file.wav -kws keyphrase.file
The keyphrase file contains key phrases, one per line. To access this API you need to check out pocketsphinx from Subversion.
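As a quick sketch of the workflow described above (the phrases below are made-up examples; only the one-phrase-per-line layout comes from this thread):

```shell
# Write a keyphrase file: one key phrase per line (phrases are examples)
cat > keyphrase.file <<'EOF'
oh mighty computer
open the door
EOF

# Then point pocketsphinx at it, if a trunk build is available
# (the `|| true` keeps the sketch from aborting when file.wav is absent)
if command -v pocketsphinx_continuous >/dev/null 2>&1; then
    pocketsphinx_continuous -infile file.wav -kws keyphrase.file || true
else
    echo "pocketsphinx_continuous not installed; skipping the run"
fi
```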
Last edit: Nickolay V. Shmyrev 2014-05-12
Hi,
Thank you so much for your quick response. I'm using pocketsphinx under Windows 7. I tried the above-mentioned command but it didn't work. Here are the parameters I set; please let me know what I'm doing wrong.
Command: pocketsphinx_continuous.exe -argfile argFile.txt -infile .\test_files\my_audio.wav
argFile:
-dict ./model/lm/cmu07a.dic
-hmm ./model/hmm/en_broadcastnews_16k_ptm256_5000
-lm ./model/lm/en-70k.dmp
-kws ./model/keyphrase.file
-kws_threshold 0.95
| To access this API you need to check out pocketsphinx from Subversion.
Could you please specify the version with kws which should be checked out? I tried the latest from the trunk, but it appears to be too sensitive to background noise (the earlier version did not have this problem).
The latest one.
Provide the information needed to reproduce your problem: audio file, keyword, and so on.
I actually tried it in continuous mode first and compared the latest version with 0.8. With the same background, the older version reacted only to rather loud speech, while the newer one responded to almost every minor sound, which happens every couple of seconds. Is there any way to control the sensitivity in the latest version?
I'm not sure what you mean by that; this thread is about keyword spotting and keyword activation, which is not available in 0.8.
Sorry, I'll try to describe my problem more properly now.
I'm testing the latest version of pocketsphinx from the trunk, with the common Russian language model (voxforge-ru-0.2). I have a sample audio file of a child speaking "как тебя зовут?" ("what is your name?"). Ordinary processing in continuous mode gives me "как тебя золотой" (a misrecognition). But when I switch to keyword search and try to find any part of this phrase, I always get a 'null' answer (with any kws threshold). The audio file and keyfile are attached.
I succeeded in detecting keywords in your audio file with -kws_threshold 1e-50.
Right, voxforge-ru is not a very accurate model, and it is not expected to be accurate for children's voices. Another factor in accuracy is parameter estimation. Your utterance is very short and you are trying to detect the keyword from the very beginning, so there is not enough time for the recognizer to estimate the channel parameters. If you make the utterance longer, or just add
-cmninit 12.85,0,0.07,-0.27,-0.30,0.07,-0.37,-0.13,-0.16,-0.16,-0.25,-0.10,-0.20
to the command-line arguments, you can lower the threshold to 1e-25, and it will recognize "как тебя зовут" properly with the LM.
Last edit: Nickolay V. Shmyrev 2014-05-28
This is good news, but I am very new to ASR and have some doubts.
Will the default acoustic model be able to handle different accents and different noise conditions?
My application needs to spot words in English being spoken in mixed-language speech. For example, a person is speaking in Arabic but in between speaks some keywords/phrases in English. Can this also be handled with the existing acoustic model?
For the above scenario, do I have to use a language model, or can the goal be achieved with an FSG as well?
| Will the default acoustic model be able to handle different accents and different noise conditions?
Yes.
| Can this also be handled with the existing acoustic model?
Yes, but it's better to train a new model.
| Do I have to use a language model, or can the goal be achieved with an FSG?
You need neither an LM nor an FSG; keyword spotting is a third search mode. It looks for certain words from a list.
Is it possible to time-stamp the occurrence of the keyword in an utterance?
I have seen some discussions of keyword spotting based on grammar constructs. Is this third search mode better in terms of accuracy (false acceptance/rejection), etc.?
I have given the following command line for keyword spotting; the audio input is a microphone.
I have observed a lot of false rejections. There is only one word in the keyphrase file,
OSCAR, and its pronunciation in the dictionary file is
OSCAR AO S K ER
pocketsphinx_continuous.exe -hmm hub4wsj_sc_8k -kws keyphrase.file -dict keyphrase.dic -samprate 16000
Do I need to tune some other parameters? What is the significance of the kws_threshold parameter?
Provide the data to reproduce your problem.
The threshold controls the false-alarm rate; however, it's better to provide the data first, as there might be other issues.
Hi Nickolay,
Following is the link to the database on which I am trying out kws search:
https://www.dropbox.com/s/orn9fa2v65lz008/debug_kws.zip
I have used the following command line:
pocketsphinx_continuous -hmm hub4wsj_sc_8k -samprate 16000 -kws keyphrase.file -dict kws.dic -infile oscar.wav (similarly for the other wav files).
Is the above command line correct?
keyphrase.file contains the following keyphrases: OSCAR, CHARLIE, FOXTROT, DELTA, DECEMBER.
kws.dic is the dictionary file. I am using the hub4wsj_sc_8k model as shipped with the pocketsphinx release, and the latest svn version of pocketsphinx.
It contains seven audio files:
oscar.wav -- 5/6 utterances of OSCAR
delta.wav -- 5/6 utterances of DELTA
december.wav -- 5/6 utterances of DECEMBER
charlie.wav -- 5/6 utterances of CHARLIE
foxtrot.wav -- 5/6 utterances of FOXTROT
oscar_december_charlie.wav -- a mix of OSCAR, DECEMBER and CHARLIE
delta_foxtrot_oscar_december_charlie.wav -- a mix of DELTA, FOXTROT, OSCAR, DECEMBER and CHARLIE
My observation is that keyword search mode works well for the words DECEMBER and FOXTROT, with zero recognition for the other words in the database.
Well, that's about it. You can:
1) Use the more accurate models en-us or en-us-semi.
2) Use longer words; words of 4 syllables are recommended for activation.
3) Tune the threshold for every word. The per-word threshold can be specified in the keyword list file. For example, for CHARLIE a more reasonable threshold is 1e-10.
Hi Nickolay,
Thanks for the quick response.
Could you please specify the format of the keyword list file? For example, should I write the threshold value as
CHARLIE 1e-10
on one single line?
Is there any rule of thumb for judging a suitable value for kws_threshold?
The threshold is separated by /:
Ideally the threshold should be computed on a test set.
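One way to do that on a test set, sketched under the assumption that the file and model names from earlier in this thread (charlie.wav, kws.dic, hub4wsj_sc_8k) are in place: sweep candidate thresholds, count detections at each, and compare the counts against the known number of utterances in the file to pick the best detection/false-alarm trade-off.

```shell
# Sweep candidate kws thresholds for one keyword and count raw detections.
# File and model names come from earlier in this thread; substitute your own
# test set. Without pocketsphinx installed, the counts simply come out as 0.
printf 'CHARLIE\n' > keyphrase.file
for t in 1e-10 1e-20 1e-30 1e-40; do
    hits=$(pocketsphinx_continuous -hmm hub4wsj_sc_8k -dict kws.dic \
               -kws keyphrase.file -kws_threshold "$t" \
               -infile charlie.wav 2>/dev/null | grep -c CHARLIE || true)
    echo "threshold=$t detections=$hits"
done
```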
Hi Nickolay,
When I specify the thresholds in the keyword list file as suggested earlier in the thread:
CHARLIE / 1e-10
OSCAR / 1e-10
I get the following error:
ERROR: "kws_search.c", line 158: The word '/' is missing in the dictionary
try:
CHARLIE /1e-10/
OSCAR /1e-10/
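Putting the corrected slash format together with the earlier per-word suggestion, a full keyword list might look like this (the 1e-10 values are the ones suggested above; the 1e-30 values are made-up placeholders to be tuned on your own recordings):

```shell
# Keyword list file: one keyword per line, optional per-word threshold
# between slashes. Only CHARLIE's and OSCAR's thresholds come from this
# thread; the others are illustrative placeholders.
cat > keyphrase.file <<'EOF'
CHARLIE /1e-10/
OSCAR /1e-10/
DECEMBER /1e-30/
FOXTROT /1e-30/
EOF
```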
Please specify how to create the keyword list file. Is it a text file? What else is required for this? Please specify in detail.
Also, I have created a language model of some specific words. The recognizer sends data in onPartialResult but null in the onResult method.
http://cmusphinx.sourceforge.net/wiki/tutoriallm
Hello sir, I have created one grammar file for specific commands like "start bike", "stop bike", "run engine", "lock bike", etc. These keywords get recognized and returned in the onResult method.
The problem is that it sometimes sends previously issued commands.
For example, I first say "lock bike", "unlock bike", then "start bike", then "run engine", and it sends me "stop bike" or "lock bike" again. When my bike is started it may send "stop bike", which I did not say now; it is actually a previously issued command. How can I avoid this? Sir, please help me.
You should create a new message (not a comment in an old thread).
You should also provide your directory files and the commands you are trying to run;
it is impossible otherwise to understand what you are doing and to help you.