Hi, I am new to pocketsphinx. I have been playing with pocketsphinx_continuous and various parameters with limited success.
I simply want to listen continuously for a single activation keyword (which will then trigger something else that is not voice related). When I configure this in a kws file, I just get loads of false activations. To be honest, I am a bit lost in all the options, but I am willing to learn!
I don't mind what the activation keyword is.
Could someone please point me in the right direction to get no false activations and a higher detection rate for my one keyword?
Hello Andy
This issue is covered in our FAQ:
http://cmusphinx.sourceforge.net/wiki/faq#qhow_to_implement_hot_word_listening
Please let us know what you tried and what the results were. For debugging accuracy issues, you should provide the audio recording you are testing with.
Thanks Nickolay.
Some success... I got a detection every now and then with this, and no false positives.
pocketsphinx_continuous -keyphrase "okay computer" -kws_threshold 1e-30 -inmic yes
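For reference, the keyword list ("kws") file mentioned earlier holds one keyphrase per line with its threshold between slashes, and is passed with -kws instead of -keyphrase. The Python sketch below only builds the text of such a file; the function name kws_file_text is hypothetical, and the thresholds are just values tried in this thread, not recommendations.

```python
def kws_file_text(keyphrases):
    """Build keyword list file contents: one keyphrase per line,
    followed by its detection threshold between slashes."""
    lines = [f"{phrase} /{threshold:g}/" for phrase, threshold in keyphrases.items()]
    return "\n".join(lines) + "\n"


print(kws_file_text({"okay computer": 1e-30, "samantha": 1e-60}), end="")
```

Saved as, say, keyphrase.list (a hypothetical file name), it would be used as: pocketsphinx_continuous -inmic yes -kws keyphrase.list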
Can I improve the detections by providing an .lm and .dic file for the same keyword or by training my voice on this single keyword?
Last edit: Andy Barker 2016-01-12
No, language models serve a different purpose. Voice adaptation, as described in our tutorial, might be a reasonable idea, but it is probably better to choose a more distinctive keyphrase.
When I look in the dic file, it has broken down the word samantha as...
SAMANTHA S AH M AE N TH AH
Does pocketsphinx not need the same breakdown to help identify my keyword?
(I was thinking my keyword would be just samantha!)
Last edit: Andy Barker 2016-01-12
The dictionary file is loaded by default, so you do not need another dictionary. If your keyphrase contains a special word that is missing from the dictionary, you need to add it there.
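A quick way to check whether every word of a keyphrase is already in the dictionary is sketched below. It assumes the plain-text format shown above (a word followed by its phones, with alternate pronunciations written as word(2)); the helper names are hypothetical and only mirror the kind of lookup a decoder needs.

```python
def load_dictionary(text):
    """Parse dictionary lines ("WORD PH1 PH2 ...") into {word: [phones, ...]}.
    Alternate pronunciations such as "word(2)" are grouped under "word"."""
    prons = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0].lower()
        if word.endswith(")") and "(" in word:
            word = word[: word.index("(")]  # strip the "(2)" variant marker
        prons.setdefault(word, []).append(parts[1:])
    return prons


def missing_words(keyphrase, prons):
    """Return the words of the keyphrase that have no dictionary entry."""
    return [w for w in keyphrase.lower().split() if w not in prons]
```

For example, reading model/en-us/cmudict-en-us.dict into a string and calling missing_words("wake up samantha", load_dictionary(text)) should return an empty list if all three words are present.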
Ah, that makes sense now.
With output like the following, is it the case that pocketsphinx_continuous is only listening when it says "Listening...", or is it always listening?
I changed my recording levels and can now see that pocketsphinx_continuous is only ready when it says "Ready...". As soon as I say something it changes to "Listening...", which is slightly misleading, I guess, as it is really processing!?
Anyway, for whatever reason, I can get it to detect "oh mighty computer" much better than "wake up samantha", which it never detects even though it is in the default model/en-us/cmudict-en-us.dict dictionary. So maybe it would be useful to do voice adaptation on "wake up samantha". Time to read some more...
Last edit: Andy Barker 2016-01-12
You should provide a recording; maybe you pronounce "samantha" somehow differently.
Funny, it was the way I was saying "wake up", as if it were one word, "wakeup", and it obviously didn't like that! I have now tried using just the single word "samantha" and I am getting much better results with 1e-60.
Nickolay, with your help I have now found several other people asking similar things, and each time I see your replies. I can't thank you enough for your dedication!
If you have time, could you explain the meaning of kws_threshold, and what the difference would be between 1e-10 and 1e-60? I am just trying values at random without knowing why, and I can't find any good documentation on this option.
The other thing I am playing with is the mic input volume. Is there a recommended recording level at which pocketsphinx works better? Again, I am randomly moving the level up and down, which is not very scientific. I think it prefers 16 kHz and 48 kHz samples.
Last edit: Andy Barker 2016-01-13
It is just a threshold. The smaller the threshold, the more false alarms you get. The higher the threshold, the fewer false alarms you get, but you also start to miss real matches.
You need to record a sample and test it with different thresholds. Then compare the matches and false alarms, and pick the threshold that catches all the real matches without any false alarms.
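The tuning procedure above can be sketched in Python. This assumes you have already labelled the candidate detections from a test recording as real or false, each with a probability score; that (probability, is_real) list and the helper names are hypothetical, not something pocketsphinx_continuous prints in this form. A detection counts as accepted when its probability exceeds the threshold.

```python
def evaluate(candidates, threshold):
    """Count (hits, false_alarms) among candidates whose probability
    exceeds the threshold."""
    hits = sum(1 for p, real in candidates if real and p > threshold)
    false_alarms = sum(1 for p, real in candidates if not real and p > threshold)
    return hits, false_alarms


def best_threshold(candidates, thresholds):
    """Pick the threshold with the fewest misses plus false alarms,
    breaking ties in favour of fewer false alarms."""
    total_real = sum(1 for _, real in candidates if real)
    best, best_key = None, None
    for t in thresholds:
        hits, false_alarms = evaluate(candidates, t)
        key = (total_real - hits + false_alarms, false_alarms)
        if best_key is None or key < best_key:
            best, best_key = t, key
    return best


# Hypothetical labelled candidates from one test recording.
candidates = [(1e-12, True), (1e-18, True), (1e-25, False), (1e-40, False)]
thresholds = [1e-10, 1e-20, 1e-30, 1e-50]
for t in thresholds:
    print(t, evaluate(candidates, t))
print("best:", best_threshold(candidates, thresholds))
```

With this made-up data, 1e-10 misses both real keywords, 1e-30 and below let false alarms through, and 1e-20 is the only value with no errors. Change the ranking key if a missed detection costs you more than a false one.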
It is hard to say what is going on there, but I suggest you provide recordings or raw dumps, which you can collect with the -rawlogdir option. Input volume should not matter at all. 48 kHz samples require special decoder options; by default the decoder expects 16 kHz samples only. Samples must be mono. An incorrect format can be very harmful.
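A small pre-flight check along these lines can catch format problems in a test recording before decoding. It uses Python's standard wave module and assumes the default input described above (16 kHz, mono) plus 16-bit samples, the last being an assumption here; adjust the constants if you pass different decoder options. The function names are hypothetical.

```python
import wave

# Assumed default decoder input: 16 kHz, mono, 16-bit PCM.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPWIDTH = 2  # bytes per sample, i.e. 16-bit


def check_params(framerate, nchannels, sampwidth):
    """Return a list of mismatches against the assumed default format."""
    problems = []
    if framerate != EXPECTED_RATE:
        problems.append(f"sample rate is {framerate} Hz, expected {EXPECTED_RATE} Hz")
    if nchannels != EXPECTED_CHANNELS:
        problems.append(f"{nchannels} channels, expected mono")
    if sampwidth != EXPECTED_SAMPWIDTH:
        problems.append(f"{8 * sampwidth}-bit samples, expected 16-bit")
    return problems


def wav_problems(path):
    """Open a WAV file and check its header against the assumed format."""
    with wave.open(path, "rb") as wav:
        return check_params(wav.getframerate(), wav.getnchannels(), wav.getsampwidth())


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        problems = wav_problems(path)
        print(path, "OK" if not problems else "; ".join(problems))
```

Run it as a script with one or more WAV paths; an empty problem list means the header matches the assumed format (it says nothing about recording quality).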
Last edit: Nickolay V. Shmyrev 2016-01-13
Am I correct in thinking, then, that my use of -samprate 16000/8000/48000 is correct?
So, regarding the threshold:
1e-10 = 0.0000000001
1e-20 = 0.00000000000000000001
So 1e-20 would get more false alarms than 1e-10?
Last edit: Andy Barker 2016-01-13
Yes, we accept everything with a probability greater than 1e-20, which lets through more variants than a threshold of 1e-10 does.
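A tiny worked example of that comparison, treating the threshold simply as a lower bound on a detection's probability (a simplification of how the decoder actually scores keyphrases; the accepts helper and the candidate value are hypothetical):

```python
def accepts(probability, threshold):
    """Accept a detection whose probability is above the threshold."""
    return probability > threshold


candidate = 1e-15  # hypothetical detection probability

# 1e-15 is smaller than 1e-10 but larger than 1e-20.
print(accepts(candidate, 1e-10))  # False: rejected by the stricter threshold
print(accepts(candidate, 1e-20))  # True: accepted by the looser threshold
```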
Wow, it is amazing to think you are working with values that go as low as 1e-60!
Last evening I had a play with Python, but noticed that the Python examples simply read the stream in chunks of 1024 and check for keywords.
This does not seem very scientific, as it would make more sense to use volume levels to find the natural start and end of sentences, commands, or keywords. I assume this is what pocketsphinx_continuous does as well, and maybe that is why a detection is sometimes missed.
Or am I missing something?
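For comparison, the fixed-size chunking described above can be sketched over raw audio bytes. This assumes 16-bit mono PCM, so each 1024-frame chunk is 2048 bytes; the function name is hypothetical, and where the bytes come from (microphone or file) is left out.

```python
def pcm_chunks(data, frames_per_chunk=1024, sample_width=2, channels=1):
    """Yield successive slices of raw PCM bytes, each holding
    frames_per_chunk frames (the last slice may be shorter)."""
    chunk_bytes = frames_per_chunk * sample_width * channels
    for start in range(0, len(data), chunk_bytes):
        yield data[start:start + chunk_bytes]


sizes = [len(chunk) for chunk in pcm_chunks(bytes(5000))]
print(sizes)  # [2048, 2048, 904]
```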
Yes, probabilities of specific events can be very small; that's why they are usually handled in the log domain.
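A short illustration of why the log domain is used: multiplying a few hundred small probabilities underflows an ordinary floating-point number to zero, while adding their logarithms stays well within range. The probability 1e-3 and the count of 400 are arbitrary values chosen for illustration.

```python
import math

probabilities = [1e-3] * 400  # arbitrary small per-step probabilities

product = 1.0
for p in probabilities:
    product *= p  # true value would be 1e-1200, far below the float range

log_sum = sum(math.log10(p) for p in probabilities)

print(product)  # 0.0: the product has underflowed
print(round(log_sum))  # -1200: the same quantity kept as a base-10 log
```

In the same spirit, checking p > 1e-20 is equivalent to checking log10(p) > -20.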
pocketsphinx_continuous collects the data and adapts to the current volume, taking roughly 10 seconds of audio into account. The downside is that it needs a few seconds at startup to make its initial estimate.
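The idea of adapting to the recent input level can be illustrated with a rolling average of frame energy over a fixed window that gives no estimate until a warm-up amount of audio has been seen. This is a simplified, hypothetical sketch, not pocketsphinx's actual algorithm; the class name and window sizes (1000 frames of 10 ms for roughly 10 seconds, 300 frames for about 3 seconds of warm-up) are assumptions.

```python
from collections import deque


class RollingLevel:
    """Running estimate of mean frame energy over the most recent frames."""

    def __init__(self, window_frames=1000, warmup_frames=300):
        # With 10 ms frames, 1000 frames is roughly 10 seconds of audio.
        self.energies = deque(maxlen=window_frames)
        self.warmup_frames = warmup_frames

    def update(self, frame):
        """Add one frame of integer samples; return the current estimate,
        or None while still warming up."""
        energy = sum(s * s for s in frame) / len(frame)
        self.energies.append(energy)  # the oldest frame drops out when full
        if len(self.energies) < self.warmup_frames:
            return None
        return sum(self.energies) / len(self.energies)
```

At 16 kHz, a 10 ms frame is 160 samples. Because old frames fall out of the window, a change in mic level is absorbed after about one window length, which also explains the initial delay before the first estimate.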