Hello all,
I'm developing a small voice-driven player for disabled people as my final
project at university.
Using the documentation, forums, etc., I've been able to build a usable
environment with an iPaq h3650, Pocketsphinx 0.5 and Familiar Linux 0.8.4.
I've built my own language model with lmtool, I'm using the RM1 audio base, and
speech recognition is fairly functional. When it recognizes the word START, it
starts playing an audio book.
My problem is that I don't know how to discard noise, incorrect words or even
the audio book itself, so there is no way I can say STOP to stop playback.
My questions are:
- When it starts to play, the app goes crazy and tries to recognize
its own playback, word by word. I read that AudioTool from Sphinx4 can avoid
this by using two different channels for input and output. Is there a way
Pocketsphinx can ignore what is being played?
- Whatever word you say, the app recognizes one word of the vocabulary. I wonder
if it's possible to define an error threshold so the recognizer can
discard invalid words.
For example, if I say GET OUT, the recognizer returns START or STOP, but I
want it to recognize that this is not a word from the custom dictionary.
- Last question: is it possible to limit the listening time? I only want to
recognize some predefined words (commands), not dictation or long
phrases.
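To make the last two questions concrete, here is a rough sketch in plain C of the filtering I have in mind. It does not call Pocketsphinx at all: the Hypothesis struct, its score field (assumed here to be "higher is better") and the accept_command() helper are all hypothetical names invented for illustration, and I don't know whether the decoder exposes a usable score like this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical recognition result. NOT a Pocketsphinx structure. */
typedef struct {
    const char *word;     /* best hypothesis text */
    int         score;    /* assumed confidence score, higher is better */
    int         duration_ms; /* length of the captured utterance */
} Hypothesis;

/* The only words the player should react to. */
static const char *const COMMANDS[] = { "START", "STOP" };

/*
 * Return the matching command, or NULL when the hypothesis should be
 * discarded because it is too long, scores too low, or is not a command.
 */
const char *accept_command(const Hypothesis *h, int min_score, int max_ms)
{
    size_t i;

    if (h == NULL || h->word == NULL)
        return NULL;
    if (h->duration_ms > max_ms)   /* too long to be a single command */
        return NULL;
    if (h->score < min_score)      /* below the rejection threshold */
        return NULL;

    for (i = 0; i < sizeof COMMANDS / sizeof COMMANDS[0]; i++) {
        if (strcmp(h->word, COMMANDS[i]) == 0)
            return COMMANDS[i];
    }
    return NULL;                   /* not in the command list */
}
```

With something like this, a low-scoring START produced by GET OUT, or a long stretch of audio-book speech, would come back as NULL instead of triggering the player. The threshold values would have to be tuned by experiment.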
I've searched the documentation but I haven't found any clue on how to change
this behaviour. I'm sure there's a way to do this, but I don't know where to
start looking.
Thank you,
Arnau
"....- Whatever word you say, the app recognizes one word of the vocab, i wonder
if it's possible to define an error threshold so the recognizer could
discard invalid words.
For example, if i say GET OUT recognizer recognizes START or STOP but i
want it to recognize it's not a word from the custom dictionary.... "
I have also managed to run version 0.5 on a Mio 701e PPC (though under Windows). See the topic "PocketSphinx 0.5 running on WinCE (WM 5.0 )" if you haven't read it.
Well, what I was going to say is that I get the same behaviour: Pocketsphinx does not recognise the words I speak. It gives me a different "word" altogether (just one word). Is this what you meant in the quote above?
I've got a question for you too: are you using code from the pocketsphinx_continuous project?
I get unusual output after ps_init():
INFO: ....\src\libsphinxbase\feat\cmn_prior.c(122): cmn_prior_update: from < 266240.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
What are you using to open the sound card?
- ad_open_dev(cmd_ln_str_r(config, "-adcdev"), (int)cmd_ln_float32_r(config, "-samprate")), or
- ad_open_sp(16000)?
Someone mentioned that the output has to do with "feature extraction". Do you know exactly what this means?
As for your problem, why don't you assign keywords to start and stop the recognition process? Of course these will have to be in the dictionary (not what you want). For example, "Reco Activate" / "Reco Deactivate". Then use these in your code to add a loop around the continuous listen-and-decode for loop in continuous.c. Effectively, it is a copy of the whole recognition loop, but it only listens for the Activate/Deactivate words. This is long-winded, but with a bit of chiselling I think it can work.
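A minimal sketch of that gating idea, in plain C with no Pocketsphinx calls. The RecoState type and gate_phrase() are hypothetical names made up for illustration, and the sketch assumes the decoder hands back each recognized phrase as an upper-case string; the real loop in continuous.c would have to feed it.

```c
#include <string.h>

/* Whether ordinary commands are currently being accepted. */
typedef enum { RECO_IDLE, RECO_ACTIVE } RecoState;

/*
 * Feed one recognized phrase through the gate.
 * The two keywords only switch the state and are never passed on.
 * Returns 1 if the phrase should be handled as a command, 0 otherwise.
 */
int gate_phrase(RecoState *state, const char *phrase)
{
    if (strcmp(phrase, "RECO ACTIVATE") == 0) {
        *state = RECO_ACTIVE;
        return 0;
    }
    if (strcmp(phrase, "RECO DEACTIVATE") == 0) {
        *state = RECO_IDLE;
        return 0;
    }
    /* Everything else is ignored unless recognition has been activated. */
    return *state == RECO_ACTIVE;
}
```

Starting from RECO_IDLE, whatever the recognizer picks up from the audio book is dropped until "RECO ACTIVATE" is heard. Using one state variable instead of a second copy of the loop is a design choice of this sketch, not something taken from continuous.c.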
Drew
Hello Drew,
Thank you for your reply. I've read your post "PocketSphinx 0.5 running on WinCE (WM 5.0 )", and I had the same problem some months ago when I was trying to build the system in a Windows Mobile environment (WM 5.0 and WM 6.0).
You can find some questions I asked in the forum, but I can't help you fix this. I think Pocketsphinx hasn't been fully tested on those platforms. However, I read that it was working well on Linux, so I gave it a try.
I don't know exactly what "feature extraction" means in this context, but I liked your idea of starting and stopping the recognition. However, my problem is that I don't know how to recognize only one word and discard the others (listen for one word only).
I would like to know whether I can improve accuracy by training my grammar against the RM1 audio base with SphinxTrain; maybe that way I could recognize specific words.
Thank you,
Arnau
Hi Arnau,
Please read my recent post: http://sourceforge.net/forum/message.php?msg_id=5154975
Drew