Hello,
Sorry if my English is not good or if these questions have already been asked.
After a day of working on an accuracy problem, I figured out that my recording is in stereo, 16 bits, 48 kHz, and I can't change it to mono, 16 bits, 16 kHz. So what do you use for real-time recognition? (For the test I used Audacity; I'm on Windows with the Realtek audio recorder.)
Then I need advice on recognition settings. I use pocketsphinx for robot control, so I only need 10-20 words, with the possibility of changing them quickly. For now I have created my own language model, but maybe creating a grammar would be better. How can I decide?
Finally, I'm French and the accuracy is not great. I see so many ways to improve the acoustic model and the language model/dictionary, but which one is most recommended for me?
EDIT: How can I remove the INFO log messages? In my research I found how to do that in Java with a config file, but what is the way in the C/C++ API?
Thank you,
Adrien
Last edit: Montagu Adrien 2015-07-15
This happens on Windows. You can configure the decoder to recognize 48 kHz audio with the options "-samprate 48000 -nfft 2048".
I extended the tutorial section on building a language model; take a look:
http://cmusphinx.sourceforge.net/wiki/tutoriallm#building_language_model
Overall you can use either a grammar or a language model; it does not really matter. With a small vocabulary both should work.
The French model should be OK for you.
From C you can call err_set_logfp(NULL); you can also add the config value "-logfn /dev/null".
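Since the command list is small and changes often, one way to try the grammar route is to generate the grammar text from the word list. The following is only a minimal sketch: the function name, grammar name and rule name are made up for illustration, and it assumes the resulting text is saved to a file and passed to the decoder with the "-jsgf" option in place of "-lm". Every word must also exist in the dictionary.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build a one-rule JSGF grammar that accepts exactly one word from `words`.
// "robot" and "<command>" are illustrative names, not required by anything.
std::string buildCommandGrammar(const std::vector<std::string>& words) {
    assert(!words.empty());  // a rule with no alternatives would be invalid
    std::string out = "#JSGF V1.0;\n\ngrammar robot;\n\npublic <command> = ";
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i > 0) {
            out += " | ";  // alternatives are separated by a vertical bar
        }
        out += words[i];
    }
    out += ";\n";
    return out;
}
```

For example, calling buildCommandGrammar({"avance", "recule", "stop"}) (hypothetical command words) gives a grammar whose rule line reads `public <command> = avance | recule | stop;`. Changing the commands then only means changing the list and regenerating the file.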
Thanks for everything.
I just tried this:
config = cmd_ln_init(NULL, ps_args(), TRUE,
"-samprate", "48000", "-nfft", "2048",
"-hmm", MODELDIR "/roboticModel/fr-fr",
"-lm", MODELDIR "/roboticModel/roboticOrder.lm",
"-dict", MODELDIR "/roboticModel/roboticOrder.dic",
NULL);
But it does not work. I get this error: FFT: Number of points must be greater or equal to frame size (1230 samples).
Maybe I need to put the rate here:
ad = ad_open_dev(cmd_ln_str_r(config, "-adcdev"), 48000)
but I can't set nfft this way.
Last edit: Montagu Adrien 2015-07-16
The feat.params file in the model accidentally sets -nfft 512, so if you remove that line from feat.params it should work fine even at 48 kHz.
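The 1230 in the error message can be checked by hand. This is a small sketch, assuming the default analysis window of 0.025625 s (the "-wlen" option) and assuming the FFT size has to be a power of two; the function names are only illustrative.

```cpp
#include <cassert>
#include <cmath>

// Assumed default analysis window length in seconds (the -wlen option).
constexpr double kDefaultWindowSeconds = 0.025625;

// Number of samples in one analysis window, rounded to the nearest integer.
int frameSizeSamples(double sampleRateHz,
                     double windowSeconds = kDefaultWindowSeconds) {
    assert(sampleRateHz > 0.0 && windowSeconds > 0.0);
    return static_cast<int>(std::lround(sampleRateHz * windowSeconds));
}

// Smallest power of two that is greater than or equal to n.
int smallestPowerOfTwoAtLeast(int n) {
    assert(n > 0);
    int p = 1;
    while (p < n) {
        p *= 2;
    }
    return p;
}
```

Under those assumptions, 48000 Hz gives a 1230-sample frame, which does not fit in the 512 points forced by feat.params but does fit in 2048. At 16000 Hz the frame is 410 samples, so 512 is enough there.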
Okay, thanks, it works. It's just that my sound is at 44100 Hz, not 48000.
Does that matter? And now that I have both 16 kHz and 48 kHz, which is best, and why?
Last edit: Montagu Adrien 2015-07-16
It does not matter which sample rate you use as long as it is 16 kHz or higher.
Okay, I have run some tests, and here are the results for anyone who needs them, because even if your recorder properties say 2 channels, 16 bits, 48000 Hz, the code may not behave the same way!
With this command:
C:\pocketSphinx\pocketsphinx>bin\Debug\pocketsphinx_continuous.exe -inmic yes -hmm model\en-us\en-us -lm model\en-us\en-us.lm.dmp -dict model\en-us\cmudict-en-us.dict -rawlogdir C:\Users\Adrien\Downloads\tmp
the sound is okay at 16 kHz.
With this command:
C:\pocketSphinx\pocketsphinx>bin\Debug\pocketsphinx_continuous.exe -inmic yes -hmm model\en-us\en-us -lm model\en-us\en-us.lm.dmp -dict model\en-us\cmudict-en-us.dict -rawlogdir C:\Users\Adrien\Downloads\tmp -samprate 48000 -nfft 2048
the sound is:
https://mega.co.nz/#!yUViEYZb!I5yerJYqJOfm8EbAIusYeV4945XTSw2LOy2jpB9euA4
Now, in the code, with this call:
if ((ad = ad_open_dev(cmd_ln_str_r(config, "-adcdev"), (int)cmd_ln_float32_r(config, "-samprate"))) == NULL)
I get good sound at 8 kHz,
and with this one:
if ((ad = ad_open_dev(cmd_ln_str_r(config, "-adcdev"), 16000)) == NULL)
the sound is okay at 16 kHz and pocketsphinx begins to recognize some things.
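But now I use this code:
void Voice::recognizeFromMicrophoneWhileTime(int timeToWait)
{
    ad_rec_t *ad;
    int16 adbuf[2048];
    uint8 utt_started, in_speech;
    int32 k;
    char const *hyp;
    time_t timer;

    if ((ad = ad_open_dev(cmd_ln_str_r(config, "-adcdev"), 16000)) == NULL)
        E_FATAL("Failed to open audio device\n");
    if (ad_start_rec(ad) < 0)
        E_FATAL("Failed to start recording\n");
    if (ps_start_utt(ps) < 0)
        E_FATAL("Failed to start utterance\n");
    utt_started = FALSE;
    std::cout << "LISTENING...." << std::endl;
    time(&timer);
    while (time(NULL) - timer < timeToWait) {
        //std::cout << time(NULL) - timer << std::endl;
        if ((k = ad_read(ad, adbuf, 2048)) < 0)
            E_FATAL("Failed to read audio\n");
        ps_process_raw(ps, adbuf, k, FALSE, FALSE);
        in_speech = ps_get_in_speech(ps);
    }
    std::cout << "FINISH...." << std::endl;
    ps_end_utt(ps);
    hyp = ps_get_hyp(ps, NULL);
    if (hyp != NULL) {
        std::cout << hyp << std::endl;
    }
    sleepMsec(100);
    ad_close(ad);
}
I give it 20 seconds, but the recorder stops at 5. After thinking about it a little, I saw that it comes from the buffer, so I just gave it a higher value and it's perfect: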
int16 adbuf[65536];
and:
if ((k = ad_read(ad, adbuf, 65536)) < 0)
Now there are just some accuracy problems left, which I need to improve with a better recorder and a better language model.
Last edit: Montagu Adrien 2015-07-16
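To put the two buffer sizes from the post above in time units, here is a tiny sketch. It assumes mono 16-bit samples at the 16000 Hz rate passed to ad_open_dev, the helper name is made up, and it does not explain why the smaller buffer cut the recording short. It only shows how much audio each read can hold.

```cpp
#include <cassert>

// Seconds of mono audio that fit in a buffer of `samples` samples.
double bufferSeconds(int samples, int sampleRateHz) {
    assert(samples >= 0 && sampleRateHz > 0);
    return static_cast<double>(samples) / sampleRateHz;
}
```

With these assumptions, 2048 samples hold 0.128 s of audio per read, while 65536 samples hold 4.096 s.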
Ok, congratulations! Let us know how it goes.