CMU Sphinx / Forums / Help: Some questions about pocketSphinx settings

Montagu Adrien - 2015-07-15

Hello,

Sorry if my English is not good or if these questions have already been asked.

After a day of working on an accuracy problem, I just figured out that my recording is stereo, 16-bit, 48 kHz, and I can't change it to mono, 16-bit, 16 kHz. So what do you use for real-time recognition? (For testing I used Audacity; I'm on Windows with a Realtek audio recorder.)
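If your own code receives the stereo 48 kHz stream (for example, audio exported from Audacity), here is a minimal sketch of converting it to mono 16 kHz. It assumes interleaved 16-bit samples (left, right, left, right, ...). The function name `stereo48k_to_mono16k` is hypothetical. Averaging three frames per output sample is only a crude low-pass filter. A proper resampler will produce less aliasing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert interleaved stereo 16-bit PCM at 48 kHz to mono 16 kHz.
   Each output sample is the average of 3 stereo frames (6 input values):
   both channels are mixed down and the rate is divided by 3.
   This box-filter decimation is crude; a real resampler filters better.
   Returns the number of mono samples written to `out`. */
size_t stereo48k_to_mono16k(const int16_t *in, size_t frames, int16_t *out)
{
    size_t n_out = frames / 3; /* trailing frames (frames % 3) are dropped */

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n_out; i++) {
        int32_t sum = 0; /* 6 int16 values always fit in an int32 */
        for (size_t j = 0; j < 3; j++) {
            size_t f = 3 * i + j;
            sum += in[2 * f];     /* left channel */
            sum += in[2 * f + 1]; /* right channel */
        }
        out[i] = (int16_t)(sum / 6);
    }
    return n_out;
}
```

The mono buffer could then be passed to `ps_process_raw()` with a decoder configured for 16 kHz. The reply below suggests a simpler route: leave the audio at 48 kHz and configure the decoder for that rate.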

Next, I need advice on recognition settings. I use PocketSphinx for robot control, so I only need 10-20 words, and I want to be able to change them quickly. For now I have created my own language model, but maybe a grammar would be better. How can I decide?

Finally, I'm French and the accuracy is not great. I see many ways to improve the acoustic model and the language model/dictionary, but which one do you most recommend for me?

EDIT: How can I remove the info log messages? While searching, I found how to do that in Java with a config file, but how is it done with the C/C++ API?

Thank you,
Adrien

Last edit: Montagu Adrien 2015-07-15

- Nickolay V. Shmyrev - 2015-07-15
  
  > And I can't change it to mono, 16-bit, 16 kHz. So what do you use for real-time recognition?
  
  That happens on Windows. You can configure the decoder to recognize 48 kHz audio with the options "-samprate 48000 -nfft 2048".
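The value 2048 follows from the length of the analysis window. This sketch assumes the default `-wlen` of 0.025625 s. At that length, 48000 × 0.025625 = 1230 samples, which matches the frame size in the error message quoted later in this thread. As far as I know, `-nfft` must be a power of two no smaller than the frame size. The helpers below are hypothetical and not part of the PocketSphinx API. They just do the arithmetic:

```c
#include <assert.h>

/* Assumed default analysis window length in seconds (-wlen);
   change this if you pass a different -wlen to the decoder. */
#define DEFAULT_WLEN 0.025625

/* Samples in one analysis frame, rounded to the nearest sample. */
int frame_size_samples(double samprate, double wlen)
{
    assert(samprate > 0.0 && wlen > 0.0);
    return (int)(samprate * wlen + 0.5);
}

/* Smallest power of two >= the frame size: the smallest -nfft that
   avoids "Number of points must be greater or equal to frame size". */
int min_nfft(double samprate, double wlen)
{
    int frame = frame_size_samples(samprate, wlen);
    int nfft = 1;

    while (nfft < frame)
        nfft <<= 1; /* double until the FFT covers a whole frame */
    return nfft;
}
```

With these assumptions, 16 kHz gives a 410-sample frame and `-nfft 512`, while both 44.1 kHz (1130 samples) and 48 kHz (1230 samples) need `-nfft 2048`.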
  
  > For now I have created my own language model, but maybe a grammar would be better. How can I decide?
  
  I extended the section about that in the tutorial; take a look:
  
  http://cmusphinx.sourceforge.net/wiki/tutoriallm#building_language_model
  
  Overall, you can use either a grammar or a language model; it does not really matter. With a small vocabulary, both should work.
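With a fixed set of 10-20 robot commands, a grammar is easy to regenerate whenever the command list changes. Below is a minimal sketch that writes a JSGF grammar accepting exactly one command per utterance. The function name, the grammar name `robot`, the rule `<command>`, and the sample words are all hypothetical. Every word must also appear in your dictionary. You can pass the resulting file to the decoder with `-jsgf` instead of `-lm`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a JSGF grammar that accepts exactly one word from `words`
   per utterance. Returns the number of characters written (without
   the terminating NUL), or -1 if `buf` is too small. */
int build_command_grammar(const char *const *words, size_t n,
                          char *buf, size_t size)
{
    size_t used;
    int w;

    assert(words != NULL && n > 0 && buf != NULL && size > 0);
    w = snprintf(buf, size, "#JSGF V1.0;\ngrammar robot;\npublic <command> = ");
    if (w < 0 || (size_t)w >= size)
        return -1;
    used = (size_t)w;

    for (size_t i = 0; i < n; i++) {
        assert(words[i] != NULL && strlen(words[i]) > 0);
        /* JSGF separates alternatives with " | ". */
        w = snprintf(buf + used, size - used, "%s%s",
                     i > 0 ? " | " : "", words[i]);
        if (w < 0 || (size_t)w >= size - used)
            return -1;
        used += (size_t)w;
    }

    w = snprintf(buf + used, size - used, ";\n");
    if (w < 0 || (size_t)w >= size - used)
        return -1;
    return (int)(used + (size_t)w);
}
```

For the words "avance", "recule" and "stop", the rule line becomes `public <command> = avance | recule | stop;`. To change the commands, rewrite the list and regenerate the file. No language model needs to be retrained.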
  
  > Finally, I'm French and the accuracy is not great. I see many ways to improve the acoustic model and the language model/dictionary, but which one do you most recommend for me?
  
  The French model should be OK for you.
  
  > EDIT: How can I remove the info log messages? While searching, I found how to do that in Java with a config file, but how is it done with the C/C++ API?
  
  From C you can call err_set_logfp(NULL); you can also add the config value "-logfn /dev/null".
  

Montagu Adrien - 2015-07-16

Thanks for everything.

I just tried this:
config = cmd_ln_init(NULL, ps_args(), TRUE,
"-samprate", "48000", "-nfft", "2048",
"-hmm", MODELDIR "/roboticModel/fr-fr",
"-lm", MODELDIR "/roboticModel/roboticOrder.lm",
"-dict", MODELDIR "/roboticModel/roboticOrder.dic",
NULL);

But it does not work. I get this error: FFT: Number of points must be greater or equal to frame size (1230 samples).

Maybe I need to put the rate here:
ad = ad_open_dev(cmd_ln_str_r(config, "-adcdev"), 48000);

but I can't set nfft this way.

Last edit: Montagu Adrien 2015-07-16

- Nickolay V. Shmyrev - 2015-07-16
  
  The model's feat.params accidentally sets -nfft 512, so if you remove that line from feat.params, it should work fine even at 48 kHz.
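Deleting the line in a text editor is enough. If you regenerate or copy models often, here is a sketch of the same edit in C. It assumes feat.params holds one "-option value" pair per line. The function name `drop_option_line` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy feat.params text from `in` to `out`, dropping every line that
   starts with `option` followed by a space or tab (e.g. "-nfft 512").
   `out` must hold at least strlen(in) + 1 bytes.
   Returns the length of the resulting text. */
size_t drop_option_line(const char *in, const char *option, char *out)
{
    size_t opt_len;
    size_t n = 0;

    assert(in != NULL && option != NULL && out != NULL);
    opt_len = strlen(option);
    while (*in) {
        const char *eol = strchr(in, '\n');
        /* Line length including its '\n', if there is one. */
        size_t line_len = eol ? (size_t)(eol - in) + 1 : strlen(in);
        /* Require whitespace after the name so "-nfft" won't match "-nfftx". */
        int match = strncmp(in, option, opt_len) == 0 &&
                    (in[opt_len] == ' ' || in[opt_len] == '\t');

        if (!match) {
            memcpy(out + n, in, line_len);
            n += line_len;
        }
        in += line_len;
    }
    out[n] = '\0';
    return n;
}
```

Pass "-nfft" as `option` to get the feat.params text without the hard-coded FFT size. The `-nfft` value from your own configuration is then the one that matters.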
  
  - Montagu Adrien - 2015-07-16
    
    Okay, thanks, it works. It's just that my sound is 44100 Hz, not 48000.
    
    Does that matter? And now that I can use either 16 kHz or 48 kHz, which is best, and why?
    
    Last edit: Montagu Adrien 2015-07-16
    
    - Nickolay V. Shmyrev - 2015-07-27
      
      It does not matter which sample rate you use, as long as it is at least 16 kHz.
      

Montagu Adrien - 2015-07-16

Okay, I have run some tests, and here are the results in case they help others: even if your recorder properties say 2 channels, 16-bit, 48000 Hz, the code may not capture audio the same way!

With this command:
C:\pocketSphinx\pocketsphinx>bin\Debug\pocketsphinx_continuous.exe -inmic yes -hmm model\en-us\en-us -lm model\en-us\en-us.lm.dmp -dict model\en-us\cmudict-en-us.dict -rawlogdir C:\Users\Adrien\Downloads\tmp

the sound is okay at 16 kHz.

With this command:
C:\pocketSphinx\pocketsphinx>bin\Debug\pocketsphinx_continuous.exe -inmic yes -hmm model\en-us\en-us -lm model\en-us\en-us.lm.dmp -dict model\en-us\cmudict-en-us.dict -rawlogdir C:\Users\Adrien\Downloads\tmp -samprate 48000 -nfft 2048

the sound is:
https://mega.co.nz/#!yUViEYZb!I5yerJYqJOfm8EbAIusYeV4945XTSw2LOy2jpB9euA4

Now, in the code, with this call:
if ((ad = ad_open_dev(cmd_ln_str_r(config, "-adcdev"), (int)cmd_ln_float32_r(config, "-samprate"))) == NULL)

I get good sound, but at 8 kHz.

And with this one:
if ((ad = ad_open_dev(cmd_ln_str_r(config, "-adcdev"), 16000)) == NULL)

the sound is okay at 16 kHz, and PocketSphinx begins to recognize some words.

But now I use this code:

void Voice::recognizeFromMicrophoneWhileTime(int timeToWait)
{
    ad_rec_t *ad;
    int16 adbuf[2048];
    uint8 utt_started, in_speech;
    int32 k;
    char const *hyp;
    time_t timer;

    if ((ad = ad_open_dev(cmd_ln_str_r(config, "-adcdev"), 16000)) == NULL)
        E_FATAL("Failed to open audio device\n");
    if (ad_start_rec(ad) < 0)
        E_FATAL("Failed to start recording\n");
    if (ps_start_utt(ps) < 0)
        E_FATAL("Failed to start utterance\n");
    utt_started = FALSE;
    std::cout << "LISTENING...." << std::endl;

    time(&timer);
    while (time(NULL) - timer < timeToWait) {
        //std::cout << time(NULL) - timer << std::endl;
        if ((k = ad_read(ad, adbuf, 2048)) < 0)
            E_FATAL("Failed to read audio\n");
        ps_process_raw(ps, adbuf, k, FALSE, FALSE);
        in_speech = ps_get_in_speech(ps);
    }
    std::cout << "FINISH...." << std::endl;

    ps_end_utt(ps);
    hyp = ps_get_hyp(ps, NULL);
    if (hyp != NULL) {
        std::cout << hyp << std::endl;
    }
    sleepMsec(100);
    ad_close(ad);
}

I give it 20 seconds, but the recording stops after 5. After thinking about it a little, I saw that the problem came from the buffer, so I just gave it a larger value and now it works perfectly:

int16 adbuf[65536];
and:
if ((k = ad_read(ad, adbuf, 65536)) < 0)
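For scale, the third argument of `ad_read()` caps how many samples one call can return. At 16 kHz mono, 2048 samples are 128 ms of audio and 65536 samples are about 4.1 s. This arithmetic alone does not explain the 5-second cutoff. It only shows how much audio each buffer can hold. A trivial helper (hypothetical name) for the conversion:

```c
#include <assert.h>

/* Milliseconds of mono audio held by `samples` samples at `samprate` Hz. */
double buffer_ms(long samples, long samprate)
{
    assert(samples >= 0 && samprate > 0);
    return 1000.0 * (double)samples / (double)samprate;
}
```

For example, `buffer_ms(2048, 16000)` is 128.0 and `buffer_ms(65536, 16000)` is 4096.0.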

Now it's just an accuracy problem, which I need to improve with a better recorder and a better language model.

Last edit: Montagu Adrien 2015-07-16
- Nickolay V. Shmyrev - 2015-07-16
  
  Ok, congratulations! Let us know how it goes.
  