Hello all. I'm trying to create a basic voice recognizer based on pocketsphinx. I use the English model and the turtle dictionary. The example file "goforward.raw" is recognized just fine, so I assume the configuration is OK. However, my voice recorded from a microphone is not recognized. The "goforward.raw" sound played by my mobile phone and then captured by the microphone is not recognized either.
I have verified that the capture settings are OK: 16 kHz, 16-bit, mono, raw. To verify this, I converted "goforward.raw" and "recording.raw" to WAV using the same SoX settings.
The files definitely sound different, but I don't understand what exactly the problem is or how to fix it.
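For readers who want to do the raw-to-WAV conversion without SoX, here is a minimal Rust sketch. It is not what was used in this thread; it assumes the raw data is signed 16-bit little-endian PCM, and the function name `wrap_pcm_as_wav` is made up for illustration.

```rust
/// Wrap headerless 16-bit PCM bytes in a minimal 44-byte WAV (RIFF) header.
/// Assumes the input is signed 16-bit little-endian PCM.
fn wrap_pcm_as_wav(pcm: &[u8], sample_rate: u32, channels: u16) -> Vec<u8> {
    let bits_per_sample: u16 = 16;
    let block_align: u16 = channels * bits_per_sample / 8;
    let byte_rate: u32 = sample_rate * u32::from(block_align);
    let data_len = pcm.len() as u32;

    let mut wav = Vec::with_capacity(44 + pcm.len());
    wav.extend_from_slice(b"RIFF");
    wav.extend_from_slice(&(36 + data_len).to_le_bytes()); // size of the rest of the file
    wav.extend_from_slice(b"WAVE");
    wav.extend_from_slice(b"fmt ");
    wav.extend_from_slice(&16u32.to_le_bytes()); // size of the fmt chunk
    wav.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = uncompressed PCM
    wav.extend_from_slice(&channels.to_le_bytes());
    wav.extend_from_slice(&sample_rate.to_le_bytes());
    wav.extend_from_slice(&byte_rate.to_le_bytes());
    wav.extend_from_slice(&block_align.to_le_bytes());
    wav.extend_from_slice(&bits_per_sample.to_le_bytes());
    wav.extend_from_slice(b"data");
    wav.extend_from_slice(&data_len.to_le_bytes());
    wav.extend_from_slice(pcm);
    wav
}
```

Calling it with `16_000` and `1` for both files mirrors the "same settings" check above: if a recording sounds wrong under the header that works for "goforward.raw", the capture format is probably not what was assumed.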
You need to provide the code (exact configuration and models) you are using.
If you are on Android, you need to uncomment setRawLogDir in the decoder setup. It will store raw files on the SD card at the location reported in the logs; you need to share them.
On desktop, you need to add the -rawlogdir <dir> option to the pocketsphinx config to store raw files.
Last edit: Nickolay V. Shmyrev 2016-01-25
The code is written in Rust, because I'm writing Rust bindings for PocketSphinx right now. Here is the relevant snippet:
    const SAMPLE_RATE: f64 = 16_000.0;
    const CHANNELS: i32 = 1;

    let ps_config = pocketsphinx::CmdLn::init(true, &[
        "pocketsphinx",
        "-hmm", "data/cmusphinx-en-us-5.2",
        "-lm", "data/cmusphinx-5.0-en-us.lm",
        "-dict", "data/turtle.dic",
        "-samprate", &format!("{}", SAMPLE_RATE),
        "-rawlogdir", "log",
    ]).unwrap();
    let ps_decoder = pocketsphinx::PsDecoder::init(ps_config);

    ps_decoder.start_utt(None).expect("can't start recognition");
    // read raw file content into 'd'
    ps_decoder.process_raw(d, false, false).expect("can't process sound");
    ps_decoder.end_utt().unwrap();

    match ps_decoder.get_hyp() {
        None => {
            println!("No idea what you have just said :-(");
        },
        Some((hyp, _utt_id, _score)) => {
            println!("You have said: {}", hyp);
        },
    }
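The snippet leaves out the step that reads the raw file into `d`. A minimal sketch of that step follows, assuming the file holds signed 16-bit little-endian samples and that the bindings accept a slice of `i16` (the thread does not show the real signature of `process_raw`, so that part is a guess). The helper names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Decode raw bytes as signed 16-bit little-endian samples.
/// A trailing odd byte, if any, is ignored.
fn bytes_to_samples(bytes: &[u8]) -> Vec<i16> {
    bytes
        .chunks_exact(2)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}

/// Read a headerless PCM file (for example "goforward.raw") into samples.
fn read_raw_samples<P: AsRef<Path>>(path: P) -> io::Result<Vec<i16>> {
    let bytes = fs::read(path)?;
    Ok(bytes_to_samples(&bytes))
}
```

If a recording was captured big-endian or with a different sample width, decoding it this way produces noise, which is one cheap thing to rule out before looking at the acoustic model.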
The audio recorded by your microphone is very limited in frequency bands. What kind of microphone exactly do you use?
You will not be able to recognize it with the standard acoustic model. You will have to use the 8 kHz acoustic model available in the downloads, though it will still require adaptation for your audio channel.
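One rough way to see whether a recording is band-limited is to measure how much of its spectral energy lies above some cutoff. The sketch below does this for a single frame with a naive DFT. The function name and the 4 kHz cutoff in the usage note are assumptions for illustration (4 kHz is the Nyquist limit of 8 kHz audio); this is not a pocketsphinx diagnostic.

```rust
use std::f64::consts::PI;

/// Fraction of spectral energy at or above `cutoff_hz` in one frame.
/// Uses a naive O(n^2) DFT, so keep frames short (a few hundred samples).
fn high_band_energy_ratio(frame: &[i16], sample_rate: f64, cutoff_hz: f64) -> f64 {
    let n = frame.len();
    let mut total = 0.0;
    let mut high = 0.0;
    // Bins 1..n/2 cover the frequencies above DC and below Nyquist.
    for k in 1..n / 2 {
        let mut re = 0.0;
        let mut im = 0.0;
        for (t, &s) in frame.iter().enumerate() {
            let angle = 2.0 * PI * (k as f64) * (t as f64) / (n as f64);
            re += f64::from(s) * angle.cos();
            im -= f64::from(s) * angle.sin();
        }
        let power = re * re + im * im;
        total += power;
        // Centre frequency of bin k is k * sample_rate / n.
        if (k as f64) * sample_rate / (n as f64) >= cutoff_hz {
            high += power;
        }
    }
    if total == 0.0 { 0.0 } else { high / total }
}
```

Running it with `sample_rate = 16_000.0` and `cutoff_hz = 4_000.0` on voiced frames from both files gives a crude comparison: a ratio near zero for the recording but not for the example file would be consistent with the band limitation described above.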
The converted WAV files are here:
https://dl.dropboxusercontent.com/u/495486/cmusphinx/goforward.wav
https://dl.dropboxusercontent.com/u/495486/cmusphinx/recorded.wav
OK, here are my files: https://dl.dropboxusercontent.com/u/495486/cmusphinx/conf.zip
Here is the log from stdout: https://dl.dropboxusercontent.com/u/495486/cmusphinx/log.txt
Here is the raw file from the log directory: https://dl.dropboxusercontent.com/u/495486/cmusphinx/000000000.raw
What kind of adaptation?
It's the laptop's built-in microphone. I will try to find an external one.
OK, let me know how it goes.
It is also worth noting that it is better to restrict the language model, not the dictionary.
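One way to restrict the language model (not necessarily the one meant here) is to replace the generic n-gram model passed with -lm by a small JSGF grammar passed with the pocketsphinx -jsgf option. The sketch below only builds the grammar text and the argument list; the helper names, the grammar name, and any phrases you pass in are illustrative assumptions, and every word used in the grammar would have to exist in the dictionary.

```rust
/// Render a tiny JSGF grammar that accepts only the listed phrases.
fn jsgf_grammar(name: &str, phrases: &[&str]) -> String {
    format!(
        "#JSGF V1.0;\ngrammar {};\npublic <command> = {};\n",
        name,
        phrases.join(" | ")
    )
}

/// Build a pocketsphinx argument list that uses a grammar file
/// instead of an n-gram language model (note: no "-lm" entry).
fn grammar_args(hmm: &str, dict: &str, jsgf_path: &str) -> Vec<String> {
    ["pocketsphinx", "-hmm", hmm, "-dict", dict, "-jsgf", jsgf_path]
        .iter()
        .map(|s| s.to_string())
        .collect()
}
```

The grammar text would be written to a file, and that file's path passed as `jsgf_path`. The full dictionary can then stay in place while the decoder only searches the phrases the grammar allows.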