Poor recognition over microphone

Forum: Help
Creator: kriomant
Created: 2016-01-25
Updated: 2016-01-25
  • kriomant

    kriomant - 2016-01-25

    Hello all. I'm trying to build a basic voice recognizer on top of PocketSphinx, using the English model and the turtle dictionary. The example file "goforward.raw" is recognized just fine, so I assume the configuration is OK. However, my own voice recorded from the microphone is not recognized, and neither is "goforward.raw" when it is played back on my mobile phone and captured by the microphone.
    I have verified that the capture settings are correct: 16 kHz, 16-bit, mono, raw. To check this, I converted "goforward.raw" and "recording.raw" to WAV using the same SoX settings.
    The files definitely sound different, but I don't understand what exactly the problem is or how to fix it.

    Files are here:
    https://dl.dropboxusercontent.com/u/495486/cmusphinx/goforward.wav
    https://dl.dropboxusercontent.com/u/495486/cmusphinx/recorded.wav
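
Beyond listening to the two files, a quick numeric comparison can help show how they differ. The following is a minimal sketch, assuming both files are headerless 16-bit little-endian mono PCM as described above. The names `PcmStats`, `decode_s16le` and `pcm_stats` are hypothetical helpers written for this illustration; they are not part of PocketSphinx or its Rust bindings.

```rust
/// Summary of a headerless, 16-bit little-endian, mono PCM buffer.
#[derive(Debug)]
struct PcmStats {
    samples: usize,
    duration_secs: f64,
    peak: i32,      // largest absolute sample value
    rms: f64,       // root-mean-square level
    clipped: usize, // samples sitting at the i16 limits
}

/// Decode raw bytes into samples; a trailing odd byte is ignored.
fn decode_s16le(bytes: &[u8]) -> Vec<i16> {
    bytes
        .chunks_exact(2)
        .map(|pair| i16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}

/// Compute basic level statistics for a buffer of samples.
fn pcm_stats(samples: &[i16], sample_rate: f64) -> PcmStats {
    let peak = samples.iter().map(|&s| i32::from(s).abs()).max().unwrap_or(0);
    let sum_sq: f64 = samples.iter().map(|&s| f64::from(s) * f64::from(s)).sum();
    let rms = if samples.is_empty() {
        0.0
    } else {
        (sum_sq / samples.len() as f64).sqrt()
    };
    let clipped = samples
        .iter()
        .filter(|&&s| s == i16::MAX || s == i16::MIN)
        .count();
    PcmStats {
        samples: samples.len(),
        duration_secs: samples.len() as f64 / sample_rate,
        peak,
        rms,
        clipped,
    }
}
```

Reading each file with `std::fs::read`, passing the bytes through `decode_s16le`, and calling `pcm_stats` with a sample rate of 16000.0 gives figures that can be compared side by side. A very low RMS or a large clipped count in the microphone recording would point at a level problem rather than a format problem.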

     
    • Nickolay V. Shmyrev

      You need to provide the code you are using, including the exact configuration and models.

      If you are on Android, uncomment setRawLogDir in the decoder setup. It will store the raw files on the SD card at the location shown in the logs, and you need to share those files.

      On desktop, add the -rawlogdir <dir> option to the pocketsphinx configuration to store the raw files.

       

      Last edit: Nickolay V. Shmyrev 2016-01-25
  • kriomant

    kriomant - 2016-01-25

    OK, here are my files: https://dl.dropboxusercontent.com/u/495486/cmusphinx/conf.zip
    Here is the log from stdout: https://dl.dropboxusercontent.com/u/495486/cmusphinx/log.txt
    Here is the raw file from the log directory: https://dl.dropboxusercontent.com/u/495486/cmusphinx/000000000.raw

    The code is written in Rust, because I'm writing Rust bindings for PocketSphinx right now. Here is the relevant snippet:

    const SAMPLE_RATE: f64 = 16_000.0;
    const CHANNELS: i32 = 1;
    let ps_config = pocketsphinx::CmdLn::init(true, &["pocketsphinx",
        "-hmm", "data/cmusphinx-en-us-5.2",
        "-lm", "data/cmusphinx-5.0-en-us.lm",
        "-dict", "data/turtle.dic",
        "-samprate", &format!("{}", SAMPLE_RATE),
        "-rawlogdir", "log",
        ]).unwrap();
    let ps_decoder = pocketsphinx::PsDecoder::init(ps_config);
    ps_decoder.start_utt(None).expect("can't start recognition");
    // read raw file content into 'd'
    ps_decoder.process_raw(d, false, false).expect("can't process sound");
    ps_decoder.end_utt().unwrap();
    match ps_decoder.get_hyp() {
        None => {
            println!("No idea what you have just said :-(");
        },
        Some((hyp, _utt_id, _score)) => {
            println!("You have said: {}", hyp);
        },
    }
    
     
    • Nickolay V. Shmyrev

      The audio recorded by the microphone has a very limited frequency range. What kind of microphone exactly are you using?

      You will not be able to recognize it with the standard acoustic model. You will have to use the 8 kHz acoustic model available in the downloads, though it will still require adaptation to your audio channel.
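
The limited frequency range mentioned above can be checked numerically. The following is a rough sketch, not a PocketSphinx facility: it estimates what fraction of a recording's spectral energy lies at or above a cutoff frequency, using a naive DFT over Hann-windowed frames. The function names `high_band_fraction` and `sine`, the 4 kHz cutoff (the upper limit of audio sampled at 8 kHz) and the 256-sample frame length used in the note below are all assumptions made for illustration.

```rust
use std::f64::consts::PI;

/// Fraction of spectral energy at or above `cutoff_hz`, summed over
/// non-overlapping Hann-windowed frames of `frame_len` samples.
/// Returns 0.0 for silence or when no full frame is available.
fn high_band_fraction(samples: &[i16], sample_rate: f64, cutoff_hz: f64, frame_len: usize) -> f64 {
    if frame_len < 2 {
        return 0.0;
    }
    let bins = frame_len / 2; // bins 0..=bins span 0 Hz up to Nyquist
    let mut low = 0.0_f64;
    let mut high = 0.0_f64;
    for frame in samples.chunks_exact(frame_len) {
        // Apply a periodic Hann window to reduce spectral leakage.
        let windowed: Vec<f64> = frame
            .iter()
            .enumerate()
            .map(|(n, &s)| {
                let w = 0.5 - 0.5 * (2.0 * PI * n as f64 / frame_len as f64).cos();
                w * f64::from(s)
            })
            .collect();
        // Naive DFT: O(frame_len^2) per frame, fine for short recordings.
        for k in 0..=bins {
            let mut re = 0.0_f64;
            let mut im = 0.0_f64;
            for (n, &x) in windowed.iter().enumerate() {
                let phase = 2.0 * PI * (k * n) as f64 / frame_len as f64;
                re += x * phase.cos();
                im -= x * phase.sin();
            }
            // Interior bins stand for themselves and their mirror image.
            let unpaired = k == 0 || (frame_len % 2 == 0 && k == bins);
            let weight = if unpaired { 1.0 } else { 2.0 };
            let power = weight * (re * re + im * im);
            let freq = k as f64 * sample_rate / frame_len as f64;
            if freq >= cutoff_hz {
                high += power;
            } else {
                low += power;
            }
        }
    }
    let total = low + high;
    if total == 0.0 { 0.0 } else { high / total }
}

/// Synthetic test tone, used only to sanity-check the function.
fn sine(freq_hz: f64, sample_rate: f64, n: usize) -> Vec<i16> {
    (0..n)
        .map(|i| (10_000.0 * (2.0 * PI * freq_hz * i as f64 / sample_rate).sin()) as i16)
        .collect()
}
```

Calling `high_band_fraction` with a sample rate of 16000.0, a cutoff of 4000.0 and a frame length of 256 on both recordings gives two numbers to compare. If the microphone recording has almost no energy above 4 kHz while goforward.raw has a noticeably larger share, that would be consistent with the band-limited channel described above.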

       
  • kriomant

    kriomant - 2016-01-25

    What kind of adaptation?

     
  • kriomant

    kriomant - 2016-01-25

    It's the laptop's built-in microphone. I will try to find an external one.

     
    • Nickolay V. Shmyrev

      OK, let me know how it goes.

      It is also worth noting that it is better to restrict the language model rather than the dictionary.

       
