
Pocketsphinx cannot recognize some words from the dictionary

Help
2015-10-02
2015-10-02
  • Kenan Berbic

    Kenan Berbic - 2015-10-02

    Hi

    I need help. For the last four weeks I have been testing pocketsphinx, which we want to use in our project. First I tried pocketsphinx with a JSGF grammar. With the grammar I got great results for words that are in it, but when I said something that isn't in our grammar file, I got back some random word from the grammar. Then I tried adding some garbage entries to the grammar file, but that didn't help. I also searched the internet and found that pocketsphinx doesn't yet implement support for out-of-vocabulary words, so I gave up on the grammar.

    After that I tried a KWS dict that contains a kws threshold for every word in the dict. This method didn't give me good results either, so I gave up on it too.

    Finally I tried a language model that I generated with lmtool. This method gives me good results for some words. I also implemented a confidence score for it, but it sometimes returns unrealistic confidence values.

    After that I started on acoustic model adaptation. I recorded 30 wav files with one word each and, following the instructions from the page, successfully adapted the acoustic model. After adaptation, however, the results got worse; I get better results from the non-adapted model.

    This is the code that I use for speech recognition:

    Initialization

    /* cmd_ln_init returns a pointer to the configuration object. */
    cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
        "-hmm", hmmValue,
        "-dict", dictValue,
        "-bestpath", "yes",
        "-fwdflat", "yes",
        "-mllr", "/home/kberbic/Desktop/PocketTest/files/mllr_matrix",
        "-samprate", samprateValue,
        "-nfft", nfftValue,
        "-fdict", fdictValue,
        NULL);

    HMM
    http://www.repository.voxforge1.org/downloads/Main/Trunk/AcousticModels/Sphinx/voxforge-en-r0_1_3.tar.gz

    Set LM file:

    ps_set_lm_file(instance->ps, name, file);

    Logic for speech recognition:

    /* Decode the whole raw audio file; a negative result signals an error. */
    int32 result = ps_decode_raw(instance->ps, file, -1);
    int32 n = 0;
    const char *hyp;
    int frame_rate = 100;  /* frames per second, the pocketsphinx default */

    /* Walk the word segmentation of the best hypothesis. */
    ps_seg_t *iter = ps_seg_iter(instance->ps, NULL);
    while (iter != NULL) {
        int32 sf, ef, pprob;
        float conf;

        hyp = ps_seg_word(iter);
        ps_seg_frames(iter, &sf, &ef);
        /* Log posterior probability of this word segment. */
        pprob = ps_seg_prob(iter, NULL, NULL, NULL);
        /* Convert from the log domain to a linear value. */
        conf = logmath_exp(ps_get_logmath(instance->ps), pprob);

        printf("%s %.3f %.3f %f\n", hyp, ((float) sf / frame_rate), ((float) ef / frame_rate), conf);

        // I have this code because I wrote a plugin for NodeJS; it returns all values to NodeJS
        Local<Object> response = Object::New(isolate);
        response->Set(String::NewFromUtf8(isolate, "Value"), String::NewFromUtf8(isolate, hyp ? hyp : ""));
        response->Set(String::NewFromUtf8(isolate, "Score"), Number::New(isolate, conf));
        values->Set(n, response);
        n++;

        iter = ps_seg_next(iter);
    }

    These are the words I recorded for model adaptation:

    POCKET DEPTH (arctic_0001)
    GO BACK (arctic_0002)
    FURCATION INVOLVEMENT (arctic_0003)
    MOCOSA GINGIVAL BORDER (arctic_0004)
    SAVE (arctic_0005)
    CLOSE (arctic_0006)
    FINISH (arctic_0007)
    DISMISS (arctic_0008)
    ONE (arctic_0009)
    TWO (arctic_0010)
    THREE (arctic_0011)
    FOUR (arctic_0012)
    FIVE (arctic_0013)
    SIX (arctic_0014)
    SEVEN (arctic_0015)
    EIGHT (arctic_0016)
    NINE (arctic_0017)
    TEN (arctic_0018)
    BLEEDING (arctic_0019)
    REGISTRATION (arctic_0020)
    NEW REGISTRATION (arctic_0021)
    NEXT REGISTRATION (arctic_0022)
    PLUS (arctic_0023)
    PUS (arctic_0024)
    MINUS (arctic_0025)
    SINGLE TOOTH (arctic_0026)
    ALL TEETH (arctic_0027)
    TOOTH STABILITY (arctic_0028)
    JOURNAL (arctic_0029)
    MARGO (arctic_0030)
    HELP (arctic_0031)
    BORDER (arctic_0032)

    It would be great if someone had an idea how to help me, because I have hit a wall and don't know what to do next.

     
    • Nickolay V. Shmyrev

      Use keyword spotting to listen for an activation keyphrase such as "new registration" or whatever. Once the keyphrase is detected, switch to grammar mode to recognize the command, then switch back to keyword spotting mode.

      Use the models provided with pocketsphinx; they are far more accurate than the voxforge model.
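
      A minimal sketch of the switching logic described above, written as plain C so it stays self-contained. The names `listen_mode`, `next_mode` and `search_name` are hypothetical and not part of pocketsphinx. It assumes that, in the pocketsphinx C API of this period, the two searches are registered once with `ps_set_keyphrase` and `ps_set_jsgf_file` and activated with `ps_set_search`; those calls appear only in comments here.

```c
#include <stddef.h>

/* Hypothetical helper types: the two listening modes of the application. */
typedef enum { MODE_KEYPHRASE, MODE_GRAMMAR } listen_mode;

/* Decide which search should be active for the next utterance.
 * `hyp` is the hypothesis string of the utterance that just ended,
 * or NULL (or empty) when the decoder produced nothing. */
listen_mode next_mode(listen_mode current, const char *hyp)
{
    if (current == MODE_KEYPHRASE) {
        /* Stay in keyword spotting until the keyphrase is detected. */
        return (hyp != NULL && hyp[0] != '\0') ? MODE_GRAMMAR : MODE_KEYPHRASE;
    }
    /* One command per activation: after a grammar utterance,
     * always fall back to keyword spotting. */
    return MODE_KEYPHRASE;
}

/* Map a mode to the search name it was registered under.
 * Assumed setup, done once after creating the decoder:
 *   ps_set_keyphrase(ps, "keyphrase", "new registration");
 *   ps_set_jsgf_file(ps, "commands", path_to_grammar);
 * Assumed switch after each utterance:
 *   ps_set_search(ps, search_name(mode));
 */
const char *search_name(listen_mode mode)
{
    return mode == MODE_KEYPHRASE ? "keyphrase" : "commands";
}
```

      After each utterance ends, the application would pass the hypothesis to `next_mode` and activate the search returned by `search_name` before starting the next utterance.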

       
  • Kenan Berbic

    Kenan Berbic - 2015-10-02

    Hi Nickolay, thanks for the answer. I tried your solution and got better accuracy with it, but for some reason kws mode often gives me the wrong word. With the grammar I get the correct word, but kws often returns the wrong one.

     
    • Nickolay V. Shmyrev

      got correct but kws many times return wrong word.

      Keyword spotting only determines whether the keyword is present in the audio, so it can never return a "wrong" word. It can produce false alarms, signalling a detection when the keyword was not actually spoken. The keyword threshold is responsible for that, and you can tune the threshold to minimize false alarms.
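
      As an illustration of per-keyphrase thresholds, a keyphrase list file (passed with `-kws`) holds one phrase per line with its threshold between slashes. The phrases below come from the adaptation list earlier in the thread; the threshold values are placeholders for illustration, not tuned values.

```text
new registration /1e-20/
next registration /1e-20/
go back /1e-10/
```

      Each threshold trades false alarms against missed detections, so it has to be tuned separately for every phrase on recorded test audio.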

       
