
[pocketsphinx] switch between kws and jsgf mode on the fly

2017-09-07
  • Lucian Georgescu

    Hello,

    I'm working on a project where I use pocketsphinx to recognize commands in real time. The commands have the following format (some examples):

    Casandra, turn the lights off.
    Casandra, turn the lights on.
    Casandra, change the color of the light.

    Using a grammar model with a garbage loop produces delays (the project runs on a Raspberry Pi).
    From reading forum topics, I understand that I should use the keyword spotting module to detect when the keyword "Casandra" is spoken and then switch to grammar mode to recognize the entire command.

    So I tried the following:
    When I start pocketsphinx_continuous, I pass it the arguments hmm, dict, samprate and inmic.
    In the source code, in the main function, after the line "ps = ps_init(config);" I added the following lines:
    ps_set_kws (ps, "kws", "keyword.txt");
    ps_set_jsgf_file (ps, "jsgf", "grammar.gram");

    Then, at the start of the recognize_from_microphone() function, I set kws as the default search: "ps_set_search (ps, "kws");"

    And here is how I changed the infinite for() loop:

    for (;;) {
            if ((k = ad_read(ad, adbuf, 2048)) < 0)
                E_FATAL("Failed to read audio\n");
            ps_process_raw(ps, adbuf, k, FALSE, FALSE);
            // check to see if currently collected audio is speech or not
            in_speech = ps_get_in_speech(ps);
            if (in_speech && !utt_started) {
                utt_started = TRUE;
                E_INFO("Listening...\n");
            }
            if (!in_speech && utt_started) {
                /* speech -> silence transition, time to start new utterance  */
                ps_end_utt(ps);
                hyp = ps_get_hyp(ps, NULL);
                if (hyp != NULL) {
                    printf("KWS hypothesis: %s\n", hyp);
                    // if the detected keyword is casandra then we switch to the grammar search mode and we decode the buffer again
                    if (strstr(hyp, "casandra") != NULL) {
                        if (ps_set_search(ps, "jsgf") != 0) {
                            printf("ERROR: Cannot switch to jsgf mode \n");
                        } else {
                            printf("Switched to jsgf mode \n");
                        }

                        printf("Mode: %s\n", ps_get_search(ps));
                        hyp = ps_get_hyp(ps, NULL);

                        if (hyp != NULL) {
                            printf("ASR hypothesis: %s\n", hyp);
                        } else {
                            printf("ASR hypothesis: NULL\n");
                        }

                        if (ps_set_search(ps, "kws") != 0) {
                            printf("ERROR: Cannot switch to kws mode \n");
                        } else {
                            printf("Switched to kws mode \n");
                        }
                    }
                    fflush(stdout);
                }
    

    Is it possible to call "ps_get_hyp()" twice, once in kws mode and then in jsgf mode, with the same audio buffer as input?
    The code compiles, but it doesn't do what I want. Can you tell me if the code logic is good or not? Can you give me a hint about how exactly I should change it?

    Thank you,

    Lucian Georgescu

     
    • Nickolay V. Shmyrev

      Can you tell me if the code logic is good or not?

      The logic is wrong.

      Can you give me a hint about how exactly I should change it?

      You can use kws mode. Once the keyphrase is recognized, you can retrieve the audio buffer with ps_get_rawdata and process it again with a jsgf recognizer.
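
The second pass needs the audio of the whole utterance, not just the last 2048-sample chunk returned by ad_read. A minimal sketch of one way to have that audio at hand is shown here: the application accumulates every chunk itself. This is an alternative to ps_get_rawdata rather than the approach named in the reply, and the utt_buffer type and its functions are hypothetical helpers written for illustration, not part of pocketsphinx.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: holds every sample of the current utterance. */
typedef struct {
    int16_t *data;  /* samples collected so far */
    size_t len;     /* number of valid samples in data */
    size_t cap;     /* capacity of data, in samples */
} utt_buffer;

/* Allocate room for cap samples. Returns 0 on success, -1 on failure. */
int utt_buffer_init(utt_buffer *b, size_t cap)
{
    b->data = malloc(cap * sizeof *b->data);
    b->len = 0;
    b->cap = b->data ? cap : 0;
    return b->data ? 0 : -1;
}

/* Append one chunk (for example the k samples returned by ad_read).
 * Returns the number of samples actually stored; the chunk is truncated
 * when the buffer is full. */
size_t utt_buffer_append(utt_buffer *b, const int16_t *chunk, size_t n)
{
    size_t room = b->cap - b->len;
    if (n > room)
        n = room;
    if (n > 0)
        memcpy(b->data + b->len, chunk, n * sizeof *chunk);
    b->len += n;
    return n;
}

/* Forget the collected samples; call this when a new utterance starts. */
void utt_buffer_reset(utt_buffer *b)
{
    b->len = 0;
}

/* Release the storage. */
void utt_buffer_free(utt_buffer *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

In the loop from the question, the buffer would be appended to after each ad_read. Once the keyword is found, the search would be switched to "jsgf" and data and len passed to ps_process_raw between ps_start_utt and ps_end_utt, before switching back to "kws" and resetting the buffer.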

       
  • Lucian Georgescu

    Okay, thank you for the answer. I have changed the code in the following way. In the main() function, besides:

    ps_set_kws (ps, "kws", "keyword.txt");
    ps_set_jsgf_file (ps, "jsgf", "grammar.gram");

    I added:

    ps_set_search (ps, "kws");
    ps_set_rawdata_size (ps, 500000);

    to start in kws mode by default and to set the maximum buffer size (I picked an arbitrary value; I have no idea what it should be).

    The recognize_from_microphone() function now looks like this:

    static void
    recognize_from_microphone()
    {
        ad_rec_t *ad;
        int16 adbuf[2048];
        int16 adbuf2[2048];
        int32 dim;
        uint8 utt_started, in_speech;
        int32 k;
        char *hyp;
        time_t startTime, endTime;
        double processingTime;
    
        if ((ad = ad_open_dev(cmd_ln_str_r(config, "-adcdev"),
                              (int) cmd_ln_float32_r(config,
                                                     "-samprate"))) == NULL)
            E_FATAL("Failed to open audio device\n");
        if (ad_start_rec(ad) < 0)
            E_FATAL("Failed to start recording\n");
    
        if (ps_start_utt(ps) < 0)
            E_FATAL("Failed to start utterance\n");
        utt_started = FALSE;
        E_INFO("Ready....\n");
    
        // at the beginning we are in the KWS-search state
        for (;;) {
            if ((k = ad_read(ad, adbuf, 2048)) < 0)
                E_FATAL("Failed to read audio\n");
            ps_process_raw(ps, adbuf, k, FALSE, FALSE);
            // check to see if currently collected audio is speech or not
            in_speech = ps_get_in_speech(ps);
            if (in_speech && !utt_started) {
                utt_started = TRUE;
                E_INFO("Listening...\n");
            }
            if (!in_speech && utt_started) {
                /* speech -> silence transition, time to start new utterance  */
                ps_end_utt(ps);
                hyp = ps_get_hyp(ps, NULL);
                if (hyp != NULL) {
                    printf("KWS hypothesis: %s\n", hyp);
                    // if the detected keyword is casandra then we switch to the grammar search mode and we decode the buffer again
                    if (strstr(hyp, "casandra") != NULL) {
                        if (ps_set_search(ps, "jsgf") != 0) {
                            printf("ERROR: Cannot switch to jsgf mode \n");
                        } else {
                            printf("Switched to jsgf mode \n");
                        }

                        printf("Mode: %s\n", ps_get_search(ps));

                        ps_get_rawdata(ps, adbuf2, dim);
                        ps_start_utt(ps);
                        ps_process_raw(ps, adbuf2, dim, FALSE, FALSE);
                        ps_end_utt(ps);
                        hyp = ps_get_hyp(ps, NULL);

                        if (hyp != NULL) {
                            printf("ASR hypothesis: %s\n", hyp);
                        } else {
                            printf("ASR hypothesis: NULL\n");
                        }

                        if (ps_set_search(ps, "kws") != 0) {
                            printf("ERROR: Cannot switch to kws mode \n");
                        } else {
                            printf("Switched to kws mode \n");
                        }
                    }
                    fflush(stdout);
                }

    
                if (ps_start_utt(ps) < 0)
                    E_FATAL("Failed to start utterance\n");
                utt_started = FALSE;
                E_INFO("Ready....\n");
            }
            sleep_msec(100);
        }
        ad_close(ad);
    }
    

    Using -rawlogdir, it saves raw audio files to disk. But I don't understand whether ps_get_rawdata() knows which audio files it needs to load. Should it be specified somewhere? Does it retrieve the latest raw audio saved to disk?
    Every time I pronounce a command, it prints the KWS hypothesis correctly, but the ASR hypothesis is always NULL.

    Thank you.

     
    • Nickolay V. Shmyrev

      Using -rawlogdir, it saves raw audio files to disk. But I don't understand whether ps_get_rawdata() knows which audio files it needs to load.

      It does not load files; rawlogdir is irrelevant. Audio is stored in memory when you set rawdata_size and is retrieved from memory.

      Does it retrieve the latest raw audio saved to disk?

      No.

      Every time I pronounce a command, it prints the KWS hypothesis correctly, but the ASR hypothesis is always NULL.

      Your code is incomplete, but your audio buffer adbuf2 is probably too small; it should hold enough data for several seconds of audio.
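
To put "several seconds of audio" into numbers, the arithmetic can be sketched as below. It assumes 16-bit mono audio and, in the usage note, a 16 kHz sample rate; the thread does not state the -samprate value, so substitute your own. Both function names are hypothetical helpers, not pocketsphinx calls.

```c
#include <stddef.h>

/* Number of 16-bit mono samples needed to hold `seconds` of audio. */
size_t samples_for_seconds(unsigned sample_rate_hz, unsigned seconds)
{
    return (size_t)sample_rate_hz * seconds;
}

/* Duration, in milliseconds, covered by n_samples mono samples. */
unsigned long duration_ms(size_t n_samples, unsigned sample_rate_hz)
{
    return (unsigned long)(n_samples * 1000u / sample_rate_hz);
}
```

Under the 16 kHz assumption, a 2048-sample array such as adbuf2 covers only 128 ms, while five seconds of speech needs 80000 samples. It is also worth checking the ps_get_rawdata prototype in your pocketsphinx.h: in the version I am familiar with, it takes pointer arguments (int16 **buffer, int32 *size) and hands back the decoder's own in-memory buffer, so the call as posted may not be filling adbuf2 or dim at all.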

       
