pocketsphinx stopping on long silence

Forum: Help
Created: 2010-01-11
Updated: 2012-09-22
  • Chris Douglas

    Chris Douglas - 2010-01-11

    Hello,
    I am trying to use pocketsphinx to do ASR on long wav files. When pocketsphinx
    gets to 2-3 seconds of silence, it stops processing. I am using
    pocketsphinx_batch to do the processing. Another post said to use
    pocketsphinx_continuous but I haven't found a way to pass a recording to
    continuous. It works fine using a mic though.

    Is there a way to either process a wav file with continuous or a way to tell
    batch to not stop on silence?

    Thanks
    Chris

     
  • Nickolay V. Shmyrev

    /* -*- c-basic-offset: 4; indent-tabs-mode: nil -*- */

    /* ====================================================================
     * Copyright (c) 1999-2001 Carnegie Mellon University.  All rights
     * reserved.
     *
     * Redistribution and use in source and binary forms, with or without
     * modification, are permitted provided that the following conditions
     * are met:
     *
     * 1. Redistributions of source code must retain the above copyright
     *    notice, this list of conditions and the following disclaimer. 
     *
     * 2. Redistributions in binary form must reproduce the above copyright
     *    notice, this list of conditions and the following disclaimer in
     *    the documentation and/or other materials provided with the
     *    distribution.
     *
     * This work was supported in part by funding from the Defense Advanced 
     * Research Projects Agency and the National Science Foundation of the 
     * United States of America, and the CMU Sphinx Speech Consortium.
     *
     * THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND 
     * ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, 
     * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
     * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY
     * NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
     * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 
     * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 
     * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 
     * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 
     * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 
     * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
     *
     * ====================================================================
     *
     */
    /*
     * cont.c -- Continuous decoder for the files.
     */
    
    #include <stdio.h>
    #include <string.h>
    
    #include "pocketsphinx.h"
    #include "err.h"
    #include "ad.h"
    #include "cont_ad.h"
    
    static const arg_t cont_args_def[] = {
        POCKETSPHINX_OPTIONS,
        { "-infile",
          ARG_STRING,
          NULL, "Audio file." },
        CMDLN_EMPTY_OPTION
    };
    
    static ps_decoder_t *ps;
    static ad_rec_t bogus_ad = {0};
    static FILE *rawfd;
    static cmd_ln_t *config;
    
    static int32
    ad_file_read(ad_rec_t * ad, int16 * buf, int32 max)
    {
        size_t nread;
    
        nread = fread(buf, sizeof(int16), max, rawfd);
    
        /* Report end of file (or a read error) as -1, as the endpointer expects. */
        return (nread > 0 ? (int32)nread : -1);
    }
    
    static void
    dump_result (int32 start)
    {
        ps_seg_t *iter = ps_seg_iter(ps, NULL);
        while (iter != NULL) {
            int32 sf, ef, pprob;
            float conf;
    
            ps_seg_frames (iter, &sf, &ef);
            pprob = ps_seg_prob (iter, NULL, NULL, NULL);
            conf = logmath_exp(ps_get_logmath(ps), pprob);
            printf ("%s %f %f %f\n", ps_seg_word (iter), (sf + start) / 100.0, (ef + start) / 100.0, conf);
            iter = ps_seg_next (iter);
        }
    }
    
    static void
    utterance_loop()
    {
        int16 adbuf[4096];
        int32 k, ts, start;
        cont_ad_t *cont;
    
        bogus_ad.sps = (int32)cmd_ln_float32_r(config, "-samprate");
        bogus_ad.bps = sizeof(int16);
    
        if ((cont = cont_ad_init(&bogus_ad, ad_file_read)) == NULL) {
            E_FATAL("Failed to initialize energy-based endpointer\n");
        }
    
    //    FILE *dump;
    //    dump = fopen ("out", "w");
    //    cont_ad_set_rawfp (cont, dump);
    //    cont_ad_set_logfp(cont, stdout);
    
        if (cont_ad_calib(cont) < 0)
            E_FATAL("cont_ad_calib failed\n");
        rewind (rawfd);
    
        for (;;) {
    
            /* Skip leading silence until speech samples arrive. */
            while ((k = cont_ad_read(cont, adbuf, 4096)) == 0)
                ;
    
            if (k < 0) {
                E_INFO ("End of file\n");
                break;      /* leave the loop so cont_ad_close() below is reached */
            }
    
            if (ps_start_utt(ps, NULL) < 0)
                E_FATAL("ps_start_utt() failed\n");
    
            ps_process_raw(ps, adbuf, k, FALSE, FALSE);
    
            ts = cont->read_ts;
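            /* Convert samples to 10 ms frames; multiply before dividing so
             * sub-second offsets are not truncated by integer division. */
            start = (int32)((ts - k) * 100.0 / bogus_ad.sps);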
    
            for (;;) {
                if ((k = cont_ad_read(cont, adbuf, 4096)) < 0)
                    break;
                if (k == 0) {
                    if ((cont->read_ts - ts) > (bogus_ad.sps / 2)) {
                        break;
                    }
                } else {
                    ts = cont->read_ts;
                }
                ps_process_raw(ps, adbuf, k, FALSE, FALSE);
            }
    
            ps_end_utt(ps);
    
            dump_result (start);
        }
    
        cont_ad_close(cont);
    }
    
    int
    main(int argc, char *argv[])
    {
        config = cmd_ln_parse_r(NULL, cont_args_def, argc, argv, TRUE);
    
        if (config == NULL)
            return 1;
        ps = ps_init(config);
        if (ps == NULL)
            return 1;
    
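        if (cmd_ln_str_r(config, "-infile") == NULL)
            return 1;
    
        /* Skip the canonical 44-byte WAV header. */
        char waveheader[44];
        rawfd = fopen(cmd_ln_str_r(config, "-infile"), "rb");
        if (rawfd == NULL)
            E_FATAL("Failed to open '%s'\n", cmd_ln_str_r(config, "-infile"));
        if (fread(waveheader, 1, 44, rawfd) != 44)
            E_FATAL("Input file is too short to be a WAV file\n");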
    
        utterance_loop();
    
        fclose(rawfd);
        ps_free(ps);
    
        return 0;
    }
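
    Note that main() above skips a fixed 44 bytes, which matches only the
    canonical WAV header layout; files written by some tools carry extra
    chunks before the audio. Below is a minimal sketch of locating the "data"
    chunk instead, assuming a little-endian RIFF/WAVE file. The names
    wav_seek_to_data() and read_le32() are hypothetical helpers for
    illustration, not part of pocketsphinx or sphinxbase.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode a little-endian 32-bit unsigned integer from 4 bytes. */
static uint32_t
read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: position fp at the first sample of the "data" chunk
 * of a RIFF/WAVE file. Returns the data chunk size in bytes, or -1 if the
 * file is not RIFF/WAVE or has no "data" chunk.
 */
static long
wav_seek_to_data(FILE *fp)
{
    unsigned char hdr[12], chunk[8];

    if (fread(hdr, 1, 12, fp) != 12)
        return -1;
    if (memcmp(hdr, "RIFF", 4) != 0 || memcmp(hdr + 8, "WAVE", 4) != 0)
        return -1;

    /* Walk the chunk list: 4-byte id, 4-byte size, then the payload. */
    while (fread(chunk, 1, 8, fp) == 8) {
        uint32_t size = read_le32(chunk + 4);

        if (memcmp(chunk, "data", 4) == 0)
            return (long)size;
        /* Chunk payloads are padded to an even number of bytes. */
        if (fseek(fp, (long)(size + (size & 1)), SEEK_CUR) != 0)
            return -1;
    }
    return -1;
}
```

    In main() the fixed-size header read could then be replaced by a call to
    wav_seek_to_data(rawfd). Since utterance_loop() calls rewind(rawfd) after
    calibration, which returns to byte 0, an fseek() back to the data offset
    would be needed there as well.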
    
     
  • Chris Douglas

    Chris Douglas - 2010-01-12

    Thank you, worked perfectly.

     
  • Halle

    Halle - 2010-03-17

    Hello,

    This was very useful to read, thanks for posting. I sort of have the opposite
    problem - I'm trying to get Pocketsphinx working on OS X but I can't get
    pocketsphinx_continuous to get audio directly from mic input, so I was
    wondering if I could get some pointers on how to modify the code above so that
    it would work with a wav that is still being continuously recorded at the time
    that utterance_loop() is called, i.e. basically faking mic input by routing a
    continuously-recorded wav into utterance_loop(). Or is there a better way to
    accomplish my goal of analyzing an ongoing recording?

    How I'm trying to do it so far is by having utterance_loop() recur every time
    it receives an EOF from cont_ad_read(). This sort of works, but there are
    issues: cont_ad_calib() always fails on the second loop, the whole sound file
    gets re-examined from the beginning on each loop of utterance_loop(), and it
    seems very resource-intensive. I don't know if nshmyrev is still monitoring
    this topic but I'd be interested in hearing his or anyone else's thoughts on
    smart approaches for analyzing an ongoing recording efficiently.

     
  • Nickolay V. Shmyrev

    "but I can't get pocketsphinx_continuous to get audio directly from mic
    input,"

    It's probably better to put some more effort into this, for example by
    implementing your own ad device.

    "beginning on each loop of utterance_loop(), and it seems very
    resource-intensive"

    It makes sense to add a short sleep here to wait while audio is being recorded.
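
    One possible reading of the sleep suggestion, as a minimal sketch rather
    than what was actually done in this thread: a read function that, on
    reaching the end of a file that is still being written, clears the EOF
    flag, sleeps briefly, and retries instead of restarting from the
    beginning. The name read_growing_file() and its poll_ms and max_polls
    parameters are hypothetical; nanosleep() is POSIX, and the sketch assumes
    the recorder appends whole 16-bit samples.

```c
#define _POSIX_C_SOURCE 199309L   /* for nanosleep() */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: read up to max samples from a file that another
 * process may still be appending to. On an empty read it clears the EOF
 * flag, sleeps poll_ms milliseconds and retries, giving up after max_polls
 * retries. Returns the number of samples read, or -1 if no data arrived
 * in time or a read error occurred.
 */
static long
read_growing_file(FILE *fp, int16_t *buf, size_t max,
                  int poll_ms, int max_polls)
{
    struct timespec delay;
    int polls;

    delay.tv_sec = poll_ms / 1000;
    delay.tv_nsec = (poll_ms % 1000) * 1000000L;

    for (polls = 0; polls <= max_polls; polls++) {
        size_t nread = fread(buf, sizeof(int16_t), max, fp);

        if (nread > 0)
            return (long)nread;
        if (ferror(fp))
            return -1;
        /* EOF is sticky on a FILE*, so clear it before trying again. */
        clearerr(fp);
        nanosleep(&delay, NULL);
    }
    return -1;
}
```

    A callback like ad_file_read() in the listing above could wrap such a
    helper, so that only a prolonged pause in recording is reported as end of
    input and the file position is kept between reads.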

     
  • Halle

    Halle - 2010-03-19

    Thank you nshmyrev, the sleep helped considerably with the cycle usage.

     
