CMU Sphinx / Forums / Help: pocketsphinx stopping on long silence

Chris Douglas - 2010-01-11

Hello,
I am trying to use pocketsphinx to run ASR on long WAV files. When pocketsphinx
reaches 2-3 seconds of silence, it stops processing. I am using
pocketsphinx_batch for the processing. Another post said to use
pocketsphinx_continuous, but I haven't found a way to pass a recording to it.
It works fine with a mic, though.

Is there a way either to process a WAV file with pocketsphinx_continuous or to
tell pocketsphinx_batch not to stop on silence?

Thanks
Chris

/* -*- c-basic-offset: 4; indent-tabs-mode: nil -*- */

/* ====================================================================
 * Copyright (c) 1999-2001 Carnegie Mellon University.  All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer. 
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * This work was supported in part by funding from the Defense Advanced 
 * Research Projects Agency and the National Science Foundation of the 
 * United States of America, and the CMU Sphinx Speech Consortium.
 *
 * THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND 
 * ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, 
 * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY
 * NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * ====================================================================
 *
 */
/*
 * cont.c -- Continuous decoder for audio files.
 */

#include <stdio.h>
#include <string.h>

#include "pocketsphinx.h"
#include "err.h"
#include "ad.h"
#include "cont_ad.h"

static const arg_t cont_args_def[] = {
    POCKETSPHINX_OPTIONS,
    { "-infile",
      ARG_STRING,
      NULL, "Audio file." },
    CMDLN_EMPTY_OPTION
};

static ps_decoder_t *ps;
static ad_rec_t bogus_ad = {0};
static FILE *rawfd;
static cmd_ln_t *config;

static int32
ad_file_read(ad_rec_t * ad, int16 * buf, int32 max)
{
    size_t nread;

    nread = fread(buf, sizeof(int16), max, rawfd);

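    /* Cast explicitly: mixing size_t and -1 in one conditional would
       otherwise convert -1 to an unsigned value first. */
    return (nread > 0 ? (int32)nread : -1);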
}

static void
dump_result (int32 start)
{
    ps_seg_t *iter = ps_seg_iter(ps, NULL);
    while (iter != NULL) {
        int32 sf, ef, pprob;
        float conf;

        ps_seg_frames (iter, &sf, &ef);
        pprob = ps_seg_prob (iter, NULL, NULL, NULL);
        conf = logmath_exp(ps_get_logmath(ps), pprob);
        printf ("%s %f %f %f\n", ps_seg_word (iter), (sf + start) / 100.0, (ef + start) / 100.0, conf);
        iter = ps_seg_next (iter);
    }
}

static void
utterance_loop()
{
    int16 adbuf[4096];
    int32 k, ts, start;
    cont_ad_t *cont;

    bogus_ad.sps = (int32)cmd_ln_float32_r(config, "-samprate");
    bogus_ad.bps = sizeof(int16);

    if ((cont = cont_ad_init(&bogus_ad, ad_file_read)) == NULL) {
        E_FATAL("Failed to initialize energy-based endpointer");
    }

//    FILE *dump;
//    dump = fopen ("out", "w");
//    cont_ad_set_rawfp (cont, dump);
//    cont_ad_set_logfp(cont, stdout);

    if (cont_ad_calib(cont) < 0)
        E_FATAL("cont_ad_calib failed\n");
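    /* Return to the first sample rather than the start of the file, so the
       44-byte WAV header skipped in main() is not decoded as audio. */
    fseek (rawfd, 44, SEEK_SET);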

    for (;;) {

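        /* Skip leading silence until the endpointer returns speech. */
        while ((k = cont_ad_read(cont, adbuf, 4096)) == 0)
            ;

        if (k < 0) {
            E_INFO ("End of file\n");
            break;  /* leave the loop so cont_ad_close() below still runs */
        }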

        if (ps_start_utt(ps, NULL) < 0)
            E_FATAL("ps_start_utt() failed\n");

        ps_process_raw(ps, adbuf, k, FALSE, FALSE);

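        ts = cont->read_ts;
        /* Convert samples to 10 ms frames. Scale before dividing so that
           integer division does not truncate the start to whole seconds. */
        start = (int32)((double)(ts - k) * 100.0 / bogus_ad.sps);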

        for (;;) {
            if ((k = cont_ad_read(cont, adbuf, 4096)) < 0)
                break;
            if (k == 0) {
                if ((cont->read_ts - ts) > (bogus_ad.sps / 2)) {
                    break;
                }
            } else {
                ts = cont->read_ts;
            }
            ps_process_raw(ps, adbuf, k, FALSE, FALSE);
        }

        ps_end_utt(ps);

        dump_result (start);
    }

    cont_ad_close(cont);
}

int
main(int argc, char *argv[])
{
    config = cmd_ln_parse_r(NULL, cont_args_def, argc, argv, TRUE);

    if (config == NULL)
        return 1;
    ps = ps_init(config);
    if (ps == NULL)
        return 1;

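    if (cmd_ln_str_r(config, "-infile") == NULL)
        return 1;

    /* Skip the 44-byte canonical WAV header. This assumes a plain PCM file
       with no extra chunks before the audio data. */
    char waveheader[44];
    rawfd = fopen(cmd_ln_str_r(config, "-infile"), "rb");
    if (rawfd == NULL)
        E_FATAL("Failed to open the input file\n");
    if (fread(waveheader, 1, 44, rawfd) != 44)
        E_FATAL("Input file is too short to contain a WAV header\n");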

    utterance_loop();

    fclose(rawfd);

    ps_free(ps);

    return 0;
}
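
The program above skips a fixed 44 bytes, which only matches a canonical PCM WAV file. Files that carry extra chunks (for example a LIST chunk) before the audio would need the header to be walked instead. A minimal sketch of that, using only standard C; `wav_seek_to_data` and `le32` are hypothetical helper names and are not part of pocketsphinx or sphinxbase:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode a little-endian 32-bit value from 4 bytes. */
static uint32_t
le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: leave fp positioned at the first byte of the
 * "data" chunk payload. Returns the payload size in bytes, or -1 if the
 * file is not RIFF/WAVE or has no "data" chunk.
 */
static long
wav_seek_to_data(FILE *fp)
{
    unsigned char hdr[12], chunk[8];

    assert(fp != NULL);
    if (fread(hdr, 1, 12, fp) != 12
        || memcmp(hdr, "RIFF", 4) != 0
        || memcmp(hdr + 8, "WAVE", 4) != 0)
        return -1;

    /* Each chunk is a 4-byte tag followed by a 4-byte payload size. */
    while (fread(chunk, 1, 8, fp) == 8) {
        uint32_t size = le32(chunk + 4);

        if (memcmp(chunk, "data", 4) == 0)
            return (long)size;
        /* Skip this chunk; payloads are padded to an even length. */
        if (fseek(fp, (long)(size + (size & 1)), SEEK_CUR) != 0)
            return -1;
    }
    return -1;
}
```

In main() this could stand in for the fixed 44-byte fread, with ftell(rawfd) taken afterwards as the position to return to once calibration is done.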

Chris Douglas - 2010-01-12

Thank you, worked perfectly.

Halle - 2010-03-17

Hello,

This was very useful to read, thanks for posting. I sort of have the opposite
problem - I'm trying to get Pocketsphinx working on OS X but I can't get
pocketsphinx_continuous to get audio directly from mic input, so I was
wondering if I could get some pointers on how to modify the code above so that
it would work with a wav that is still being continuously recorded at the time
that utterance_loop() is called. i.e. basically faking mic input by routing a
continuously-recorded wav into utterance_loop. Or is there a better way to
accomplish my goal of analyzing an ongoing recording?

How I'm trying to do it so far is by having utterance_loop() recur every time
it receives an EOF from cont_ad_read(). This sort of works, but there are
issues: cont_ad_calib() always fails on the second loop, the whole sound file
gets re-examined from the beginning on each loop of utterance_loop(), and it
seems very resource-intensive. I don't know if nshmyrev is still monitoring
this topic but I'd be interested in hearing his or anyone else's thoughts on
smart approaches for analyzing an ongoing recording efficiently.
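
One way to avoid re-examining the file from the beginning on every pass is to remember the byte offset already consumed and seek back to it the next time. This is only a sketch in standard C under that assumption; `read_new_samples` is a hypothetical helper, not a pocketsphinx function, and it does not address the cont_ad_calib() failure:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read the 16-bit samples appended to `path` since
 * byte position *offset, then advance *offset past the whole samples
 * consumed. Returns the number of samples read, or -1 if the file cannot
 * be opened or positioned.
 */
static long
read_new_samples(const char *path, long *offset, short *buf, size_t max)
{
    FILE *fp;
    size_t nbytes;

    assert(path != NULL && offset != NULL && buf != NULL);
    if ((fp = fopen(path, "rb")) == NULL)
        return -1;
    if (fseek(fp, *offset, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }
    nbytes = fread(buf, 1, max * sizeof(short), fp);
    fclose(fp);

    /* Leave a trailing half-written sample for the next call. */
    nbytes -= nbytes % sizeof(short);
    *offset += (long)nbytes;
    return (long)(nbytes / sizeof(short));
}
```

Starting with the offset at 44 would skip the WAV header in the same way as the program above. A return value of 0 means the recorder has not written anything new yet.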

Nickolay V. Shmyrev - 2010-03-18

> but I can't get pocketsphinx_continuous to get audio directly from mic
> input,

It's probably better to put some more effort into this, for example by
implementing your own ad device.

> beginning on each loop of utterance_loop(), and it seems very
> resource-intensive

It makes sense to add a short sleep here to wait while audio is being recorded.
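
A rough sketch of what such a sleep could look like when reading from a file that is still being written: clear the end-of-file flag, try to read, and sleep briefly before retrying if nothing new has arrived. It uses POSIX nanosleep(); `read_growing_file` and its retry and delay parameters are hypothetical and not part of pocketsphinx:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: read up to `max` 16-bit samples from a file that
 * another process is still appending to. If no complete sample is
 * available, sleep for sleep_ms and retry, up to max_retries times.
 * Returns the number of samples read (0 if nothing arrived in time).
 */
static size_t
read_growing_file(FILE *fp, short *buf, size_t max,
                  int max_retries, long sleep_ms)
{
    struct timespec delay;
    int attempt;

    assert(fp != NULL && buf != NULL);
    delay.tv_sec = sleep_ms / 1000;
    delay.tv_nsec = (sleep_ms % 1000) * 1000000L;

    for (attempt = 0; attempt <= max_retries; attempt++) {
        size_t nbytes, partial;

        /* A previous EOF is sticky on some C libraries; clear it first. */
        clearerr(fp);
        nbytes = fread(buf, 1, max * sizeof(short), fp);
        partial = nbytes % sizeof(short);
        if (partial != 0)  /* writer was mid-sample: push the odd byte back */
            fseek(fp, -(long)partial, SEEK_CUR);
        if (nbytes >= sizeof(short))
            return nbytes / sizeof(short);
        /* Nothing new yet: yield the CPU instead of spinning. */
        nanosleep(&delay, NULL);
    }
    return 0;
}
```

A function like this could replace the fread() call in ad_file_read(), so that an end of file on a live recording means "wait" rather than "stop". The delay is a trade-off: longer sleeps use fewer cycles but add latency.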

Halle - 2010-03-19

Thank you nshmyrev, the sleep helped considerably with the cycle usage.
