Threshold parameter

Help
2011-07-05
2012-09-22
  • Osmo Salomaa

    Osmo Salomaa - 2011-07-05

    I'm the author of a subtitle editor and I'm in the process of adding speech
    recognition so that a rough first draft of subtitles -- at least start and end
    times if not text -- could be automatically generated from video. I've found
    your wiki page "Using PocketSphinx with GStreamer and Python" most helpful.

    I'm primarily interested in getting the times when someone starts speaking and
    when that speaking ends, and in having this done in roughly subtitle-length
    pieces. I'm using the GStreamer vader element signals "vader-start" and
    "vader-stop" to get those times.

    I plan to expose the vader parameters "threshold" and "run-length" (with
    "auto-threshold" set to false) to users in a GUI dialog. Despite
    experimentation, I'm having trouble understanding the threshold parameter. The
    valid values range from -1 to 1. What do the negative values mean? What can I
    expect will happen if I decrease/increase the value from the default of
    0.0078125?

     
  • Nickolay V. Shmyrev

    What do the negative values mean?

    Negative values have no meaning. There is a special value, -1.0/32768, that
    enables auto threshold when set as the threshold, but it's not public.

    What can I expect will happen if I decrease/increase the value from the
    default of 0.0078125?

    The threshold is the value of the noise level energy. The default is
    256/32768, so everything below 256 in volume is counted as noise and
    everything above it is counted as speech. You can reduce it to, say, 10/32768,
    and then you will filter less noise and detect more speech. You can increase
    it to 1000/32768, and then quiet speech will be counted as noise.
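
    The arithmetic behind these numbers can be sketched in Python. This is only
    an illustration of the scaling, assuming the property value is a volume level
    divided by 32768 (so the default 0.0078125 corresponds to level 256); the
    helper names level_to_threshold and threshold_to_level are made up for this
    example and are not part of the vader element.

```python
# Assumption: the "threshold" property is a 16-bit volume level divided by 32768.
FULL_SCALE = 32768


def level_to_threshold(level):
    """Convert a volume level on the 0..32768 scale to a fractional threshold."""
    if not 0 <= level <= FULL_SCALE:
        raise ValueError("level must be between 0 and 32768")
    return level / float(FULL_SCALE)


def threshold_to_level(threshold):
    """Convert a fractional threshold back to the 0..32768 volume scale."""
    return threshold * FULL_SCALE


# The three levels mentioned in the reply: more sensitive, default, less sensitive.
for level in (10, 256, 1000):
    print(level, level_to_threshold(level))
```

    A GUI dialog could show the level on the 0..32768 scale and convert it to the
    fractional value only when setting the "threshold" property.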

    The trunk version has been updated to reflect this.

     
