Tuning frate, and duration models...

2011-10-17
2012-09-22
  • Joseph S. Wisniewski

    Just sharing an interesting observation.

    Building on my experience with a couple of other recognizers (which shall
    remain nameless), including one with a robust duration model and another that,
    like PocketSphinx, lacks a duration model, I tried a little experiment.

    Simple HMMs, in which each state can either hold or move on to the next
    state, tend to match an exponentially decaying duration, but no known model of
    speech production actually has an exponentially decaying duration. So you
    typically get better accuracy either by tweaking the state transition
    probabilities in a rather arbitrary (hacky) fashion, or by deliberately
    mismatching the frame rate between the training data and the actual
    recognition task. The second approach is much easier to tune, since it's
    scriptable.
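
    As a rough illustration of "scriptable": a minimal sketch that builds one
    pocketsphinx_batch command line per frame rate. It assumes -frate is the
    frame-rate option and that the other flags (-hmm, -jsgf, -dict, -ctl,
    -cepdir, -cepext, -adcin, -hyp) behave as in classic pocketsphinx_batch;
    check them against your installed version. All file names are hypothetical
    placeholders, and the commands are only printed, not executed.

```python
import shlex

# Hypothetical paths: substitute your own model, grammar, dictionary and corpus.
HMM_DIR = "model/voxforge_en_sphinx.cd_cont_5000"
JSGF_FILE = "task.jsgf"
DICT_FILE = "task.dict"
CTL_FILE = "test.fileids"
WAV_DIR = "wav"


def build_sweep_commands(frame_rates):
    """Return one pocketsphinx_batch argument list per decoding frame rate."""
    commands = []
    for frate in frame_rates:
        commands.append([
            "pocketsphinx_batch",
            "-hmm", HMM_DIR,
            "-jsgf", JSGF_FILE,
            "-dict", DICT_FILE,
            "-ctl", CTL_FILE,
            "-cepdir", WAV_DIR,
            "-cepext", ".wav",
            "-adcin", "yes",
            # Deliberately mismatch the decoding frame rate against training.
            "-frate", str(frate),
            # One hypothesis file per frame rate, to be scored afterwards.
            "-hyp", f"hyp_frate_{frate}.txt",
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_sweep_commands([70, 75, 80, 83, 100, 110, 120]):
        print(shlex.join(cmd))
```

    Each printed line can be pasted into a shell or handed to subprocess.run;
    scoring the resulting hypothesis files is left to whatever tool you
    already use.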

    I wasn't quite prepared for the results. With a particularly ugly corpus
    (distance mic, automotive task, in the rain) and grammar, my peak accuracy was
    at reduced frame rates. Specifically, the best frame rate for each model was:

    • 83 - voxforge_en_sphinx.cd_cont_5000
    • 75 - hub4wsj_sc_8k

    The fact that two different models have two different minima shows how broken
    the whole idea of exponential duration is, but that's a topic for another day.
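
    For anyone who wants to see the "exponential duration" concretely: a
    minimal sketch, assuming a single state with self-loop probability a, so
    the probability of staying exactly d frames is (1 - a) * a**(d - 1). The
    value 0.7 below is an arbitrary illustration, not taken from any of the
    models above. The sketch also shows why a frame rate mismatch acts as a
    duration knob: the same transition probabilities imply a longer expected
    duration in milliseconds when each frame covers more time.

```python
def duration_pmf(self_loop, max_frames):
    """P(state lasts exactly d frames) for d = 1..max_frames (geometric)."""
    return [(1.0 - self_loop) * self_loop ** (d - 1)
            for d in range(1, max_frames + 1)]


def mean_duration_frames(self_loop):
    """Expected number of frames spent in the state."""
    return 1.0 / (1.0 - self_loop)


def mean_duration_ms(self_loop, frate):
    """Expected duration in milliseconds at a given frame rate (frames/s)."""
    return mean_duration_frames(self_loop) * 1000.0 / frate


if __name__ == "__main__":
    a = 0.7  # arbitrary self-loop probability, for illustration only
    pmf = duration_pmf(a, 10)
    # The most likely duration is always a single frame, then it only decays.
    print("pmf for 1..10 frames:", [round(p, 4) for p in pmf])
    for frate in (100, 83, 75):
        print(frate, "frames/s ->", round(mean_duration_ms(a, frate), 2), "ms")
```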

    I'm more used to seeing peak accuracy at an increased frame rate, typically
    about 20% higher, but that's the opposite of what I saw here. For
    voxforge_en_sphinx.cd_cont_5000, frame rate vs. sentence error rate was:
    70 - 29.78
    80 - 30.07
    83 - 28.33
    100 - 33.25
    110 - 36.43
    120 - 41.50
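
    The numbers above can be summarized with a few lines of Python. This is
    only a sketch: it assumes 100 is the matched (training) frame rate and
    treats the drop in frames per second as a crude proxy for the reduction
    in decoder work, which ignores any fixed per-utterance overhead.

```python
# Frame rate -> sentence error rate, copied from the table above.
results = {70: 29.78, 80: 30.07, 83: 28.33, 100: 33.25, 110: 36.43, 120: 41.50}

BASELINE = 100  # assumed matched frame rate


def best_frame_rate(table):
    """Frame rate with the lowest sentence error rate."""
    return min(table, key=table.get)


def relative_change(table, frate, baseline=BASELINE):
    """Relative change in error rate versus the baseline (negative = better)."""
    return (table[frate] - table[baseline]) / table[baseline]


def frame_reduction(frate, baseline=BASELINE):
    """Fraction of frames saved per second of audio versus the baseline."""
    return 1.0 - frate / baseline


if __name__ == "__main__":
    best = best_frame_rate(results)
    print("best frame rate:", best)
    print("relative error change: {:.1%}".format(relative_change(results, best)))
    print("fewer frames to decode: {:.0%}".format(frame_reduction(best)))
```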

    This could be looked at as

    1) a way to get Pocketsphinx to run with a small (3-4%) decrease in word error
    rate together with a 15-25% increase in speed (a win-win),

    2) a single tunable parameter that can probably be used for easy,
    low-overhead speaker adaptation. My test corpus has a mix of 6 talkers. I
    didn't try tuning for each talker individually. (Is there a better parameter
    to tweak for state duration? A transition probability bias of some sort?)

    3) a bug in need of squashing.

    4) an argument as to just how badly Sphinx needs a decent duration model.

    5) a thesis project, or at least a piece of one.

     
  • Nickolay V. Shmyrev

    Hello

    This is an interesting observation. To get feedback, though, it's better to
    post it to the cmusphinx-devel mailing list.

     