CMU Sphinx / Forums / Help: Tuning frate, and duration models...

Just sharing an interesting observation.

Building on my experiences with a couple of other recognizers (that shall
remain nameless), including one with a robust duration model and another that,
like PocketSphinx, lacks a duration model, I tried a little experiment.

Simple HMMs, in which each state can either hold or move on to the next
state, imply an exponentially decaying state duration, but no known model of
speech production actually has an exponentially decaying duration. So you
typically get better accuracy by either tweaking state transition
probabilities in a rather arbitrary (hackish) fashion, or simply deliberately
mismatching the frame rate between the training data and the actual
recognition task. The second approach is much easier for tuning, since it's
scriptable.
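To make the duration argument concrete, here is a minimal sketch of the duration distribution a single self-looping state implies. The self-loop probability of 0.6 is a made-up value for illustration, not taken from any real acoustic model, and the function names are my own. It shows that the most likely duration is always one frame, and that decoding at a different frame rate rescales the implied duration in milliseconds without touching the transition probabilities.

```python
def duration_pmf(self_loop: float, d: int) -> float:
    """Probability of staying exactly d frames in a state whose
    self-loop probability is `self_loop` (a geometric distribution)."""
    if d < 1:
        raise ValueError("duration must be at least one frame")
    return self_loop ** (d - 1) * (1.0 - self_loop)


def mean_duration_frames(self_loop: float) -> float:
    """Expected number of frames spent in the state: 1 / (1 - a)."""
    return 1.0 / (1.0 - self_loop)


def mean_duration_ms(self_loop: float, frate: float) -> float:
    """Expected duration in milliseconds when the front end produces
    `frate` frames per second."""
    return mean_duration_frames(self_loop) * 1000.0 / frate


if __name__ == "__main__":
    a = 0.6  # hypothetical self-loop probability, for illustration only
    # The probability only ever decays as the duration grows.
    for d in range(1, 6):
        print(d, round(duration_pmf(a, d), 4))
    # Same transition probabilities, different implied durations.
    for frate in (83, 100, 120):
        print(frate, round(mean_duration_ms(a, frate), 2))
```

Lowering the frame rate stretches every state's implied duration by the same factor, which is why one knob can stand in for a whole set of transition tweaks, and also why it is a blunt instrument.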

I wasn't quite prepared for the results. With a particularly ugly corpus
(distance mic, automotive task, in the rain) and grammar, my peak accuracy was
at reduced framerates. Specifically:

• 83 - voxforge_en_sphinx.cd_cont_5000
• 75 - hub4wsj_sc_8k

The fact that two different models have two different minima shows how broken
the whole idea of exponential duration is, but that's a topic for another day.

I'm more used to an increase in the framerate, typically about 20%, for peak
accuracy, but that's the opposite of what I saw. Looking at
voxforge_en_sphinx.cd_cont_5000, the frame rate vs. sentence error rate was:

frame rate   sentence error rate
    70            29.78
    80            30.07
    83            28.33
   100            33.25
   110            36.43
   120            41.50
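For anyone who wants to play with the numbers, here is a small sketch that picks the best frame rate from the figures above and computes the relative change against a baseline. Treating 100 as the baseline is my assumption (it is the rate I'd expect the models to have been trained at), and the helper names are made up for this example.

```python
# Sentence error rate by frame rate, copied from the figures above.
ser_by_frate = {
    70: 29.78,
    80: 30.07,
    83: 28.33,
    100: 33.25,
    110: 36.43,
    120: 41.50,
}


def best_frate(results):
    """Return the frame rate with the lowest error rate."""
    return min(results, key=results.get)


def relative_reduction(results, baseline, candidate):
    """Fractional drop in error rate going from `baseline` to `candidate`."""
    return (results[baseline] - results[candidate]) / results[baseline]


if __name__ == "__main__":
    best = best_frate(ser_by_frate)
    # Assumption: 100 frames per second is the unmodified baseline.
    change = relative_reduction(ser_by_frate, 100, best)
    print(f"best frame rate: {best}, relative reduction vs 100: {change:.1%}")
```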

This could be looked at as

1) a way to get Pocketsphinx to run with a small (3-4%) decrease in word error
rate together with a 15-25% increase in speed (a win-win),

2) a single, tunable parameter that probably can be used for easy, low
overhead speaker adaptation. My test corpus has a mix of 6 talkers. I didn't
try tuning for each talker individually. (Is there a better parameter to tweak
for state duration? A transition probability bias of some sort?)

3) a bug in need of squashing.

4) an argument as to just how badly Sphinx needs a decent duration model.

5) a thesis project, or at least a piece of one.
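The per-talker tuning from item 2 could be scripted roughly like this. It is only a sketch under assumptions: the flag names (-hmm, -ctl, -hyp, -frate) are what I'd expect from the classic pocketsphinx_batch tool and should be checked against your installed version, the paths are placeholders, and score_for is a hypothetical callable you would write yourself to run the decoder and return an error rate for one talker at one frame rate.

```python
from typing import Callable, Dict, Iterable, List


def build_command(frate: int, hmm_dir: str, ctl_file: str, hyp_file: str) -> List[str]:
    """Assemble one decoder invocation as an argument list.
    Flag names are assumptions; verify them against your local tool."""
    return [
        "pocketsphinx_batch",
        "-hmm", hmm_dir,
        "-ctl", ctl_file,
        "-hyp", hyp_file,
        "-frate", str(frate),
    ]


def sweep(frates: Iterable[int], score: Callable[[int], float]) -> Dict[int, float]:
    """Evaluate `score` (an error rate, lower is better) at each frame rate."""
    return {frate: score(frate) for frate in frates}


def best_per_talker(
    talkers: Iterable[str],
    frates: Iterable[int],
    score_for: Callable[[str, int], float],
) -> Dict[str, int]:
    """Pick the frame rate with the lowest error rate for each talker."""
    frates = list(frates)
    best = {}
    for talker in talkers:
        # Bind the current talker explicitly so the callable is self-contained.
        results = sweep(frates, lambda f, t=talker: score_for(t, f))
        best[talker] = min(results, key=results.get)
    return best
```

Keeping the scoring function separate from the sweep means the same loop works whether the score comes from a subprocess call, a cached results file, or a stub while you are testing the script.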
