CMU Sphinx / Forums / Help: Sphinx2 on SphinxTrain output

Here are some notes on running Sphinx2 on SphinxTrain output. I got Sphinx2 to run, but it didn't recognise anything. Either I'm not training on enough data (probable) or I'm missing something else. Any comments or advice would be appreciated.

** Training data

I built models with SphinxTrain from recordings of about 60 different people reciting wordlists (e.g. the aviation alphabet). About 5 hours of data in all; total vocabulary 109 words. This is a small (but well-formed) subset of the full dataset. I thought I'd run SphinxTrain and Sphinx2 on a small dataset just to get a feel for how it all works and to iron things out, so I'm not too worried about accuracy for the moment.

** Issues

*** Just not enough data?

A number of tweaks were necessary before Sphinx2 would run without errors (I report these below), but when it finally did run it didn't recognise anything. It's possible that 5 hours of data is just not enough for anything to happen. Is that the case? If so, I'll go straight ahead and run with the whole dataset.

I used a script based on sphinx2-test with the following changes:
- used -allphone mode as I'm not providing a language model
- changed appropriate directory locations
- removed flags that didn't seem necessary (or that I didn't understand)

Here's the script I ended up with:

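#!/bin/sh

S2=sphinx2-continuous

# Task directory. Named TASK rather than HOME so the shell's own $HOME is not overridden.
TASK=${SPHINXTRAINDIR}/TIME

# Sphinx2-format models produced by SphinxTrain, and the etc directory
HMM=${TASK}/model_parameters/time.s2models
ETC=${TASK}/etc

# Control file (list of test utterances) and pronunciation dictionary
CTL_FILE=${ETC}/time.test.ctl
DICT_FILE=${ETC}/time.dic

$S2 -allphone TRUE                \
    -adcext sph                   \
    -adcin TRUE                   \
    -agcmax TRUE                  \
    -bestpath TRUE                \
    -cbdir ${HMM}                 \
    -ctlcount 1                   \
    -ctlfn ${CTL_FILE}            \
    -ctloffset 0                  \
    -dictfn ${DICT_FILE}          \
    -hmmdir ${HMM}                \
    -hmmdirlist ${HMM}            \
    -mapfn ${HMM}/map             \
    -phnfn ${HMM}/phone           \
    -verbose 9

# end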

The last few lines of output (the full output is 103 lines; I can send it if required):

...
INFO: fbs_main.c(1358):
Utterance: desert
INFO: uttproc.c(897): Batchmode
INFO: uttproc.c(1088): Samples histogram (desert1_q1eta1) (4/8/16/30/32K):INFO: uttproc.c(1090): 13.3%(6283)INFO: uttproc.c(1090): 13.2%(6209)INFO: uttproc.c(1090): 25.5%(12032)INFO: uttproc.c(1090): 42.1%(19876)INFO: uttproc.c(1090): 5.9%(2791)INFO: uttproc.c(1091): ; max: 32768
1.662 = AGC MAX
INFO: uttproc.c(435):
INFO: uttproc.c(437): TOTAL Elapsed time 0.00 seconds
INFO: uttproc.c(439): TOTAL CPU time 0.00 seconds
INFO: uttproc.c(441): TOTAL Speech 0.00 seconds

*** 'map' empty

The senone mapping file time/model_parameters/time.s2models/map was empty. Is this significant?
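In case it helps anyone reproduce this, here is a minimal sketch of the kind of check that shows up the empty file. The helper name check_nonempty is my own invention (it is not part of Sphinx2 or SphinxTrain), and it only looks at whether each file exists and has non-zero size, not whether its contents are valid.

```shell
#!/bin/sh
# check_nonempty: hypothetical helper, not part of Sphinx2 or SphinxTrain.
# Prints one line for every argument that is missing or zero-length.
# Returns 0 if every file exists and is non-empty, 1 otherwise.
check_nonempty() {
    rc=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f"
            rc=1
        elif [ ! -s "$f" ]; then
            # -s is true only for files with size greater than zero
            echo "empty: $f"
            rc=1
        fi
    done
    return $rc
}
```

With the variables from the script above it could be called as check_nonempty "${HMM}/map" "${HMM}/phone" "${DICT_FILE}" "${CTL_FILE}" before starting the decoder.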

*** Error message: phone_to_id: did not find [SILe]

Looking through the source, the phone SILe seems to be the equivalent of </s>, i.e. the end-of-utterance silence. I was surprised SphinxTrain did not generate it, as I had used the </s> tag in the transcriptions and put it in time.filler. I hacked around this by:
(a) copying the SIL* files in time/model_parameters/time.s2models/ to equivalent SILe files;
(b) adding an ID line for SILe to time.s2models/phone.
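Step (a) can be sketched roughly as follows. This is only an illustration: the function name copy_sil_as_sile is made up, and it assumes the silence model files are named SIL or SIL.<extension>, which may not match every model directory. Step (b) is left out because the ID line to add depends on what is already in the phone file.

```shell
#!/bin/sh
# copy_sil_as_sile: hypothetical helper, not part of Sphinx2 or SphinxTrain.
# Assumption: silence model files are named "SIL" or "SIL.<ext>".
# For each such file in the given directory, writes a copy named
# "SILe" or "SILe.<ext>" next to it. Other files are left alone.
copy_sil_as_sile() {
    dir=$1
    for src in "$dir"/SIL "$dir"/SIL.*; do
        # Skip unmatched glob patterns and anything that is not a regular file
        [ -f "$src" ] || continue
        base=${src##*/}               # file name without the directory part
        dst="$dir/SILe${base#SIL}"    # SIL.<ext> becomes SILe.<ext>
        cp "$src" "$dst"
    done
}
```

It would be called with the model directory as its only argument, e.g. copy_sil_as_sile "${HMM}". Copying the SIL models like this only silences the error; whether it is the right way to get an end-of-utterance silence model is exactly what I'm unsure about.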

** Questions:

- Am I just not using enough data? If I did the same with, say, 40 hours of data, would everything look a lot better?

- Am I missing some arguments from the sphinx2-continuous command-line? Is it even the wrong command to use? Is it worth writing a custom program?

Thanks for listening.

Ivan
