Ivan Uemlianin - 2002-10-25

Here are some experiences I'm having running Sphinx2 on some SphinxTrain output.  Basically, I got Sphinx2 working but it didn't recognise anything.  Either I'm just not training on enough data (probable) or I'm missing something else.  Any comments or advice would be appreciated.

** Training data

Using SphinxTrain, I built models from recordings of about 60 different people reciting wordlists (e.g. the aviation alphabet).  About 5 hours of data in all; total vocabulary 109 words.  This is a small (but well-formed) subset of the full dataset.  I thought I'd run SphinxTrain and Sphinx on a small dataset just to get a feel for how it all works, and to iron things out.  Consequently I'm not too worried about accuracy for the moment.

** Issues

*** Just not enough data?

A number of tweaks were necessary before Sphinx2 would run without errors (I'll report these below), but when it finally did run it didn't recognise anything.  It's possible that 5 hours of data is just not enough for anything to happen.  Is that the case?  If so, I'll go straight ahead and run with the whole dataset.

I used a script based on sphinx2-test with the following changes:
- used -allphone mode as I'm not providing a language model
- changed appropriate directory locations
- removed flags that didn't seem necessary (or that I didn't understand)

Here's the script I ended up with:

#!/bin/sh

S2=sphinx2-continuous

HOME=$SPHINXTRAINDIR/TIME/

HMM=${HOME}/model_parameters/time.s2models
ETC=${HOME}/etc

CTL_FILE=${ETC}/time.test.ctl
DICT_FILE=${ETC}/time.dic

$S2 -allphone TRUE                \
    -adcext sph                   \
    -adcin TRUE                   \
    -agcmax TRUE                  \
    -bestpath TRUE                \
    -cbdir ${HMM}                 \
    -ctlcount 1                   \
    -ctlfn ${CTL_FILE}            \
    -ctloffset 0                  \
    -dictfn ${DICT_FILE}          \
    -hmmdir ${HMM}                \
    -hmmdirlist ${HMM}            \
    -mapfn ${HMM}/map             \
    -phnfn ${HMM}/phone           \
    -verbose 9

# end

Here are the last few lines of output (the full output is 103 lines; I can send it if required):

...
INFO: fbs_main.c(1358):
Utterance: desert
INFO: uttproc.c(897): Batchmode
INFO: uttproc.c(1088): Samples histogram (desert1_q1eta1) (4/8/16/30/32K):INFO: uttproc.c(1090):  13.3%(6283)INFO: uttproc.c(1090):  13.2%(6209)INFO: uttproc.c(1090):  25.5%(12032)INFO: uttproc.c(1090):  42.1%(19876)INFO: uttproc.c(1090):  5.9%(2791)INFO: uttproc.c(1091): ; max: 32768
1.662 = AGC MAX
INFO: uttproc.c(435):
INFO: uttproc.c(437): TOTAL Elapsed time 0.00 seconds
INFO: uttproc.c(439): TOTAL CPU time 0.00 seconds
INFO: uttproc.c(441): TOTAL Speech 0.00 seconds

*** 'map' empty

The senone mapping file time/model_parameters/time.s2models/map was empty.  Is this significant?
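
To see whether 'map' is the only empty file in the model directory, a quick check like the following might help.  This is only a sketch: the function name list_empty_files is made up for illustration, and the example path simply mirrors the HMM variable in the script above.

```shell
#!/bin/sh
# List zero-length regular files under a directory.
# Usage: list_empty_files DIR
# (list_empty_files is a hypothetical helper, not part of Sphinx2 or SphinxTrain.)
list_empty_files() {
    # -type f : regular files only
    # -size 0 : files occupying zero blocks, i.e. empty files
    find "$1" -type f -size 0
}

# Example call (path assumed from the HMM variable in the script above):
# list_empty_files "${SPHINXTRAINDIR}/TIME/model_parameters/time.s2models"
```

If 'map' is the only file listed, the problem is specific to the senone mapping rather than to the model conversion as a whole.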

*** Error message:  phone_to_id: did not find [SILe]

Looking through the source, the phone SILe seems to be the equivalent of </s>, i.e. end-of-utterance silence.  I was surprised SphinxTrain did not generate models for it, as I had used the </s> tag in the transcriptions and put it in time.filler.  I hacked round this by:
(a) copying the SIL* files in time/model_parameters/time.s2models/ to equivalent SILe files;
(b) adding an ID line for SILe to time.s2models/phone.
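
For reference, step (a) could be scripted roughly as below.  This is a sketch under an assumption I have not verified: that the per-phone files are named SIL.<extension>, so that the copies become SILe.<extension>.  The function name copy_sil_to_sile is made up.  Step (b) is left manual, since the ID line depends on the layout of the existing phone file.

```shell
#!/bin/sh
# Copy every SIL.<ext> file in a model directory to SILe.<ext>.
# ASSUMPTION: per-phone files are named <phone>.<ext>; adjust the glob if not.
# (copy_sil_to_sile is a hypothetical helper, not part of Sphinx2.)
copy_sil_to_sile() {
    dir=$1
    for src in "$dir"/SIL.*; do
        # If the glob matched nothing, the literal pattern is left; skip it.
        [ -e "$src" ] || continue
        # Strip everything up to and including "/SIL." to get the extension.
        ext=${src##*/SIL.}
        cp "$src" "$dir/SILe.$ext"
    done
}

# Example call (path assumed from the HMM variable in the script above):
# copy_sil_to_sile "${SPHINXTRAINDIR}/TIME/model_parameters/time.s2models"
```

Note that this only duplicates the silence model under a second name; it does not train a separate end-of-utterance model.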

** Questions:

- Am I just not using enough data?  If I did the same with, say, 40 hours of data, would everything look a lot better?

- Am I missing some arguments from the sphinx2-continuous command-line?  Is it even the wrong command to use?  Is it worth writing a custom program?

Thanks for listening.

Ivan