Here are some experiences I'm having running Sphinx2 on some SphinxTrain output. Basically, I got Sphinx2 working but it didn't recognise anything. Either I'm just not training on enough data (probable) or I'm missing something else. Any comments or advice would be appreciated.
** Training data
I built models with SphinxTrain from recordings of about 60 different people reciting word lists (e.g. the aviation alphabet): about 5 hours of data in all, with a total vocabulary of 109 words. This is a small (but well-formed) subset of the full dataset. I thought I'd run SphinxTrain and Sphinx on a small dataset just to get a feel for how it all works and iron things out, so I'm not too worried about accuracy for the moment.
** Issues
*** Just not enough data?
A number of tweaks were necessary before Sphinx2 would run without errors (I'll report these below), but when it finally did run it didn't recognise anything. It's possible that 5 hours of data is just not enough for anything to happen. Is that the case? If so, I'll go straight ahead and run with the whole dataset.
I used a script based on sphinx2-test with the following changes:
- used -allphone mode as I'm not providing a language model
- changed appropriate directory locations
- removed flags that didn't seem necessary (or that I didn't understand)
Here's the script I ended up with:
#!/bin/sh
S2=sphinx2-continuous
HOME=$SPHINXTRAINDIR/TIME/
HMM=${HOME}/model_parameters/time.s2models
ETC=${HOME}/etc
CTL_FILE=${ETC}/time.test.ctl
DICT_FILE=${ETC}/time.dic
$S2 -allphone TRUE \
    -adcext sph \
    -adcin TRUE \
    -agcmax TRUE \
    -bestpath TRUE \
    -cbdir ${HMM} \
    -ctlcount 1 \
    -ctlfn ${CTL_FILE} \
    -ctloffset 0 \
    -dictfn ${DICT_FILE} \
    -hmmdir ${HMM} \
    -hmmdirlist ${HMM} \
    -mapfn ${HMM}/map \
    -phnfn ${HMM}/phone \
    -verbose 9 \
# end
Here are the last few lines of output (the full output is 103 lines; I can send it if required):
...
INFO: fbs_main.c(1358):
Utterance: desert
INFO: uttproc.c(897): Batchmode
INFO: uttproc.c(1088): Samples histogram (desert1_q1eta1) (4/8/16/30/32K):INFO: uttproc.c(1090): 13.3%(6283)INFO: uttproc.c(1090): 13.2%(6209)INFO: uttproc.c(1090): 25.5%(12032)INFO: uttproc.c(1090): 42.1%(19876)INFO: uttproc.c(1090): 5.9%(2791)INFO: uttproc.c(1091): ; max: 32768
1.662 = AGC MAX
INFO: uttproc.c(435):
INFO: uttproc.c(437): TOTAL Elapsed time 0.00 seconds
INFO: uttproc.c(439): TOTAL CPU time 0.00 seconds
INFO: uttproc.c(441): TOTAL Speech 0.00 seconds
*** 'map' empty
The senone mapping file time/model_parameters/time.s2models/map was empty. Is this significant?
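Since one model file turned out to be empty, it may be worth checking whether any others are too. The following is only a minimal sketch: the function name list_empty_files is hypothetical, and it simply reports zero-length regular files in a directory without saying anything about whether an empty file is valid for Sphinx2.

```shell
#!/bin/sh
# Hypothetical helper: print every zero-length regular file found
# directly inside the given directory (no recursion).
# Usage: list_empty_files DIR
list_empty_files() {
    dir=$1
    for f in "$dir"/*; do
        # -f: is a regular file; ! -s: has size zero
        if [ -f "$f" ] && [ ! -s "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

With the variables from the script above, this could be called as list_empty_files "${HMM}" before starting the decoder.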
*** Error message: phone_to_id: did not find [SILe]
Looking through the source, the phone SILe seems to be the equivalent of </s>, i.e. end-of-utterance silence. I was surprised SphinxTrain did not generate it, as I had used the </s> tag in the transcriptions and put it in time.filler. I hacked round this by:
(a) copying the SIL* files in time/model_parameters/time.s2models/ to equivalent SILe files;
(b) adding an ID line for SILe to time.s2models/phone.
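Step (a) can be sketched as a small shell function. This is an illustration under assumptions, not the exact commands used: the function name copy_sil_to_sile is hypothetical, it assumes the relevant files all have names beginning with SIL, and it skips names already beginning with SILe or SILb so that re-running it does not produce doubled names. Step (b) is left manual, since the layout of a line in the phone file is not described here.

```shell
#!/bin/sh
# Hypothetical helper: copy each SIL* file in DIR to a sibling whose
# name starts with SILe instead of SIL (e.g. SIL.ext -> SILe.ext).
# Usage: copy_sil_to_sile DIR
copy_sil_to_sile() {
    dir=$1
    for f in "$dir"/SIL*; do
        [ -f "$f" ] || continue       # glob matched nothing, or not a regular file
        base=${f##*/}                 # file name without the directory part
        case $base in
            SILe*|SILb*) continue ;;  # assumption: these are not plain SIL files
        esac
        # ${base#SIL} is the name with the leading "SIL" removed
        cp -p "$f" "$dir/SILe${base#SIL}"
    done
}
```

Because the copies are byte-for-byte duplicates of the SIL files, this only makes the lookup succeed; it does not give SILe a model of its own.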
** Questions:
- Am I just not using enough data? If I did the same with, say, 40 hours of data, would everything look a lot better?
- Am I missing some arguments from the sphinx2-continuous command line? Is it even the wrong command to use? Is it worth writing a custom program?
Thanks for listening.
Ivan