I'm trying to train a simple numbers test with SphinxTrain.
I've gone through everything many times over, pulled the carriage returns, and
tried everything else I can think of.
Nothing helps; the Phase 6 warnings, 'Bad line in transcript', seem to be
the problem.
Transcript FOLLOWS
ONE FIVE THREE FOUR THREE FIVE FIVE NINE
TWO SEVEN SEVEN EIGHT EIGHT ONE FIVE TWO
FOUR THREE SIX THREE ONE SEVEN TWO ONE
FOUR NINE SEVEN TWO FOUR NINE FOUR TWO
SIX THREE SEVEN THREE SIX THREE THREE FOUR
EIGHT SIX SEVEN EIGHT SIX SIX EIGHT TWO
NINE TWO THREE TWO FIVE SEVEN ONE FIVE
ONE FIVE THREE NINE SIX ONE FIVE ONE
THREE FOUR TWO EIGHT FOUR NINE SIX FOUR
FOUR FOUR SIX ONE EIGHT NINE THREE THREE
FIVE SIX TWO SEVEN ONE ONE ONE ONE
SEVEN ONE ONE EIGHT EIGHT ONE EIGHT SEVEN
EIGHT SEVEN THREE FOUR SIX EIGHT EIGHT FOUR
NINE SEVEN NINE THREE THREE SIX THREE EIGHT
ONE SIX TWO FIVE EIGHT FOUR FOUR SIX
THREE FIVE ONE SEVEN SEVEN THREE SIX SEVEN
FOUR SIX SIX EIGHT FOUR FOUR FOUR EIGHT
FIVE EIGHT EIGHT EIGHT TWO FIVE EIGHT THREE
EIGHT THREE NINE ONE EIGHT FOUR NINE TWO
EIGHT NINE EIGHT EIGHT FIVE FIVE SEVEN THREE
Dictionary FOLLOWS
EIGHT EY T
FIVE F AY V
FOUR F AO R
NINE N AY N
ONE W AH N
ONE(2) HH W AH N
SEVEN S EH V AH N
SIX S IH K S
THREE TH R IY
TWO T UW
Phone(tics) FOLLOWS
AH
AO
AY
EH
EY
F
HH
IH
IY
K
N
R
S
T
TH
UW
V
W
SIL
All Waves in fileids passed conversion to feat/*.mfc like so... (using wav &
mswav settings in config)
LENGTH: zu
INFO: ........\src\programs\wave2feat\wave2feat.c(785): Reading MS Wav file C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/wav/RA_89885573.wav:
INFO: ........\src\programs\wave2feat\wave2feat.c(786): 16 bit PCM data, 1 channels 426240 samples
INFO: ........\src\programs\wave2feat\wave2feat.c(787): Sampled at 16000
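For anyone who wants to double-check the audio independently of wave2feat, here is a minimal sketch using Python's standard `wave` module. The helper name `describe_wav` and the generated demo file are only for illustration and are not part of SphinxTrain. For the file in the log above, 426240 samples at 16000 Hz works out to 426240 / 16000 = 26.64 seconds.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Read the header of a PCM WAV file and report the basics."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bits": wav.getsampwidth() * 8,
            "sample_rate": rate,
            "seconds": wav.getnframes() / rate,
        }


# Demo on a generated file: one second of 16 kHz, 16-bit, mono silence.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

info = describe_wav(demo_path)
print(info)
```

Pointing `describe_wav` at each file listed in the fileids should show 1 channel, 16 bits and 16000 Hz if the recordings match what the log reports.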
SYSTEM
Running on Windows Vista, compiled with Visual Studio C++ 2008 Express Edition.
The perl command prefix is required in Command Prompt.
I'm building this simple number-recognition test for a Java application.
After RunAll, with the complaints in the log below, the model_architecture &
model_parameters folders are empty.
Please help! I've googled for "bad line" with SphinxTrain and get zero
responses?!
All the pieces look right to me, so it seems pretty broken and is likely a BUG
REPORT in the making.
MODULE: 00 verify training files
O.S. is case insensitive ("A" == "a").
Phones will be treated as case insensitive.
Phase 1: DICT - Checking to see if the dict and filler dict agrees with the
phonelist file.
Found 13 words using 19 phones
Phase 2: DICT - Checking to make sure there are not duplicate entries in the
dictionary
Phase 3: CTL - Check general format; utterance length (must be positive);
files exist
Phase 4: CTL - Checking number of lines in the transcript should match lines
in control file
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems
reasonable.
Total Hours Training: 0.139026495726496
This is a small amount of data, no comment at this time
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the
dictionary
Words in dictionary: 10
Words in filler dictionary: 3
WARNING: Bad line in transcript: ONE FIVE THREE FOUR THREE FIVE FIVE NINE
WARNING: Bad line in transcript: TWO SEVEN SEVEN EIGHT EIGHT ONE FIVE TWO
WARNING: Bad line in transcript: FOUR THREE SIX THREE ONE SEVEN TWO ONE
WARNING: Bad line in transcript: FOUR NINE SEVEN TWO FOUR NINE FOUR TWO
WARNING: Bad line in transcript: SIX THREE SEVEN THREE SIX THREE THREE FOUR
WARNING: Bad line in transcript: EIGHT SIX SEVEN EIGHT SIX SIX EIGHT TWO
WARNING: Bad line in transcript: NINE TWO THREE TWO FIVE SEVEN ONE FIVE
WARNING: Bad line in transcript: ONE FIVE THREE NINE SIX ONE FIVE ONE
WARNING: Bad line in transcript: THREE FOUR TWO EIGHT FOUR NINE SIX FOUR
WARNING: Bad line in transcript: FOUR FOUR SIX ONE EIGHT NINE THREE THREE
WARNING: Bad line in transcript: FIVE SIX TWO SEVEN ONE ONE ONE ONE
WARNING: Bad line in transcript: SEVEN ONE ONE EIGHT EIGHT ONE EIGHT SEVEN
WARNING: Bad line in transcript: EIGHT SEVEN THREE FOUR SIX EIGHT EIGHT FOUR
WARNING: Bad line in transcript: NINE SEVEN NINE THREE THREE SIX THREE EIGHT
WARNING: Bad line in transcript: ONE SIX TWO FIVE EIGHT FOUR FOUR SIX
WARNING: Bad line in transcript: THREE FIVE ONE SEVEN SEVEN THREE SIX SEVEN
WARNING: Bad line in transcript: FOUR SIX SIX EIGHT FOUR FOUR FOUR EIGHT
WARNING: Bad line in transcript: FIVE EIGHT EIGHT EIGHT TWO FIVE EIGHT THREE
WARNING: Bad line in transcript: EIGHT THREE NINE ONE EIGHT FOUR NINE TWO
WARNING: Bad line in transcript: EIGHT NINE EIGHT EIGHT FIVE FIVE SEVEN THREE
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in
the phonelist, and all phones in the phonelist appear at least once
WARNING: This phone (AH) occurs in the phonelist, but not in any word in the
transcription
I see two problems compared with my own successful use of SphinxTrain:
1) The "S" in <S> and </S> is upper case. It may or may not be relevant, but I
use lower case.
2) No utterance ID follows the utterance. This is much more likely to be the
problem.
CB
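To make point 2 concrete, here is a rough Python sketch of a pre-flight check for transcript lines. It is not the check SphinxTrain's own verify step runs; the regular expression and the function name `check_transcript_line` are assumptions for illustration, based only on the line shape "words (utterance_id)".

```python
import re

# Assumed line shape: one or more words, then "(utterance_id)" at the very end.
UTT_LINE = re.compile(r"^(?P<words>.*\S)\s*\((?P<utt_id>[^()\s]+)\)\s*$")


def check_transcript_line(line, vocabulary):
    """Return a list of problems found in one transcript line (empty if none)."""
    match = UTT_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return ["missing trailing (utterance_id)"]
    unknown = [w for w in match.group("words").split() if w not in vocabulary]
    return ["word not in dictionary: " + w for w in unknown]


vocabulary = {"ONE", "TWO", "THREE", "FOUR", "FIVE",
              "SIX", "SEVEN", "EIGHT", "NINE"}

print(check_transcript_line(
    "ONE FIVE THREE FOUR THREE FIVE FIVE NINE", vocabulary))
print(check_transcript_line(
    "ONE FIVE THREE FOUR THREE FIVE FIVE NINE (RA_15343559)", vocabulary))
```

The first call reports the missing utterance ID and the second returns an empty list.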
Anonymous
-
2010-03-26
This seemed to be the answer...
ONE FIVE THREE FOUR THREE FIVE FIVE NINE (RA_15343559)
TWO SEVEN SEVEN EIGHT EIGHT ONE FIVE TWO (RA_27788152)
FOUR THREE SIX THREE ONE SEVEN TWO ONE (RA_43631721)
FOUR NINE SEVEN TWO FOUR NINE FOUR TWO (RA_49724942)
SIX THREE SEVEN THREE SIX THREE THREE FOUR (RA_63736334)
EIGHT SIX SEVEN EIGHT SIX SIX EIGHT TWO (RA_86786682)
NINE TWO THREE TWO FIVE SEVEN ONE FIVE (RA_92325715)
ONE FIVE THREE NINE SIX ONE FIVE ONE (RA_15396151)
THREE FOUR TWO EIGHT FOUR NINE SIX FOUR (RA_34284964)
FOUR FOUR SIX ONE EIGHT NINE THREE THREE (RA_44618933)
FIVE SIX TWO SEVEN ONE ONE ONE ONE (RA_56271111)
SEVEN ONE ONE EIGHT EIGHT ONE EIGHT SEVEN (RA_71188187)
EIGHT SEVEN THREE FOUR SIX EIGHT EIGHT FOUR (RA_87346884)
NINE SEVEN NINE THREE THREE SIX THREE EIGHT (RA_97933638)
ONE SIX TWO FIVE EIGHT FOUR FOUR SIX (RA_16258446)
THREE FIVE ONE SEVEN SEVEN THREE SIX SEVEN (RA_35177367)
FOUR SIX SIX EIGHT FOUR FOUR FOUR EIGHT (RA_46684448)
FIVE EIGHT EIGHT EIGHT TWO FIVE EIGHT THREE (RA_58882583)
EIGHT THREE NINE ONE EIGHT FOUR NINE TWO (RA_83918492)
EIGHT NINE EIGHT EIGHT FIVE FIVE SEVEN THREE (RA_89885573)
For now, anyway; I might need to insert silence tags, etc.
Thanks for the timely response.
Lowercase S's didn't make a difference, and removing them didn't either, but the
bracketed file refs (aka utterance IDs) help.
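In case it saves someone the hand editing, here is a small Python sketch that appends the IDs automatically. It assumes the transcript lines and the fileids entries are in the same order, and that the utterance ID is simply the last path component of each fileids entry (true for the RA_... names here, but an assumption in general). The function name `add_utterance_ids` is made up for this example.

```python
def add_utterance_ids(transcript_lines, fileid_lines):
    """Pair each transcript line with its fileids entry and append '(id)'."""
    if len(transcript_lines) != len(fileid_lines):
        raise ValueError("transcript and fileids have different line counts")
    result = []
    for text, fileid in zip(transcript_lines, fileid_lines):
        # Assumption: the ID is the last path component of the fileids entry.
        utt_id = fileid.strip().replace("\\", "/").rsplit("/", 1)[-1]
        result.append(f"{text.strip()} ({utt_id})")
    return result


lines = add_utterance_ids(
    ["ONE FIVE THREE FOUR THREE FIVE FIVE NINE",
     "TWO SEVEN SEVEN EIGHT EIGHT ONE FIVE TWO"],
    ["RA_15343559", "RA_27788152"],
)
print("\n".join(lines))
```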
Anonymous
-
2010-03-26
Okay, FILTERS is all uppercase now too...
<s> SIL
<sil> SIL
</s> SIL
and I put in this variant of the TRANSCRIPTION ...
ONE <sil> FIVE <sil> THREE <sil> FOUR <sil> THREE <sil> FIVE <sil> FIVE <sil> NINE (RA_15343559)
TWO <sil> SEVEN <sil> SEVEN <sil> EIGHT <sil> EIGHT <sil> ONE <sil> FIVE <sil> TWO (RA_27788152)
FOUR <sil> THREE <sil> SIX <sil> THREE <sil> ONE <sil> SEVEN <sil> TWO <sil> ONE (RA_43631721)
FOUR <sil> NINE <sil> SEVEN <sil> TWO <sil> FOUR <sil> NINE <sil> FOUR <sil> TWO (RA_49724942)
SIX <sil> THREE <sil> SEVEN <sil> THREE <sil> SIX <sil> THREE <sil> THREE <sil> FOUR (RA_63736334)
EIGHT <sil> SIX <sil> SEVEN <sil> EIGHT <sil> SIX <sil> SIX <sil> EIGHT <sil> TWO (RA_86786682)
NINE <sil> TWO <sil> THREE <sil> TWO <sil> FIVE <sil> SEVEN <sil> ONE <sil> FIVE (RA_92325715)
ONE <sil> FIVE <sil> THREE <sil> NINE <sil> SIX <sil> ONE <sil> FIVE <sil> ONE (RA_15396151)
THREE <sil> FOUR <sil> TWO <sil> EIGHT <sil> FOUR <sil> NINE <sil> SIX <sil> FOUR (RA_34284964)
FOUR <sil> FOUR <sil> SIX <sil> ONE <sil> EIGHT <sil> NINE <sil> THREE <sil> THREE (RA_44618933)
FIVE <sil> SIX <sil> TWO <sil> SEVEN <sil> ONE <sil> ONE <sil> ONE <sil> ONE (RA_56271111)
SEVEN <sil> ONE <sil> ONE <sil> EIGHT <sil> EIGHT <sil> ONE <sil> EIGHT <sil> SEVEN (RA_71188187)
EIGHT <sil> SEVEN <sil> THREE <sil> FOUR <sil> SIX <sil> EIGHT <sil> EIGHT <sil> FOUR (RA_87346884)
NINE <sil> SEVEN <sil> NINE <sil> THREE <sil> THREE <sil> SIX <sil> THREE <sil> EIGHT (RA_97933638)
ONE <sil> SIX <sil> TWO <sil> FIVE <sil> EIGHT <sil> FOUR <sil> FOUR <sil> SIX (RA_16258446)
THREE <sil> FIVE <sil> ONE <sil> SEVEN <sil> SEVEN <sil> THREE <sil> SIX <sil> SEVEN (RA_35177367)
FOUR <sil> SIX <sil> SIX <sil> EIGHT <sil> FOUR <sil> FOUR <sil> FOUR <sil> EIGHT (RA_46684448)
FIVE <sil> EIGHT <sil> EIGHT <sil> EIGHT <sil> TWO <sil> FIVE <sil> EIGHT <sil> THREE (RA_58882583)
EIGHT <sil> THREE <sil> NINE <sil> ONE <sil> EIGHT <sil> FOUR <sil> NINE <sil> TWO (RA_83918492)
EIGHT <sil> NINE <sil> EIGHT <sil> EIGHT <sil> FIVE <sil> FIVE <sil> SEVEN <sil> THREE (RA_89885573)
This has warnings, not sure which log to check ...
MODULE: 30 Training Context Dependent models
Phase 1: Cleaning up directories:
accumulator...logs...qmanager...
Phase 2: Initialization
WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check
the log file for details.
Phase 3: Forward-Backward
Baum welch starting for iteration: 1 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check
the log file for details.
Normalization for iteration: 1
WARNING: This step had 0 ERROR messages and 3 WARNING messages. Please check
the log file for details.
Current Overall Likelihood Per Frame = 18.9800451566496
Baum welch starting for iteration: 2 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check
the log file for details.
Normalization for iteration: 2
WARNING: This step had 0 ERROR messages and 3 WARNING messages. Please check
the log file for details.
Current Overall Likelihood Per Frame = 19.1991108535806
Training completed after 2 iterations
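On the question of which log to check: a sketch like the following can narrow it down by counting message levels per log file. It assumes the logs are `.log` files somewhere under a single log directory (the path passed in is whatever your setup uses), and that messages begin with the level name, as in the excerpts in this thread. The function name `count_log_levels` is hypothetical.

```python
import os
from collections import Counter

# "WARNING" is listed before "WARN" so the longer prefix wins.
LEVELS = ("FATAL", "ERROR", "WARNING", "WARN")


def count_log_levels(logdir):
    """Map each .log file under logdir to a Counter of message levels."""
    report = {}
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            if not name.endswith(".log"):
                continue
            path = os.path.join(root, name)
            counts = Counter()
            with open(path, errors="replace") as handle:
                for line in handle:
                    stripped = line.lstrip()
                    for level in LEVELS:
                        if stripped.startswith(level):
                            counts[level] += 1
                            break
            if counts:
                report[path] = counts
    return report


def print_report(logdir):
    """Print one line per log file that contains warnings or errors."""
    for path, counts in sorted(count_log_levels(logdir).items()):
        print(path, dict(counts))
```

Lines starting with FATAL_ERROR are counted under FATAL, since the prefix match stops at the first level that fits.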
Following that I get Fatal Errors...
MODULE: 40 Build Trees
Phase 1: Cleaning up old log files...
Phase 2: Make Questions
Phase 3: Tree building
Processing each phone with each state
AH 0
AH 1
AH 2
AO 0
AO 1
AO 2
AY 0
AY 1
AY 2
EH 0
EH 1
EH 2
EY 0
EY 1
EY 2
F 0
F 1
F 2
HH 0
FATAL_ERROR: "........\src\programs\bldtree\main.c", line 771:
Initialization failed
WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check
the log file for details.
IH 0
IH 1
IH 2
IY 0
IY 1
IY 2
K 0
K 1
K 2
N 0
N 1
N 2
R 0
R 1
R 2
S 0
S 1
S 2
T 0
T 1
T 2
TH 0
TH 1
TH 2
UW 0
UW 1
UW 2
V 0
V 1
V 2
W 0
W 1
W 2
Skipping SIL
MODULE: 45 Prune Trees
Phase 1: Tree Pruning
FATAL: "........\src\programs\prunetree\main.c", line 167: Unable to open C:/Library/Java/SphinxTrain-1.0/tutorial/CrListNumbers/trees/NumbersTest.unpruned/HH-0.dtree for reading;
MODULE: 50 Training Context dependent models
Phase 1: Cleaning up directories:
accumulator...logs...qmanager...
Phase 2: Copy CI to CD initialize
Phase 3: Forward-Backward
Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
0% WARN: "........\src\libs\libio\model_def_io.c", line 436: Unable to open C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/model_architecture/NumbersTest.200.mdef for reading;
FATAL_ERROR: "........\src\programs\bw\main.c", line 1054: initialization failed
Failed to start bw
Only 0 parts of 1 of Baum Welch were successfully completed
Parts 1 failed to run!
Training failed in iteration 1
Something failed: (C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/scripts_pl/50.cd_hmm_tied/slave_convg.pl)
Anonymous
-
2010-03-26
The *.buildtree.HH.0.log in the 40.buildtrees folder ends with this failure:
INFO: ........\src\libs\libio\model_def_io.c(587): Model definition info:
INFO: ........\src\libs\libio\model_def_io.c(588): 46 total models defined
(19 base, 27 tri)
INFO: ........\src\libs\libio\model_def_io.c(589): 184 total states
INFO: ........\src\libs\libio\model_def_io.c(590): 138 total tied states
INFO: ........\src\libs\libio\model_def_io.c(591): 57 total tied CI states
INFO: ........\src\libs\libio\model_def_io.c(592): 19 total tied transition
matrices
INFO: ........\src\libs\libio\model_def_io.c(593): 4 max state/model
INFO: ........\src\libs\libio\model_def_io.c(594): 4 min state/model
WARNING: "........\src\programs\bldtree\main.c", line 144: No triphones
involving HH
FATAL_ERROR: "........\src\programs\bldtree\main.c", line 771:
Initialization failed
And 45.prunetree has this FATAL error (complete log)
C:\Library\Java\SphinxTrain-1.0\tutorial\NumbersTest\bin\prunetree.exe \
-itreedir C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/trees/NumbersTest.unpruned \
-nseno 200 \
-otreedir C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/trees/NumbersTest.200 \
-moddeffn C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/model_architecture/NumbersTest.alltriphones.mdef \
-psetfn C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/model_architecture/NumbersTest.tree_questions \
-minocc 0
zus
-help no zus
-example no zus
-moddeffn zus
-psetfn zus
-itreedir zus
-otreedir zus
-nseno zus
-minocc 0.0 zus
-allphones no zus
No such file or directory
INFO: ........\src\programs\prunetree\main.c(76): Reading: C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/model_architecture/NumbersTest.alltriphones.mdef
INFO: ........\src\libs\libio\model_def_io.c(587): Model definition info:
INFO: ........\src\libs\libio\model_def_io.c(588): 182 total models defined
(19 base, 163 tri)
INFO: ........\src\libs\libio\model_def_io.c(589): 728 total states
INFO: ........\src\libs\libio\model_def_io.c(590): 546 total tied states
INFO: ........\src\libs\libio\model_def_io.c(591): 57 total tied CI states
INFO: ........\src\libs\libio\model_def_io.c(592): 19 total tied transition
matrices
INFO: ........\src\libs\libio\model_def_io.c(593): 4 max state/model
INFO: ........\src\libs\libio\model_def_io.c(594): 4 min state/model
INFO: ........\src\programs\prunetree\main.c(82): Reading: C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/model_architecture/NumbersTest.tree_questions
INFO: ........\src\programs\prunetree\main.c(183): AH-0 2
INFO: ........\src\programs\prunetree\main.c(183): AH-1 2
INFO: ........\src\programs\prunetree\main.c(183): AH-2 2
INFO: ........\src\programs\prunetree\main.c(183): AO-0 1
INFO: ........\src\programs\prunetree\main.c(183): AO-1 1
INFO: ........\src\programs\prunetree\main.c(183): AO-2 1
INFO: ........\src\programs\prunetree\main.c(183): AY-0 2
INFO: ........\src\programs\prunetree\main.c(183): AY-1 2
INFO: ........\src\programs\prunetree\main.c(183): AY-2 2
INFO: ........\src\programs\prunetree\main.c(183): EH-0 1
INFO: ........\src\programs\prunetree\main.c(183): EH-1 1
INFO: ........\src\programs\prunetree\main.c(183): EH-2 1
INFO: ........\src\programs\prunetree\main.c(183): EY-0 1
INFO: ........\src\programs\prunetree\main.c(183): EY-1 1
INFO: ........\src\programs\prunetree\main.c(183): EY-2 1
INFO: ........\src\programs\prunetree\main.c(183): F-0 2
INFO: ........\src\programs\prunetree\main.c(183): F-1 2
INFO: ........\src\programs\prunetree\main.c(183): F-2 2
FATAL: "........\src\programs\prunetree\main.c", line 167: Unable to open C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/trees/NumbersTest.unpruned/HH-0.dtree for reading; Thu Mar 25 18:41:50 2010
I suspect ONE(2) HH W AH N is not used/found/scanned from the audio sources?!
Removing it causes an error; I suppose I'll try rebuilding things without it
just to see what happens.
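A rough way to test that suspicion before rebuilding: list the phones that never occur when each transcript word is looked up under its plain dictionary entry. This sketch assumes, and it is only my reading of the "No triphones involving HH" warning, that the transcript word ONE maps to the ONE entry and never to ONE(2). The function names are made up for the example.

```python
def parse_dictionary(lines):
    """Map each dictionary entry (e.g. 'ONE' or 'ONE(2)') to its phone list."""
    entries = {}
    for line in lines:
        parts = line.split()
        if parts:
            entries[parts[0]] = parts[1:]
    return entries


def phones_unseen_in_transcript(dict_lines, phone_list, transcript_words):
    """Phones in the phone list that no transcript word uses (SIL excluded)."""
    entries = parse_dictionary(dict_lines)
    seen = set()
    for word in transcript_words:
        # Assumption: only the exact entry name is used, never an alternate.
        seen.update(entries.get(word, []))
    return [p for p in phone_list if p not in seen and p != "SIL"]


dict_lines = [
    "EIGHT EY T", "FIVE F AY V", "FOUR F AO R", "NINE N AY N",
    "ONE W AH N", "ONE(2) HH W AH N", "SEVEN S EH V AH N",
    "SIX S IH K S", "THREE TH R IY", "TWO T UW",
]
phone_list = ["AH", "AO", "AY", "EH", "EY", "F", "HH", "IH", "IY", "K",
              "N", "R", "S", "T", "TH", "UW", "V", "W", "SIL"]
words = ["ONE", "TWO", "THREE", "FOUR", "FIVE",
         "SIX", "SEVEN", "EIGHT", "NINE"]

unseen = phones_unseen_in_transcript(dict_lines, phone_list, words)
print(unseen)
```

With the dictionary and phone list from the first post, the only phone left over is HH, which matches the phone the tree builder fails on.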
Anonymous
-
2010-03-26
Pulling HH from the phone file, as well as the dictionary entry for ONE(2),
seems to have helped.
Now the DIC looks like this...
EIGHT EY T
FIVE F AY V
FOUR F AO R
NINE N AY N
ONE W AH N
SEVEN S EH V AH N
SIX S IH K S
THREE TH R IY
TWO T UW
and the PHONE(tics) file looks like this...
AH
AO
AY
EH
EY
F
IH
IY
K
N
R
S
T
TH
UW
V
W
SIL
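To keep the two files in step after edits like this, the phone list can be regenerated from the dictionary instead of being maintained by hand. A minimal sketch, assuming the filler phone SIL is the only phone that does not appear in the main dictionary; `phones_from_dictionary` is a hypothetical helper name.

```python
def phones_from_dictionary(dict_lines, fillers=("SIL",)):
    """Collect every phone used in the dictionary, sorted, then add fillers."""
    phones = set()
    for line in dict_lines:
        parts = line.split()
        phones.update(parts[1:])  # parts[0] is the word itself
    return sorted(phones) + [f for f in fillers if f not in phones]


revised_dict = [
    "EIGHT EY T", "FIVE F AY V", "FOUR F AO R", "NINE N AY N",
    "ONE W AH N", "SEVEN S EH V AH N", "SIX S IH K S",
    "THREE TH R IY", "TWO T UW",
]

phone_file_lines = phones_from_dictionary(revised_dict)
print("\n".join(phone_file_lines))
```

For the revised dictionary above this gives the same 18 entries as the phone file listed here, with HH gone.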
It shows some warnings with zero errors and then says that there were 35
errors and no warnings (that's a log line bug, I suspect)
WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check
the log file for details.
Normalization for iteration: 1
Current Overall Likelihood Per Frame = 19.523487452046
Baum welch starting for 4 Gaussian(s), iteration: 2 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 2
Current Overall Likelihood Per Frame = 20.1936341112532
Split Gaussians, increase by 4
Current Overall Likelihood Per Frame = 20.1936341112532
Convergence Ratio = 0.0343251512237876
Baum welch starting for 8 Gaussian(s), iteration: 1 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 1
This step had 35 ERROR messages and 0 WARNING messages. Please check the log
file for details.
Current Overall Likelihood Per Frame = 20.2072810102302
Baum welch starting for 8 Gaussian(s), iteration: 2 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 2
Current Overall Likelihood Per Frame = 20.9428148976982
Split Gaussians, increase by 0
Training for 8 Gaussian(s) completed after 2 iterations
MODULE: 90 deleted interpolation
Skipped for continuous models
MODULE: 99 Convert to Sphinx2 format models
Can not create models used by Sphinx-II.
If you intend to create models to use with Sphinx-II models, please rerun
with:
$ST::CFG_HMM_TYPE = '.semi.' or
$ST::CFG_HMM_TYPE = '.cont' and $ST::CFG_FEATURE = '1s_12c_12d_3p_12dd' and
$ST::CFG_STATESPERHMM = '5'
It gives some advice for Sphinx-II that I don't need.
Well, now what the hell do I do??!! NO, don't tell me, I'll figure it out... or
be back.
Thanks, and I hope following up on my own thread helps someone else.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I'm trying to train via SphinxTrain a simple numbers test.
Done everything right many times over pull carriage returns and all I can
think to try.
Nothing seems to result PHASE 6 WARNINGS 'Bad line in trasnscript' seem to be
the problem.
Transcript FOLLOWS
ONE FIVE THREE FOUR THREE FIVE FIVE NINETWO SEVEN SEVEN EIGHT EIGHT ONE FIVE TWOFOUR THREE SIX THREE ONE SEVEN TWO ONEFOUR NINE SEVEN TWO FOUR NINE FOUR TWOSIX THREE SEVEN THREE SIX THREE THREE FOUREIGHT SIX SEVEN EIGHT SIX SIX EIGHT TWONINE TWO THREE TWO FIVE SEVEN ONE FIVEONE FIVE THREE NINE SIX ONE FIVE ONETHREE FOUR TWO EIGHT FOUR NINE SIX FOURFOUR FOUR SIX ONE EIGHT NINE THREE THREEFIVE SIX TWO SEVEN ONE ONE ONE ONESEVEN ONE ONE EIGHT EIGHT ONE EIGHT SEVENEIGHT SEVEN THREE FOUR SIX EIGHT EIGHT FOURNINE SEVEN NINE THREE THREE SIX THREE EIGHTONE SIX TWO FIVE EIGHT FOUR FOUR SIXTHREE FIVE ONE SEVEN SEVEN THREE SIX SEVENFOUR SIX SIX EIGHT FOUR FOUR FOUR EIGHTFIVE EIGHT EIGHT EIGHT TWO FIVE EIGHT THREEEIGHT THREE NINE ONE EIGHT FOUR NINE TWOEIGHT NINE EIGHT EIGHT FIVE FIVE SEVEN THREEDictionary FOLLOWS
EIGHT EY T
FIVE F AY V
FOUR F AO R
NINE N AY N
ONE W AH N
ONE(2) HH W AH N
SEVEN S EH V AH N
SIX S IH K S
THREE TH R IY
TWO T UW
Phone(tics) FOLLOWS
AH
AO
AY
EH
EY
F
HH
IH
IY
K
N
R
S
T
TH
UW
V
W
SIL
All Waves in fileids passed conversion to feat/*.mfc like so... (using wav &
mswav settings in config)
LENGTH: zu
INFO: ........\src\programs\wave2feat\wave2feat.c(785): Reading MS Wav file
C:/Library/Java/SphinxTrain-1.0/tutorial/
NumbersTest/wav/RA_89885573.wav:
INFO: ........\src\programs\wave2feat\wave2feat.c(786): 16 bit PCM data, 1
channels 426240 samples
INFO: ........\src\programs\wave2feat\wave2feat.c(787): Sampled at 16000
SYSTEM
Running on Windows Vista compiled with Visual Studio C++ 2008 Express Edition
perl command prefix required in Command Prompt.
Building for a Java application, this simple test on number recognition.
After RunAll and complaints in log below model_arcgitechture &
model_parameters folders are empty.
Please Help! I've googled for "bad line" with SphinxTrain and get zero
responses?!? .
It seems pretty broke since all the pieces look right, so this is likely a BUG
REPORT in the makings.
MODULE: 00 verify training files
O.S. is case insensitive ("A" == "a").
Phones will be treated as case insensitive.
Phase 1: DICT - Checking to see if the dict and filler dict agrees with the
phonelist file.
Found 13 words using 19 phones
Phase 2: DICT - Checking to make sure there are not duplicate entries in the
dictionary
Phase 3: CTL - Check general format; utterance length (must be positive);
files exist
Phase 4: CTL - Checking number of lines in the transcript should match lines
in control file
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems
reasonable.
Total Hours Training: 0.139026495726496
This is a small amount of data, no comment at this time
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the
dictionary
Words in dictionary: 10
Words in filler dictionary: 3
WARNING: Bad line in transcript:
ONE FIVE THREE FOUR THREE FIVE FIVE NINEWARNING: Bad line in transcript:
TWO SEVEN SEVEN EIGHT EIGHT ONE FIVE TWOWARNING: Bad line in transcript:
FOUR THREE SIX THREE ONE SEVEN TWO ONEWARNING: Bad line in transcript:
FOUR NINE SEVEN TWO FOUR NINE FOUR TWOWARNING: Bad line in transcript:
SIX THREE SEVEN THREE SIX THREE THREE FOURWARNING: Bad line in transcript:
EIGHT SIX SEVEN EIGHT SIX SIX EIGHT TWOWARNING: Bad line in transcript:
NINE TWO THREE TWO FIVE SEVEN ONE FIVEWARNING: Bad line in transcript:
ONE FIVE THREE NINE SIX ONE FIVE ONEWARNING: Bad line in transcript:
THREE FOUR TWO EIGHT FOUR NINE SIX FOURWARNING: Bad line in transcript:
FOUR FOUR SIX ONE EIGHT NINE THREE THREEWARNING: Bad line in transcript:
FIVE SIX TWO SEVEN ONE ONE ONE ONEWARNING: Bad line in transcript:
SEVEN ONE ONE EIGHT EIGHT ONE EIGHT SEVENWARNING: Bad line in transcript:
EIGHT SEVEN THREE FOUR SIX EIGHT EIGHT FOURWARNING: Bad line in transcript:
NINE SEVEN NINE THREE THREE SIX THREE EIGHTWARNING: Bad line in transcript:
ONE SIX TWO FIVE EIGHT FOUR FOUR SIXWARNING: Bad line in transcript:
THREE FIVE ONE SEVEN SEVEN THREE SIX SEVENWARNING: Bad line in transcript:
FOUR SIX SIX EIGHT FOUR FOUR FOUR EIGHTWARNING: Bad line in transcript:
FIVE EIGHT EIGHT EIGHT TWO FIVE EIGHT THREEWARNING: Bad line in transcript:
EIGHT THREE NINE ONE EIGHT FOUR NINE TWOWARNING: Bad line in transcript:
EIGHT NINE EIGHT EIGHT FIVE FIVE SEVEN THREEPhase 7: TRANSCRIPT - Checking that all the phones in the transcript are in
the phonelist, and all phones in the pho
nelist appear at least once
WARNING: This phone (AH) occurs in the phonelist, but not in any word in the
transcription
I see two problems vs. my own use of successful use of SphinxTrain:
1) The "S" in
andis upper case. It may or may not be relevant, but Iuse lower case.
2) No utterance ID follows the utterance. This is much more likely to be the
problem.
CB
This seemed to be the answer...
ONE FIVE THREE FOUR THREE FIVE FIVE NINE (RA_15343559)
TWO SEVEN SEVEN EIGHT EIGHT ONE FIVE TWO (RA_27788152)
FOUR THREE SIX THREE ONE SEVEN TWO ONE (RA_43631721)
FOUR NINE SEVEN TWO FOUR NINE FOUR TWO (RA_49724942)
SIX THREE SEVEN THREE SIX THREE THREE FOUR (RA_63736334)
EIGHT SIX SEVEN EIGHT SIX SIX EIGHT TWO (RA_86786682)
NINE TWO THREE TWO FIVE SEVEN ONE FIVE (RA_92325715)
ONE FIVE THREE NINE SIX ONE FIVE ONE (RA_15396151)
THREE FOUR TWO EIGHT FOUR NINE SIX FOUR (RA_34284964)
FOUR FOUR SIX ONE EIGHT NINE THREE THREE (RA_44618933)
FIVE SIX TWO SEVEN ONE ONE ONE ONE (RA_56271111)
SEVEN ONE ONE EIGHT EIGHT ONE EIGHT SEVEN (RA_71188187)
EIGHT SEVEN THREE FOUR SIX EIGHT EIGHT FOUR (RA_87346884)
NINE SEVEN NINE THREE THREE SIX THREE EIGHT (RA_97933638)
ONE SIX TWO FIVE EIGHT FOUR FOUR SIX (RA_16258446)
THREE FIVE ONE SEVEN SEVEN THREE SIX SEVEN (RA_35177367)
FOUR SIX SIX EIGHT FOUR FOUR FOUR EIGHT (RA_46684448)
FIVE EIGHT EIGHT EIGHT TWO FIVE EIGHT THREE (RA_58882583)
EIGHT THREE NINE ONE EIGHT FOUR NINE TWO (RA_83918492)
EIGHT NINE EIGHT EIGHT FIVE FIVE SEVEN THREE (RA_89885573)
for now might need to inset silence tags, etc.
Thanks for the timely response
lowercase S's didn't make a difference, removing them didn't either, but the
bracketed file refs help (aka utterance IDs
Okay FILTERS is all Uppercase now too...
SILSIL<sil> SIL
</sil>
and I put this variant of TRANSCRIPTION ...
ONE <sil> FIVE <sil> THREE <sil> FOUR <sil> THREE <sil> FIVE <sil> FIVE(RA_15343559)<sil> NINE </sil></sil></sil></sil></sil></sil></sil>
TWO <sil> SEVEN <sil> SEVEN <sil> EIGHT <sil> EIGHT <sil> ONE <sil> FIVE(RA_27788152)<sil> TWO </sil></sil></sil></sil></sil></sil></sil>
FOUR <sil> THREE <sil> SIX <sil> THREE <sil> ONE <sil> SEVEN <sil> TWO(RA_43631721)<sil> ONE </sil></sil></sil></sil></sil></sil></sil>
FOUR <sil> NINE <sil> SEVEN <sil> TWO <sil> FOUR <sil> NINE <sil> FOUR(RA_49724942)<sil> TWO </sil></sil></sil></sil></sil></sil></sil>
SIX <sil> THREE <sil> SEVEN <sil> THREE <sil> SIX <sil> THREE <sil> THREE(RA_63736334)<sil> FOUR </sil></sil></sil></sil></sil></sil></sil>
EIGHT <sil> SIX <sil> SEVEN <sil> EIGHT <sil> SIX <sil> SIX <sil> EIGHT(RA_86786682)<sil> TWO </sil></sil></sil></sil></sil></sil></sil>
NINE <sil> TWO <sil> THREE <sil> TWO <sil> FIVE <sil> SEVEN <sil> ONE(RA_92325715)<sil> FIVE </sil></sil></sil></sil></sil></sil></sil>
ONE <sil> FIVE <sil> THREE <sil> NINE <sil> SIX <sil> ONE <sil> FIVE <sil>(RA_15396151)ONE </sil></sil></sil></sil></sil></sil></sil>
THREE <sil> FOUR <sil> TWO <sil> EIGHT <sil> FOUR <sil> NINE <sil> SIX(RA_34284964)<sil> FOUR </sil></sil></sil></sil></sil></sil></sil>
FOUR <sil> FOUR <sil> SIX <sil> ONE <sil> EIGHT <sil> NINE <sil> THREE(RA_44618933)<sil> THREE </sil></sil></sil></sil></sil></sil></sil>
FIVE <sil> SIX <sil> TWO <sil> SEVEN <sil> ONE <sil> ONE <sil> ONE <sil>(RA_56271111)ONE </sil></sil></sil></sil></sil></sil></sil>
SEVEN <sil> ONE <sil> ONE <sil> EIGHT <sil> EIGHT <sil> ONE <sil> EIGHT(RA_71188187)<sil> SEVEN </sil></sil></sil></sil></sil></sil></sil>
EIGHT <sil> SEVEN <sil> THREE <sil> FOUR <sil> SIX <sil> EIGHT <sil> EIGHT(RA_87346884)<sil> FOUR </sil></sil></sil></sil></sil></sil></sil>
NINE <sil> SEVEN <sil> NINE <sil> THREE <sil> THREE <sil> SIX <sil> THREE(RA_97933638)<sil> EIGHT </sil></sil></sil></sil></sil></sil></sil>
ONE <sil> SIX <sil> TWO <sil> FIVE <sil> EIGHT <sil> FOUR <sil> FOUR <sil>(RA_16258446)SIX </sil></sil></sil></sil></sil></sil></sil>
THREE <sil> FIVE <sil> ONE <sil> SEVEN <sil> SEVEN <sil> THREE <sil> SIX(RA_35177367)<sil> SEVEN </sil></sil></sil></sil></sil></sil></sil>
FOUR <sil> SIX <sil> SIX <sil> EIGHT <sil> FOUR <sil> FOUR <sil> FOUR(RA_46684448)<sil> EIGHT </sil></sil></sil></sil></sil></sil></sil>
FIVE <sil> EIGHT <sil> EIGHT <sil> EIGHT <sil> TWO <sil> FIVE <sil> EIGHT(RA_58882583)<sil> THREE </sil></sil></sil></sil></sil></sil></sil>
EIGHT <sil> THREE <sil> NINE <sil> ONE <sil> EIGHT <sil> FOUR <sil> NINE(RA_83918492)<sil> TWO </sil></sil></sil></sil></sil></sil></sil>
EIGHT <sil> NINE <sil> EIGHT <sil> EIGHT <sil> FIVE <sil> FIVE <sil> SEVEN(RA_89885573)<sil> THREE </sil></sil></sil></sil></sil></sil></sil>
This has warnings, not sure which log to check ...
MODULE: 30 Training Context Dependent models
Phase 1: Cleaning up directories:
accumulator...logs...qmanager...
Phase 2: Initialization
WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check
the log file for details.
Phase 3: Forward-Backward
Baum welch starting for iteration: 1 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check
the log file for details.
Normalization for iteration: 1
WARNING: This step had 0 ERROR messages and 3 WARNING messages. Please check
the log file for details.
Current Overall Likelihood Per Frame = 18.9800451566496
Baum welch starting for iteration: 2 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check
the log file for details.
Normalization for iteration: 2
WARNING: This step had 0 ERROR messages and 3 WARNING messages. Please check
the log file for details.
Current Overall Likelihood Per Frame = 19.1991108535806
Training completed after 2 iterations
Following that I get Fatal Errors...
MODULE: 40 Build Trees
Phase 1: Cleaning up old log files...
Phase 2: Make Questions
Phase 3: Tree building
Processing each phone with each state
AH 0
AH 1
AH 2
AO 0
AO 1
AO 2
AY 0
AY 1
AY 2
EH 0
EH 1
EH 2
EY 0
EY 1
EY 2
F 0
F 1
F 2
HH 0
FATAL_ERROR: "........\src\programs\bldtree\main.c", line 771:
Initialization failed
WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check
the log file for details.
IH 0
IH 1
IH 2
IY 0
IY 1
IY 2
K 0
K 1
K 2
N 0
N 1
N 2
R 0
R 1
R 2
S 0
S 1
S 2
T 0
T 1
T 2
TH 0
TH 1
TH 2
UW 0
UW 1
UW 2
V 0
V 1
V 2
W 0
W 1
W 2
Skipping SIL
MODULE: 45 Prune Trees
Phase 1: Tree Pruning
FATAL: "........\src\programs\prunetree\main.c", line 167: Unable to open
C:/Library/Java/SphinxTrain-1.0/tutorial/Cr
ListNumbers/trees/NumbersTest.unpruned/HH-0.dtree for reading;
MODULE: 50 Training Context dependent models
Phase 1: Cleaning up directories:
accumulator...logs...qmanager...
Phase 2: Copy CI to CD initialize
Phase 3: Forward-Backward
Baum welch starting for 1 Gaussian(s), iteration: 1 (1 of 1)
0% WARN: "........\src\libs\libio\model_def_io.c", line 436: Unable to open
C:/Library/Java/SphinxTrain-1.0/t
utorial/NumbersTest/model_architecture/NumbersTest.200.mdef for reading;
FATAL_ERROR: "........\src\programs\bw\m
ain.c", line 1054: initialization failed
Failed to start bw
Only 0 parts of 1 of Baum Welch were successfully completed
Parts 1 failed to run!
Training failed in iteration 1
Something failed: (C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/script
s_pl/50.cd_hmm_tied/slave_convg.pl)
*.buildtree.HH.0.log in 40.buildtrees folder ends with failure in
INFO: ........\src\libs\libio\model_def_io.c(587): Model definition info:
INFO: ........\src\libs\libio\model_def_io.c(588): 46 total models defined
(19 base, 27 tri)
INFO: ........\src\libs\libio\model_def_io.c(589): 184 total states
INFO: ........\src\libs\libio\model_def_io.c(590): 138 total tied states
INFO: ........\src\libs\libio\model_def_io.c(591): 57 total tied CI states
INFO: ........\src\libs\libio\model_def_io.c(592): 19 total tied transition
matrices
INFO: ........\src\libs\libio\model_def_io.c(593): 4 max state/model
INFO: ........\src\libs\libio\model_def_io.c(594): 4 min state/model
WARNING: "........\src\programs\bldtree\main.c", line 144: No triphones
involving HH
FATAL_ERROR: "........\src\programs\bldtree\main.c", line 771:
Initialization failed
And 45.prunetree has this FATAL error (complete log)
C:\Library\Java\SphinxTrain-1.0\tutorial\NumbersTest\bin\prunetree.exe \
-itreedir C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/trees/NumbersTest.unpruned \
-nseno 200 \
-otreedir C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/trees/NumbersTest.200 \
-moddeffn C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/model_architecture/NumbersTest.alltriphones.mdef \
-psetfn C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/model_architecture/NumbersTest.tree_questions \
-minocc 0
zus
-help no zus
-example no zus
-moddeffn zus
-psetfn zus
-itreedir zus
-otreedir zus
-nseno zus
-minocc 0.0 zus
-allphones no zus
No such file or directory
INFO: ........\src\programs\prunetree\main.c(76): Reading: C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/model_architecture/NumbersTest.alltriphones.mdef
INFO: ........\src\libs\libio\model_def_io.c(587): Model definition info:
INFO: ........\src\libs\libio\model_def_io.c(588): 182 total models defined (19 base, 163 tri)
INFO: ........\src\libs\libio\model_def_io.c(589): 728 total states
INFO: ........\src\libs\libio\model_def_io.c(590): 546 total tied states
INFO: ........\src\libs\libio\model_def_io.c(591): 57 total tied CI states
INFO: ........\src\libs\libio\model_def_io.c(592): 19 total tied transition matrices
INFO: ........\src\libs\libio\model_def_io.c(593): 4 max state/model
INFO: ........\src\libs\libio\model_def_io.c(594): 4 min state/model
INFO: ........\src\programs\prunetree\main.c(82): Reading: C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/model_architecture/NumbersTest.tree_questions
INFO: ........\src\programs\prunetree\main.c(183): AH-0 2
INFO: ........\src\programs\prunetree\main.c(183): AH-1 2
INFO: ........\src\programs\prunetree\main.c(183): AH-2 2
INFO: ........\src\programs\prunetree\main.c(183): AO-0 1
INFO: ........\src\programs\prunetree\main.c(183): AO-1 1
INFO: ........\src\programs\prunetree\main.c(183): AO-2 1
INFO: ........\src\programs\prunetree\main.c(183): AY-0 2
INFO: ........\src\programs\prunetree\main.c(183): AY-1 2
INFO: ........\src\programs\prunetree\main.c(183): AY-2 2
INFO: ........\src\programs\prunetree\main.c(183): EH-0 1
INFO: ........\src\programs\prunetree\main.c(183): EH-1 1
INFO: ........\src\programs\prunetree\main.c(183): EH-2 1
INFO: ........\src\programs\prunetree\main.c(183): EY-0 1
INFO: ........\src\programs\prunetree\main.c(183): EY-1 1
INFO: ........\src\programs\prunetree\main.c(183): EY-2 1
INFO: ........\src\programs\prunetree\main.c(183): F-0 2
INFO: ........\src\programs\prunetree\main.c(183): F-1 2
INFO: ........\src\programs\prunetree\main.c(183): F-2 2
FATAL: "........\src\programs\prunetree\main.c", line 167: Unable to open C:/Library/Java/SphinxTrain-1.0/tutorial/NumbersTest/trees/NumbersTest.unpruned/HH-0.dtree for reading; Thu Mar 25 18:41:50 2010
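The prunetree log stops at the first tree file it cannot open, so it only names HH-0.dtree. A rough sketch for listing every missing tree file in one pass is below. It assumes the `<phone>-<state>.dtree` naming and the three states per phone (0, 1, 2) seen in the log above; the function name `missing_dtrees` and its parameters are made up for illustration and are not part of SphinxTrain.

```python
from pathlib import Path


def missing_dtrees(tree_dir, phones, states_per_phone=3):
    """Return the expected '<phone>-<state>.dtree' names absent from tree_dir.

    Assumption: file names follow the pattern seen in the prunetree log
    (e.g. 'HH-0.dtree'), with states numbered from 0.
    """
    tree_dir = Path(tree_dir)
    missing = []
    for phone in phones:
        for state in range(states_per_phone):
            name = f"{phone}-{state}.dtree"
            if not (tree_dir / name).is_file():
                missing.append(name)
    return missing
```

Calling it with the trees/NumbersTest.unpruned folder and the phones from the phone file (probably leaving out SIL, which is a guess on my part) should show whether HH is the only phone without trees.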
I suspect the ONE(2) HH W AH N pronunciation is never used, found, or scanned from the audio sources.
Removing it causes an error, so I'll try rebuilding things without it just to see what happens.
Pulling HH from the phone file, as well as the dictionary entry for ONE(2), seems to have helped.
Now the dictionary looks like this:
EIGHT EY T
FIVE F AY V
FOUR F AO R
NINE N AY N
ONE W AH N
SEVEN S EH V AH N
SIX S IH K S
THREE TH R IY
TWO T UW
and the phone(tics) file looks like this:
AH
AO
AY
EH
EY
F
IH
IY
K
N
R
S
T
TH
UW
V
W
SIL
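A check like the one below would have flagged HH before training. It is only a sketch under one assumption: that an alternate pronunciation such as ONE(2) is used only when the transcript names it explicitly, so a phone reachable only through ONE(2) gets no training data. The function names and the shortened transcript are stand-ins for illustration.

```python
def parse_dictionary(text):
    """Map each dictionary entry (e.g. 'ONE' or 'ONE(2)') to its phone list."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if parts:
            entries[parts[0]] = parts[1:]
    return entries


def unused_phones(dictionary_text, phone_list, transcript, fillers=("SIL",)):
    """Return phones from phone_list that no transcript word reaches.

    Assumption: a transcript word only matches the dictionary entry with
    exactly the same spelling, so 'ONE' never selects 'ONE(2)'.
    """
    entries = parse_dictionary(dictionary_text)
    used = set()
    for word in transcript.split():
        used.update(entries.get(word, []))
    return [p for p in phone_list if p not in used and p not in fillers]


ORIGINAL_DIC = """\
EIGHT EY T
FIVE F AY V
FOUR F AO R
NINE N AY N
ONE W AH N
ONE(2) HH W AH N
SEVEN S EH V AH N
SIX S IH K S
THREE TH R IY
TWO T UW
"""
ORIGINAL_PHONES = ("AH AO AY EH EY F HH IH IY K N R S T TH UW V W SIL").split()
# Shortened stand-in for the real transcript; it uses every digit word once.
SAMPLE_TRANSCRIPT = "ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE"

print(unused_phones(ORIGINAL_DIC, ORIGINAL_PHONES, SAMPLE_TRANSCRIPT))  # prints ['HH']
```

With the revised dictionary and phone file shown above, the same call should come back empty.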
It shows some warnings with zero errors, and then says that there were 35
errors and no warnings (that's a log-line bug, I suspect):
WARNING: This step had 0 ERROR messages and 1 WARNING messages. Please check the log file for details.
Normalization for iteration: 1
Current Overall Likelihood Per Frame = 19.523487452046
Baum welch starting for 4 Gaussian(s), iteration: 2 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 2
Current Overall Likelihood Per Frame = 20.1936341112532
Split Gaussians, increase by 4
Current Overall Likelihood Per Frame = 20.1936341112532
Convergence Ratio = 0.0343251512237876
Baum welch starting for 8 Gaussian(s), iteration: 1 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 1
This step had 35 ERROR messages and 0 WARNING messages. Please check the log file for details.
Current Overall Likelihood Per Frame = 20.2072810102302
Baum welch starting for 8 Gaussian(s), iteration: 2 (1 of 1)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Normalization for iteration: 2
Current Overall Likelihood Per Frame = 20.9428148976982
Split Gaussians, increase by 0
Training for 8 Gaussian(s) completed after 2 iterations
MODULE: 90 deleted interpolation
Skipped for continuous models
MODULE: 99 Convert to Sphinx2 format models
Can not create models used by Sphinx-II.
If you intend to create models to use with Sphinx-II models, please rerun with:
$ST::CFG_HMM_TYPE = '.semi.' or
$ST::CFG_HMM_TYPE = '.cont' and $ST::CFG_FEATURE = '1s_12c_12d_3p_12dd' and $ST::CFG_STATESPERHMM = '5'
It gives some advice for Sphinx-II that I don't need.
Well, now what the hell do I do?! No, don't tell me, I'll figure it out... or
be back.
Thanks, and I hope following up on my own thread helps someone else.
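For anyone who wants to double-check the ERROR/WARNING totals against what the log files actually contain, here is a rough counting sketch. It assumes messages start with one of the prefixes seen in the excerpts above (WARN:, WARNING:, FATAL:, FATAL_ERROR:), optionally after a progress marker like "0%"; I have not checked how the SphinxTrain scripts themselves count, so treat the result as a cross-check only. `count_messages` is a made-up name.

```python
import re
from collections import Counter

# Assumed message prefixes, taken from the log excerpts in this thread.
_PREFIX = re.compile(r"^\s*(?:\d+%\s+)*(FATAL_ERROR|FATAL|ERROR|WARNING|WARN):")


def count_messages(log_text):
    """Count log lines by their leading severity prefix."""
    counts = Counter()
    for line in log_text.splitlines():
        match = _PREFIX.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Reading a log file into a string and passing it to `count_messages` gives per-prefix totals that can be compared with the summary line for that step.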
Those confusing lines were removed in sphinxtrain trunk. Users will not see
them anymore. Thanks.