I am trying to train a simple acoustic model (3 words only),
and when I run the verify all script it returns the following log file:
MODULE: 00 verify training files (2008-04-30 02:41)
O.S. is case insensitive ("A" == "a").
Phones will be treated as case insensitive.
Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file.
Found 6 words using 9 phones
passed
Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
passed
Phase 3: CTL - Check general format; utterance length (must be positive); files exist
passed
Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
passed
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
Total Hours Training: 0.00180683760683761
This is a small amount of data, no comment at this time
WARNING
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
Words in dictionary: 3
Words in filler dictionary: 3
WARNING: This word: hasan was in the transcript file, but is not in the dictionary ( hasan ). Do cases match?
WARNING: This word: hasan was in the transcript file, but is not in the dictionary ( hasan ). Do cases match?
FAILED
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
WARNING: This phone (HA) occurs in the phonelist (E:/SphinxTrain/first/etc/first.phone), but not in any word in the transcription (E:/SphinxTrain/first/etc/first_train.transcription)
WARNING: This phone (HN) occurs in the phonelist (E:/SphinxTrain/first/etc/first.phone), but not in any word in the transcription (E:/SphinxTrain/first/etc/first_train.transcription)
WARNING: This phone (SA) occurs in the phonelist (E:/SphinxTrain/first/etc/first.phone), but not in any word in the transcription (E:/SphinxTrain/first/etc/first_train.transcription)
FAILED
and my dictionary is:
hasan HA SA HN
?alii ?A LI II
?alaa ?A LA A
my phonelist (first.phone) is:
SIL
HA
SA
HN
?A
LI
II
LA
A
my first.transcription file is:
<s> hasan (hsn-01)
<s> ?alii (hsn-02)
<s> ?alaa (hsn-03)
<s> hasan (alaa-01)
<s> ?alii (alaa-02)
<s> ?alaa (alaa-03)
The dictionary must be sorted, I suppose:
hasan HA SA HN
?alii ?A LI II
?alaa ?A LA A
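If sorting is indeed the issue, a minimal Python sketch of ordering the entries by their word could look like the following. The helper name sort_dictionary is made up for illustration, and plain code-point order is an assumption; I have not checked which ordering SphinxTrain actually expects.

```python
def sort_dictionary(lines):
    """Return the non-empty dictionary entries sorted by their word (first field)."""
    entries = [line.strip() for line in lines if line.strip()]
    # Assumption: plain code-point order, so "?" sorts before lowercase letters.
    return sorted(entries, key=lambda entry: entry.split()[0])


dictionary = ["hasan HA SA HN", "?alii ?A LI II", "?alaa ?A LA A"]
for entry in sort_dictionary(dictionary):
    print(entry)
```

With the three entries from the question this puts ?alaa first, then ?alii, then hasan.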
Transcript lines should end with </s>:
<s> hasan </s> (hsn-01)
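For a larger transcript this can be done mechanically. The sketch below assumes every line ends with an utterance id in parentheses, as in the question; close_utterance is a hypothetical helper, not part of SphinxTrain.

```python
import re

# Matches a trailing utterance id in parentheses, e.g. "(hsn-01)".
UTT_ID = re.compile(r"\s*\(([^()]+)\)\s*$")


def close_utterance(line):
    """Insert </s> before the utterance id if the line does not already have it."""
    match = UTT_ID.search(line)
    if match is None:
        return line  # no utterance id found, leave the line unchanged
    words = line[:match.start()].split()
    if not words or words[-1] != "</s>":
        words.append("</s>")
    return " ".join(words) + " (" + match.group(1) + ")"


print(close_utterance("<s> hasan (hsn-01)"))
```

Lines that already end with </s> before the id are returned unchanged, so it is safe to run twice.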
Also, try to avoid special symbols such as ? in both the dictionary words and the phone names.
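A quick way to find such symbols is to scan the dictionary for tokens with anything other than letters, digits and underscores. That definition of "special" is my own assumption, and unsafe_tokens is a made-up helper; a sketch only:

```python
import re

# Assumption: a "safe" word or phone uses only ASCII letters, digits and underscores.
SAFE = re.compile(r"[A-Za-z0-9_]+")


def unsafe_tokens(dictionary_lines):
    """Return words and phones containing other characters, in first-seen order."""
    bad = []
    for line in dictionary_lines:
        for token in line.split():
            if SAFE.fullmatch(token) is None and token not in bad:
                bad.append(token)
    return bad


print(unsafe_tokens(["hasan HA SA HN", "?alii ?A LI II", "?alaa ?A LA A"]))
```

For the dictionary in the question this reports ?alii, ?A and ?alaa. Filler entries such as <s> would be flagged too, so run it on the main dictionary only.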