I'm trying to run my first training with SphinxTrain, but I've run into some problems.
(I'm using an Intel Centrino Core Duo PC.)
First I converted my wav files into raw files, and then I made my feature files with the make_feats script, after changing its wave2feat call to this line:
system("bin/wave2feat -verbose yes -c \"$ctl\" -raw yes -di wav -ei raw -do \"$CFG_FEATFILES_DIR\" -eo \"$CFG_FEATFILE_EXTENSION\"");
(Do I also need to add the option "-input_endian little"?)
Everything seems OK, but when I run scripts_pl/00.verify/verify_all.pl I get this:
O.S. is case sensitive ("A" != "a").
Phones will be treated as case sensitive.
MODULE: 00 verify training files
Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file
Found 13 words using 17 phones
Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
Phase 3: CTL - Check general format; utterance length (must be positive); files exist
WARNING: CTL line does not parse correctly:
WARNING: CTL line does not parse correctly:
Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
Total Hours Training: 0.0418632478632479
This is a small amount of data, no comment at this time
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
Words in dictionary: 9
Words in filler dictionary: 2
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
What is the CTL file?
How do I write it?
Where should I save it?
Please help me, maybe with an example ;)
Thanks
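For a sense of scale, the "Total Hours Training" value in the log above, 0.0418632478632479 h, is about 150.7 s, or roughly 2.5 minutes of audio. The following Perl sketch estimates the same kind of figure directly from the raw audio files. It assumes headerless 16-bit mono audio sampled at 16 kHz; those settings and the script name raw_hours.pl are assumptions, not taken from SphinxTrain, and verify_all.pl computes its own number in its own way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMED recording format (not read from SphinxTrain): headerless,
# 16-bit mono PCM at 16 kHz. Change these to match your raw files.
my $sample_rate      = 16000;   # samples per second
my $bytes_per_sample = 2;       # 16-bit samples

# Convert a byte count of raw audio into seconds of audio.
sub raw_bytes_to_seconds {
    my ($bytes) = @_;
    return $bytes / ($sample_rate * $bytes_per_sample);
}

# Sum the sizes of all raw files given on the command line.
my $total_bytes = 0;
for my $file (@ARGV) {
    die "cannot read $file\n" unless -e $file;
    $total_bytes += (-s _) || 0;    # -s is false for empty files, count them as 0
}

my $seconds = raw_bytes_to_seconds($total_bytes);
printf "Total: %.1f s = %.4f h\n", $seconds, $seconds / 3600;
```

Usage, assuming the raw files sit in wav/ with the .raw extension as in the wave2feat line above: perl raw_hours.pl wav/*.raw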
The CTL file is the one that lists the file IDs:
etc/draft_train.fileids
I suppose you just have empty lines at the end of it, and that is the problem. It should list file names and nothing more, so remove the empty lines from etc/draft_train.fileids.
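A minimal Perl sketch of that clean-up, assuming the fileids file should keep only its non-blank lines. Besides dropping empty and whitespace-only lines, it strips trailing whitespace such as DOS carriage returns. It rewrites the file in place, so keep a backup; the script name clean_fileids.pl is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Path from the reply above; pass another path as the first argument.
my $ctl = shift(@ARGV) // 'etc/draft_train.fileids';

# Keep only lines that contain something other than whitespace, with
# trailing whitespace (newline, CR, blanks) removed. Works on copies,
# so the caller's strings are never modified.
sub clean_ctl_lines {
    my @kept;
    for my $line (@_) {
        (my $trimmed = $line) =~ s/\s+\z//;
        push @kept, $trimmed if length $trimmed;
    }
    return @kept;
}

if (-e $ctl) {
    open my $in, '<', $ctl or die "cannot read $ctl: $!\n";
    my @lines = <$in>;
    close $in;

    my @kept = clean_ctl_lines(@lines);
    printf "%s: kept %d of %d lines\n", $ctl, scalar @kept, scalar @lines;

    # Overwrite the control file with one file ID per line.
    open my $out, '>', $ctl or die "cannot write $ctl: $!\n";
    print {$out} "$_\n" for @kept;
    close $out or die "cannot close $ctl: $!\n";
}
else {
    warn "$ctl not found\n";
}
```

Run it from the top of the training directory (perl clean_fileids.pl) and then rerun scripts_pl/00.verify/verify_all.pl.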
O.S. is case sensitive ("A" != "a").
Phones will be treated as case sensitive.
MODULE: 00 verify training files
Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file
Found 13 words using 17 phones
Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
Phase 3: CTL - Check general format; utterance length (must be positive); files exist
Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
Total Hours Training: 0.0418632478632479
This is a small amount of data, no comment at this time
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
Words in dictionary: 9
Words in filler dictionary: 2
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
Heh, that was simple! Thank you very much! So I don't have to write a draft.ctl file.
Another thing: do I need to add "-input_endian little" to the make_feats script? Is little endian the default setting? (My PC has a little-endian architecture, is that right?) Thanks again.
Little endian is the default, of course, so don't worry about it.
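Intel x86 processors such as the Core Duo are little-endian. If you want to confirm a machine's byte order from Perl, the following sketch packs the integer 1 in native format and checks which byte comes first, and also prints the byte order Perl was built for. It only checks the host: headerless raw files carry no byte-order marker, so this cannot tell you how your converter wrote them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Pack 1 as a native unsigned 32-bit integer and read back the first byte:
# a value of 1 means the least significant byte comes first (little endian).
sub host_is_little_endian {
    return unpack('C', pack('L', 1)) == 1;
}

# Perl's build configuration records the byte order as well, e.g. "1234" or
# "12345678" on little-endian machines, "4321" or "87654321" on big-endian ones.
printf "byteorder from Config: %s\n", $Config{byteorder};
printf "host is %s endian\n", host_is_little_endian() ? 'little' : 'big';
```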