
Control file in SphinxTrain

Aenima1891
2007-06-08
2012-09-22
  • Aenima1891

    Aenima1891 - 2007-06-08

    I'm trying to do my first "training" with SphinxTrain, but I've run into a problem.
    (I'm using an Intel Centrino Core Duo PC.)
    First I converted my wav files into raw files, and then I made my feat files with the script

        perl scripts/make_feats.pl -ctl etc/draft_train.fileids
    

    but first I changed the script to use this new line:

    system("bin/wave2feat -verbose yes -c \"$ctl\" -raw yes -di wav -ei raw -do \"$CFG_FEATFILES_DIR\" -eo \"$CFG_FEATFILE_EXTENSION\"");
    (Maybe I need to add the option "-input_endian little"?)

    Everything seems OK, but when I run the script scripts_pl/00.verify/verify_all.pl I get this:

    O.S. is case sensitive ("A" != "a").
    Phones will be treated as case sensitive.
    MODULE: 00 verify training files
    Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file
    Found 13 words using 17 phones
    Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
    Phase 3: CTL - Check general format; utterance length (must be positive); files exist
    WARNING: CTL line does not parse correctly:

    WARNING: CTL line does not parse correctly:

    Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
    Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
                Total Hours Training: 0.0418632478632479
                This is a small amount of data, no comment at this time
    Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
        Words in dictionary: 9
        Words in filler dictionary: 2
    Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
    

    What is the ctl file?
    How do I write it?
    Where do I save it?

    Please help me, maybe with an example ;)

    Thanks

     
    • Nickolay V. Shmyrev

      CTL is the file with the list of fileids:

      etc/draft_train.fileids

      I suppose you just have empty lines at the end of it, and that is the problem. It should just list file names, one per line, nothing more. Remove the empty lines from etc/draft_train.fileids
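
A minimal Python sketch of that cleanup, for anyone who would rather not edit the file by hand. The helper names `strip_blank_lines` and `clean_fileids` are made up for this illustration and are not part of SphinxTrain; the only assumption about the file is what the reply above states, namely one entry per line and no empty lines.

```python
from pathlib import Path


def strip_blank_lines(text: str) -> str:
    """Return text with empty and whitespace-only lines removed."""
    kept = [line.strip() for line in text.splitlines() if line.strip()]
    # End with a single newline, and no blank line after the last entry.
    return "\n".join(kept) + ("\n" if kept else "")


def clean_fileids(path: str) -> int:
    """Rewrite the control file in place; return the number of entries kept."""
    p = Path(path)
    original = p.read_text()
    cleaned = strip_blank_lines(original)
    if cleaned != original:
        p.write_text(cleaned)
    return len(cleaned.splitlines())


if __name__ == "__main__":
    # Path taken from the thread; adjust it to your own setup.
    target = "etc/draft_train.fileids"
    if Path(target).exists():
        print(clean_fileids(target), "entries kept in", target)
```

After running it, rerunning scripts_pl/00.verify/verify_all.pl should show whether the "CTL line does not parse correctly" warnings in Phase 3 are gone.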

       
    • Aenima1891

      Aenima1891 - 2007-06-08

      O.S. is case sensitive ("A" != "a").
      Phones will be treated as case sensitive.
      MODULE: 00 verify training files
      Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file
      Found 13 words using 17 phones
      Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
      Phase 3: CTL - Check general format; utterance length (must be positive); files exist
      Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
      Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
      Total Hours Training: 0.0418632478632479
      This is a small amount of data, no comment at this time
      Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
      Words in dictionary: 9
      Words in filler dictionary: 2
      Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once

      Eh eh, very simple!!! Thank you very much! So I don't have to write the draft.ctl file.

      Another thing: do you think I need to add the option "-input_endian little" in the make_feats script? Is little endian the default setting? (My PC has a little-endian architecture, is that right?) Thanks again

       
      • Nickolay V. Shmyrev

        Little endian is the default, of course; don't worry about it.
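
On the side question of whether the PC itself is little endian: Intel x86 processors such as the Core Duo are. A small Python sketch to confirm the host byte order follows; the function name `host_is_little_endian` is made up for this illustration, and the check says nothing about how wave2feat treats its input.

```python
import struct
import sys


def host_is_little_endian() -> bool:
    # Pack the 16-bit integer 1 in native byte order ("="); on a
    # little-endian host the least significant byte (0x01) comes first.
    return struct.pack("=H", 1)[0] == 1


if __name__ == "__main__":
    print("sys.byteorder:", sys.byteorder)
    print("little-endian host:", host_is_little_endian())
```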

         
