Problem in init_gau

Help
2004-08-28
2012-09-22
  • danial ibrahim

    danial ibrahim - 2004-08-28

    Hi,
    I used the TI46 alphabet corpus (NIST format) and converted it to raw format using sox. The sox output was raw data in little-endian format, so I used GoldWave to convert it to big-endian.
    When I try to train on this data (using Cygwin, full install), I always get stuck in init_gau: it can't read the feature files. I checked those files in cepview and they can all be read successfully. Can anyone tell me what the mistake could be? Any help will be appreciated. Thanks in advance.

    $ bin/init_gau -accumdir bwaccumdir -ctlfn etc/alpha.fileids -part 1 -npart 1 -cepdir feat -cepext feat -feat c/0..L-1/d/0..L-1/dd/0..L-1/ -ceplen 13
    bin/init_gau \ -accumdir bwaccumdir \ -ctlfn etc/alpha.fileids \ -part 1 \ -npart 1 \ -cepdir feat \ -cepext feat \ -feat c/0..L-1/d/0..L-1/dd/0..L-1/ \ -ceplen 13

    [Switch]  [Default] [Value]
    -help     no        no
    -example  no        no
    -moddeffn
    -ts2cbfn
    -accumdir           bwaccumdir
    -meanfn
    -ctlfn              etc/alpha.fileids
    -nskip
    -runlen
    -part               1
    -npart              1
    -lsnfn
    -dictfn
    -fdictfn
    -segdir
    -segext   v8_seg    v8_seg
    -scaleseg no        no
    -cepdir             feat
    -cepext   mfc       feat
    -silcomp  none      none
    -cmn      current   current
    -varnorm  no        no
    -agc      max       max
    -feat               c/0..L-1/d/0..L-1/dd/0..L-1/
    -ceplen   13        13
    INFO: corpus.c(1230): Will process all remaining utts starting at 0
    INFO: init_gau.c(144): Computing 1x1x1 mean estimates
    .feat) failedat/0AF1SET0
    ERROR: "corpus.c", line 1507: MFCC read failed.  Retrying after sleep...

     
    • Roger Wellington-Oguri

      The proper format (little vs. big endian) of the data is dictated by the machine you are doing the training on.  Since you are using an Intel processor, the data should be left as little endian.

      I'm not sure that this is the cause of the read failure, but the explanation of the error message at http://www-2.cs.cmu.edu/~rsingh/sphinxman/logfiles.html#098 
      suggests that it is.

      Roger

       
    • danial ibrahim

      danial ibrahim - 2004-08-28

      Thanks for the response :)

      I have read that message, thanks. The explanation given for the error is:
      "This happens when data are byte-swapped or there are very few frames in utterance. It also happens when your feature file is physically not present or is inaccessible/unreadable due to some reason"

      I have run wave2feat many times with the data left as little endian, but the same error still occurred, so I believe it isn't the byte-swapping issue. However, I don't know how to tell whether the number of frames I get in each utterance counts as too few or is enough. As for the last reason in that explanation, I don't think the feature files are unreachable, because cepview can read them as usual.

      This is one of my wave2feat command lines; it also leads to the error:
      $ bin/wave2feat -verbose -c etc/alpha.fileids -nist -di e:/wav/alpha_train -ei wav -do feat -eo feat -srate 12500 -nfilt 32 -lowerf 150 -upperf 5500 -ncep 13

      Can you spot any mistake other than those listed in that explanation?

      thanks in advance :)
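
The two doubts raised in this post (byte order and frame count) can be checked directly from the file itself. The sketch below assumes the usual Sphinx cepstral file layout, which the thread does not spell out: a 4-byte integer header holding the number of 32-bit floats, followed by the floats. `inspect_mfc` is a hypothetical helper, not part of SphinxTrain.

```python
import struct
import sys


def inspect_mfc(data: bytes, ceplen: int = 13) -> dict:
    """Guess byte order and frame count from the raw bytes of a feature file.

    Assumed layout (not confirmed in this thread): int32 count of floats,
    then that many 4-byte floats.
    """
    if len(data) < 4:
        raise ValueError("too short to contain a 4-byte header")
    payload = len(data) - 4  # bytes that follow the header
    if payload % 4 == 0:
        for order, fmt in (("little", "<i"), ("big", ">i")):
            (n_floats,) = struct.unpack(fmt, data[:4])
            # The header is plausible only if it matches the actual file size.
            if n_floats == payload // 4:
                return {
                    "byte_order": order,
                    "n_floats": n_floats,
                    "n_frames": n_floats // ceplen,
                    "leftover_floats": n_floats % ceplen,
                }
    # Neither interpretation fits: truncated, padded, or otherwise altered file.
    return {"byte_order": "unknown", "n_floats": None,
            "n_frames": None, "leftover_floats": None}


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:  # binary mode, so no newline translation
            print(path, inspect_mfc(f.read()))
```

A result of "unknown" means the header matches the file size in neither byte order, which would point to a damaged file rather than a simple byte swap.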

       
    • The Grand Janitor

      I suspect this may be a bug in init_gau. Before we can say that, please check the following few things.

      The physical feature file name is specified by three parameters:
      1. -ctlfn: this file contains the root of each file name, without extension.
      2. -cepdir: this is the directory where you put all the MFCC files.
      3. -cepext: this is the file extension.

      So from your command line, your file name for feature is

      ./feat/dat/0AF1SET0.feat

      Please confirm whether I am correct. If I am, then the message you gave us shows that the file name manipulation was wrong. Please kindly send us a bug report. I will fix it ASAP.

      Arthur
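
The way the three parameters combine into a file name can be sketched as below. `feature_paths` is a hypothetical helper that mimics the directory + ID + extension join described in this post; it is not the actual init_gau code. It deliberately keeps any stray carriage return in a control-file entry, because such a character would end up inside the path and could also make a terminal overwrite part of an error message. Whether that is what happened in the log above is an assumption, not something the thread confirms at this point.

```python
def feature_paths(control_lines, cepdir, cepext):
    """Join directory, utterance ID and extension for each control-file entry."""
    paths = []
    for line in control_lines:
        # Strip only the line feed; a leftover "\r" stays visible in the result.
        utt_id = line.rstrip("\n")
        if utt_id:
            paths.append(f"{cepdir}/{utt_id}.{cepext}")
    return paths


if __name__ == "__main__":
    # The second entry simulates a control file saved with DOS line endings.
    entries = ["0AF1SET0\n", "0AF1SET1\r\n"]
    for path in feature_paths(entries, "feat", "feat"):
        print(repr(path))  # repr() exposes hidden characters such as "\r"
```

Printing with `repr` rather than plain `print` is the point of the sketch: a path that looks correct on screen may still contain a character that makes the file lookup fail.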

       
    • danial ibrahim

      danial ibrahim - 2004-08-29

      Thanks for the response :)

      My feature files are located at 'c:/abc/feat/'.
      I have specified the three parameters:
      1. -ctlfn etc/alpha.fileids
      0AF1SET0
      0AF1SET1
      0AF1SET2
      0AF1SET3
      0AF1SET4
      0AF1SET5...

      2. -cepdir feat

      3. -cepext feat

      /feat/0AF1SET0.feat

      So you were right about the file name, except that there is no folder 'dat' in the directory 'feat'.
      Can you see what the mistake could be? Just to mention, I cannot run cvs update from where I am.

      thanks for your time.

       
    • The Grand Janitor

      One more suggestion before we declare this a bug.
      Try using .feat instead of feat in your -cepdir argument.
      If you want to use an absolute path, use /cygdrive/c/abc/feat/ . That is the last thing I suspect might be wrong.

      If you still cannot get it to work, then we need to fix it for you. Please file a bug report on this page and send us:
      1. your command-line arguments, in the form of a shell script;
      2. one cepstral feature file (only one, please);
      3. your control file, i.e. etc/alpha.fileids.

      I will try to fix it asap.

      Arthur

       
    • danial ibrahim

      danial ibrahim - 2004-08-30

      I followed the suggestion for the -cepdir argument: first I tried '.feat' and then the full path '/cygdrive/c/abc/feat', but neither worked. I still get the same error.

      As for sending the files (cepstral and control), I don't know how to send or paste them into this forum, especially since the cepstral file is in binary format. How about I send them to your e-mail address? Is that OK?

       
    • The Grand Janitor

      Thanks Danial, I will fix it ASAP. Next time you have a problem, go to the "Bugs" page and submit a new bug. Sometimes the developers don't have time to handle your request immediately; we will assign it to someone and fulfil your request later. You can also submit files on that page.

      Arthur

       
    • danial ibrahim

      danial ibrahim - 2004-09-02

      I finally found a way out of this problem.

      Previously I had set the default text file type to Unix in the Cygwin installation. When I re-installed a fresh copy and set the text file type to DOS, the init_gau step (reading the MFCC files) went OK!

      Why does this happen? Which is the correct text file type for using SphinxTrain on Windows?

      thanks in advance.

       
      • Roger Wellington-Oguri

        If changing the text file type from Unix to DOS allowed you to read the MFCC files, it means your MFCC files have extra CR characters in them that really shouldn't be there.  This must have happened because the MFCC files were created in DOS text mode.  I wouldn't expect this to happen.  But if a program doesn't specify that a file is binary (and the Sphinx programs don't seem to) and you have chosen DOS text mode, cygwin has to guess how to handle it.  Usually, it gets it right.  Sometimes it gets it wrong.

        The choice of DOS vs Unix text mode is really dictated by the user's preferences for inter-operability, and not by Sphinx.  You'll get the most consistent  results by specifying Unix text mode, but that may cause you problems if you want to use normal Windows tools (like notepad) to edit your Sphinx data files.

        It helps to be consistent.  If you decide to switch modes, you should probably regenerate your feature files.

        BTW, Sphinx developers, it would help those of us who suffer under Windows ;-) if fopen calls specify binary mode where appropriate.
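
To make the text-mode hazard concrete, here is a small illustration of what an LF to CR LF translation does to binary data. It only emulates the translation in memory; `dos_text_mode_write` is a hypothetical stand-in and does not reproduce how Cygwin decides when to translate.

```python
import struct


def dos_text_mode_write(data: bytes) -> bytes:
    """Emulate a DOS text-mode write: every LF byte becomes CR LF."""
    return data.replace(b"\n", b"\r\n")


# A 4-byte little-endian integer with value 10 happens to contain the byte 0x0A,
# which is the same byte as a line feed.
header = struct.pack("<i", 10)
written = dos_text_mode_write(header)

print(len(header), len(written))           # the data grew by one byte
print(struct.unpack("<i", written[:4])[0])  # the value is no longer 10
```

Opening the file in binary mode avoids the translation altogether: `open(path, "wb")` in Python, or `fopen(path, "wb")` in C.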

         
    • The Grand Janitor

      Roger and Danial, in bash there is a command called unix2dos and another called dos2unix.  It wouldn't be appropriate to ask a single program to take care of all text processing, or of problems that other programs already take care of.  This is generally not the philosophy of Unix users and programmers.

      Also, what if control-M means something else to random Guy A?  Then giving ^M specific handling would screw him up.  So, in general, I think this is something we will leave to the users.

      Looking on the bright side, you guys now know two more Unix commands and I can eliminate one bug from my list.

      Arthur

       
    • danial ibrahim

      danial ibrahim - 2004-09-03

      Roger and Arthur, before this I used Cygwin in Unix text mode and generated the MFCC files with it. But I got stuck at init_gau because the MFCC files couldn't be read. Does that mean those MFCC files still have CRs in them, even though they were generated in Cygwin with Unix text mode?

      So, if I switch back to Unix text mode, which gives more consistent results, what should I do to keep the init_gau problem from happening again? Should I use dos2unix to convert the MFCC files to Unix format?

      Roger, besides using Cygwin in Unix text mode, what else should I do to prevent read failures in text files?

      thanks in advance...

       
      • Roger Wellington-Oguri

        Danial,

        I don't know a set of rules that is guaranteed to keep you out of trouble.  Sometimes you just have to bite the bullet and examine a file with a hex editor to see what is going on.  You need to accept the fact that in its current stage of development, Sphinx is most appropriate for experienced programmers, not "typical" users.

        However, if you use Cygwin in Unix text mode and edit your text files only with a text editor that comes with Cygwin, you shouldn't run into any problems due to inserted CRs.  But then, I don't know that inserted CRs are the cause of all of your problems.

        And I wouldn't recommend you try to "fix up" files with unix2dos or dos2unix if you aren't sure of what you are doing.  Running either of these on a binary file (such as an MFC file) will likely just give you a corrupted output file.
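
As a lighter alternative to a full hex editor, a text file such as the control file can be inspected for CRs without modifying it. This is a minimal sketch; `count_line_endings` and `hex_preview` are hypothetical helpers, and the default path is simply the control file named earlier in the thread.

```python
import sys


def count_line_endings(data: bytes) -> dict:
    """Count CR LF pairs, bare LFs and bare CRs in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf_only": data.count(b"\n") - crlf,
        "cr_only": data.count(b"\r") - crlf,
    }


def hex_preview(data: bytes, n: int = 16) -> str:
    """Return the first n bytes as space-separated hex, like a hex editor row."""
    return " ".join(f"{b:02x}" for b in data[:n])


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "etc/alpha.fileids"
    try:
        with open(path, "rb") as f:  # binary mode: see the bytes as stored
            raw = f.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
    else:
        print(count_line_endings(raw))
        print(hex_preview(raw))
```

A non-zero "crlf" count in a text file indicates DOS line endings. The same counts mean nothing for a binary feature file, where the bytes 0x0D and 0x0A can occur as ordinary data.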

         
    • The Grand Janitor

      What else do you need to do to avoid these kinds of problems?
      You need to be careful about this stuff. Using Unix tools across different platforms requires great care.  If you didn't know that before, I hope you can learn it now.

      About fixing up the MFC files: I agree with what Roger said. My impression is that you obviously didn't know what you were doing.   The "^M problem" is caused by the fact that text files in Windows use a different line ending than in Unix.  So this is a general problem for text files only.    If you "fix" your MFC file, everything that has the value of ^M will be totally screwed up!

      Once you have this kind of knowledge, you won't fall into this kind of trap so easily.

      BTW, please start another thread in next discussion.

      Arthur

       
