Hi,
I am using the TI46 alphabet corpus (NIST format). I converted it to raw format with sox, which gave me raw data in little-endian format. To get big-endian data, I used GoldWave.
When I try to train on this data (using Cygwin, full install), I always get stuck in init_gau: it can't read the feature files. I have checked the files in cepview and they can all be read successfully. Can anyone tell me what the mistake could be? Any help will be appreciated. Thanks in advance.
$ bin/init_gau \
    -accumdir bwaccumdir \
    -ctlfn etc/alpha.fileids \
    -part 1 \
    -npart 1 \
    -cepdir feat \
    -cepext feat \
    -feat c/0..L-1/d/0..L-1/dd/0..L-1/ \
    -ceplen 13
[Switch]    [Default]  [Value]
-help       no         no
-example    no         no
-moddeffn
-ts2cbfn
-accumdir              bwaccumdir
-meanfn
-ctlfn                 etc/alpha.fileids
-nskip
-runlen
-part                  1
-npart                 1
-lsnfn
-dictfn
-fdictfn
-segdir
-segext     v8_seg     v8_seg
-scaleseg   no         no
-cepdir                feat
-cepext     mfc        feat
-silcomp    none       none
-cmn        current    current
-varnorm    no         no
-agc        max        max
-feat                  c/0..L-1/d/0..L-1/dd/0..L-1/
-ceplen     13         13
INFO: corpus.c(1230): Will process all remaining utts starting at 0
INFO: init_gau.c(144): Computing 1x1x1 mean estimates
.feat) failedat/0AF1SET0
ERROR: "corpus.c", line 1507: MFCC read failed. Retrying after sleep...
The proper format (little vs. big endian) of the data is dictated by the machine you are doing the training on. Since you are using an Intel processor, the data should be left as little endian.
I'm not sure that this is the cause of the read failure, but the explanation of the error message at http://www-2.cs.cmu.edu/~rsingh/sphinxman/logfiles.html#098
suggests that it is.
Roger
Thanks for the response :)
I have read that message, thanks. The explanation given for the error is:
"This happens when data are byte-swapped or there are very few frames in utterance. It also happens when your feature file is physically not present or is inaccessible/unreadable due to some reason"
I have run wave2feat many times with the data left as little endian, but the same error still occurs, so I believe it isn't a byte-swapping issue. However, I don't know how to determine whether the number of frames in each utterance counts as too few or is enough. As for the last reason in that explanation, I don't think the feature files are unreachable, because cepview can read them as usual.
This is one of my wave2feat command lines; this one also leads to the error:
$ bin/wave2feat -verbose -c etc/alpha.fileids -nist -di e:/wav/alpha_train -ei wav -do feat -eo feat -srate 12500 -nfilt 32 -lowerf 150 -upperf 5500 -ncep 13
Can you spot any mistake other than the ones given in that explanation?
Thanks in advance :)
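For readers who want to check the byte-order and frame-count questions directly, here is a minimal sketch. It assumes the usual Sphinx cepstral file layout (a 4-byte integer header giving the number of 4-byte floats, followed by the floats themselves); that layout is an assumption and is not confirmed anywhere in this thread. The function name `inspect_mfc` is made up for illustration.

```python
import os
import struct


def inspect_mfc(path, ceplen=13):
    """Guess the byte order and frame count of a cepstral feature file.

    Assumed layout (not confirmed in this thread): a 4-byte integer
    holding the number of 4-byte floats, followed by that many floats.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:  # binary mode: no newline translation
        header = f.read(4)
    if len(header) == 4:
        for order, fmt in (("little", "<i"), ("big", ">i")):
            n_floats = struct.unpack(fmt, header)[0]
            # The header is consistent only when read in the right byte order.
            if n_floats * 4 + 4 == size:
                return {"order": order,
                        "n_floats": n_floats,
                        "n_frames": n_floats // ceplen}
    # Neither byte order matches the file size: truncated or altered file.
    return {"order": None, "n_floats": 0, "n_frames": 0}
```

If `order` comes back as `None`, the header does not match the file size in either byte order, which would point at a damaged or otherwise altered file rather than a plain byte swap.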
I suspect this may be a bug in init_gau. Before we can say that, please check the following few things.
The physical feature file name is specified by three parameters:
1, -ctlfn: this file contains the root of each file name, without the extension.
2, -cepdir: this is the directory where you put all the MFCC files.
3, -cepext: this is the extension.
So from your command line, the file name for the feature file is
./feat/dat/0AF1SET0.feat
Please confirm whether I am correct. If I am, then the message you gave us shows that the file name manipulation is wrong. Please send us a bug report. I will fix it ASAP.
Arthur
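The three-parameter rule above can be sketched as a small check that rebuilds each expected path and reports the ones that do not exist. This is only an illustration of the directory + id + extension rule as described in the post, not the code init_gau uses; `feature_paths` and `report_missing` are hypothetical names. Printing with `repr` makes hidden characters visible, so a stray carriage return at the end of a control-file entry would show up as `\r` inside the path. That may also be one reading of the log line that starts with ".feat) failed", though the thread does not confirm it.

```python
import os


def feature_paths(ctl_path, cepdir, cepext):
    """Yield (utterance id, expected feature path) per control-file line."""
    with open(ctl_path, "rb") as f:  # binary mode keeps any stray CR visible
        for raw in f.read().split(b"\n"):
            if not raw:
                continue
            utt = raw.decode("ascii", errors="replace")
            yield utt, os.path.join(cepdir, utt + "." + cepext)


def report_missing(ctl_path, cepdir, cepext):
    """Print and return the expected feature paths that are not on disk."""
    missing = [path for _, path in feature_paths(ctl_path, cepdir, cepext)
               if not os.path.isfile(path)]
    for path in missing:
        print("missing:", repr(path))  # repr exposes characters such as \r
    return missing
```

With the values from this thread, the call would be `report_missing("etc/alpha.fileids", "feat", "feat")`.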
Thanks for the response :)
My feature files are located at 'c:/abc/feat/'.
I have specified the three parameters:
1. -ctlfn etc/alpha.fileids
0AF1SET0
0AF1SET1
0AF1SET2
0AF1SET3
0AF1SET4
0AF1SET5...
2. -cepdir feat
3. -cepext feat
/feat/0AF1SET0.feat
So you were right about the file name, except that there is no folder 'dat' in the directory 'feat'.
Can you see what the mistake could be? Just to mention, I cannot do a cvs update from where I am.
Thanks for your time.
One more suggestion before we declare this a bug.
Try using .feat instead of feat in your -cepdir argument.
If you want to use an absolute path, use /cygdrive/c/abc/feat/ . That is the last thing I suspect might be wrong.
If you still cannot get it to work, then we need to fix it for you. Please file a bug report on this page. Send us
1, your command-line arguments in the form of a shell script.
2, one cepstral feature file. Please send only one.
3, your control file, i.e. etc/alpha.fileids
I will try to fix it ASAP.
Arthur
I followed the suggestion for the -cepdir argument: first I tried '.feat' and then the full path '/cygdrive/c/abc/feat', but neither worked. I still get the same error.
About sending the files (cepstral and control): I don't know how to send or paste them into this forum, especially since the cepstral file is in binary format. How about I send them to your e-mail address? Is that OK?
Thanks Daniel, I will fix it ASAP. Next time you have a problem, go to the "Bugs" page and submit a new bug. Sometimes the developers don't have time to handle your request immediately. We will just assign it to someone and fulfill your request later. You can also submit files on that page.
Arthur
I finally found a way out of this problem.
Previously I had set the default text file type to Unix in the Cygwin installation. When I installed a fresh copy and set the text file type to DOS, the init_gau step (reading the MFCC files) went OK!
Why does this happen? Which is the correct text file type for using SphinxTrain on Windows?
Thanks in advance.
If changing the text file type from Unix to DOS allowed you to read the MFCC files, it means your MFCC files have extra CR characters in them that really shouldn't be there. This must have happened because the MFCC files were created in DOS text mode. I wouldn't expect this to happen. But if a program doesn't specify that a file is binary (and the Sphinx programs don't seem to) and you have chosen DOS text mode, cygwin has to guess how to handle it. Usually, it gets it right. Sometimes it gets it wrong.
The choice of DOS vs Unix text mode is really dictated by the user's preferences for inter-operability, and not by Sphinx. You'll get the most consistent results by specifying Unix text mode, but that may cause you problems if you want to use normal Windows tools (like notepad) to edit your Sphinx data files.
It helps to be consistent. If you decide to switch modes, you should probably regenerate your feature files.
BTW, Sphinx developers, it would help those of us who suffer under Windows ;-) if fopen calls specify binary mode where appropriate.
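As a rough illustration of the binary-mode point, reading a file in binary mode shows its bytes exactly as stored, so it is easy to see which line-ending convention a text file such as the control file actually uses. The helper below is a sketch; `line_ending_counts` is a made-up name and is not part of Sphinx or Cygwin.

```python
def line_ending_counts(path):
    """Count CRLF, bare LF and bare CR sequences in a file, read as raw bytes."""
    with open(path, "rb") as f:  # "rb": no text-mode newline translation
        data = f.read()
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,                      # DOS-style line endings
        "lf": data.count(b"\n") - crlf,    # Unix-style line endings
        "cr": data.count(b"\r") - crlf,    # stray carriage returns
    }
```

A control file written by a Windows editor would typically report a non-zero `crlf` count, while one written by a Cygwin editor in Unix text mode should report only `lf`.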
Roger and Daniel, in bash there is a command called unix2dos and another called dos2unix. It wouldn't be appropriate to ask a single program to take care of all text processing, or of problems that other programs already take care of. That is generally not the philosophy of Unix users and programmers.
Also, what if control-M means something else to random Guy A? Then giving ^M special handling will screw him up. So, in general, I think this is something we will leave to the users.
Looking on the bright side, you guys now know two more Unix commands and I can eliminate one bug from my list.
Arthur
Roger and Arthur, before this I used Cygwin in Unix text mode and generated the MFCC files there. But I got stuck at init_gau because the MFCC files could not be read. Does that mean those MFCC files still have CRs even though they were generated in Cygwin in Unix text mode?
So, if I go back to Unix text mode, which should give more consistent results, what should I do to keep the init_gau problem from happening again? Should I use dos2unix to convert the MFCC files to Unix format?
Roger, besides using Cygwin in Unix text mode, what else should I do to prevent read failures in text files?
Thanks in advance...
Daniel,
I don't know a set of rules that is guaranteed to keep you out of trouble. Sometimes you just have to bite the bullet and examine a file with a hex editor to see what is going on. You need to accept the fact that in its current stage of development, Sphinx is most appropriate for experienced programmers, not "typical" users.
However, if you use Cygwin in Unix text mode and edit your text files only with a text editor that comes with Cygwin, you shouldn't run into any problems due to inserted CRs. But then, I don't know that inserted CRs are the cause of all of your problems.
And I wouldn't recommend you try to "fix up" files with unix2dos or dos2unix if you aren't sure of what you are doing. Running either of these on a binary file (such as an MFC file) will likely just give you a corrupted output file.
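To make the corruption risk concrete, here is a small sketch of what an LF-to-CRLF conversion does to binary data. The `unix2dos_bytes` helper only mimics the idea of such a conversion and is not the real unix2dos tool; the header-plus-floats layout is again an assumption about the feature file format.

```python
import struct


def unix2dos_bytes(data):
    """Mimic an LF -> CRLF conversion applied blindly to raw bytes."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


# Little-endian header announcing 10 floats, followed by the 10 floats.
# The integer 10 is stored as the byte 0x0A, which is also the LF character.
values = [float(i) for i in range(10)]
original = struct.pack("<i", 10) + struct.pack("<10f", *values)
converted = unix2dos_bytes(original)

print(len(original), len(converted))
print(struct.unpack("<i", original[:4])[0],
      struct.unpack("<i", converted[:4])[0])
```

The converted data is one byte longer and its header no longer says 10, so a reader that trusts the header would reject the file or misread every value after the inserted byte.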
What else do you need to do to avoid these kinds of problems?
You need to be careful about these things. Using Unix tools across different platforms is something that calls for a lot of care. If you didn't know that before, I hope you can learn it this time.
About fixing up the MFC file: I agree with what Roger said. My impression is that you obviously didn't know what you were doing. The "^M problem" is caused by the fact that text files in Windows use a different line-ending character sequence than in Unix. So this is a general problem for text files only. If you "fix" your MFC file, every byte that happens to have the value of ^M will be totally screwed up!
Once you have this kind of knowledge, you won't fall into these traps so easily.
BTW, please start another thread for the next discussion.
Arthur