Anonymous - 2005-08-15
Hi, I want to train an acoustic model for Spanish. For now I'd like to try with digits only; once that works, I'll increase the vocabulary.
I know that word-level recognition is recommended for isolated words, but I prefer to train phoneme models rather than word models because I plan to increase the vocabulary.
I'm using Linux (Ubuntu Hoary) and a SphinxTrain nightly build (about 05/08/2005).
OK, after this introduction, I'll explain my problem.
I created the directory structure with $SPHINX_TRAIN/scripts_pl/setup_SphinxTrain.pl -task test
My .dic file contains:
UNO U N O
DOS D O S
TRES T R E S
My .phone file:
D
E
N
O
R
S
T
U
SIL
My .filler file:
<s> SIL
<sil> SIL
</s> SIL
My .fileids file:
uno01
uno02
...
uno99
dos01
dos02
...
dos99
tres01
tres02
...
tres99
My .transcription file:
<s> uno </s>
<s> uno </s>
...
<s> uno </s>
<s> dos </s>
<s> dos </s>
...
<s> dos </s>
<s> tres </s>
<s> tres </s>
...
<s> tres </s>
When I run scripts_pl/00.verify/verify_all.pl, I get these warnings:
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
WARNING: This phone (D) occurs in the phonelist (/home/public/test/etc/test.phone), but not in any word in the transcription (/home/public/test/etc/test_train.transcription)
WARNING: This phone (E) occurs in the phonelist (/home/public/test/etc/test.phone), but not in any word in the transcription (/home/public/test/etc/test_train.transcription)
WARNING: This phone (N) occurs in the phonelist (/home/public/test/etc/test.phone), but not in any word in the transcription (/home/public/test/etc/test_train.transcription)
... (and so on for all the phones)
Does anybody know why I get this warning?
Another question...
The wave2feat from SphinxTrain and the wave2feat from Sphinx3 generate totally different values; which one is correct? (I used the wave2feat from Sphinx3 because almost all of its values are > 1, except for one value per vector.)
Thanks. And sorry for my bad English... :(
Azarel -- your transcription file format is incorrect. Each line in the transcription file must contain the utterance file name in parentheses as well as the transcription. Otherwise there would be no way for SphinxTrain to tell which transcription goes with which utterance! Your file should look like:
<s> uno </s> (uno01)
<s> uno </s> (uno02)
...
<s> uno </s> (uno99)
<s> dos </s> (dos01)
...
cheers,
jerry
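The format jerry describes can be generated from the .fileids list rather than typed by hand. The following is a minimal Python sketch, not part of SphinxTrain. The function name `build_transcription` and the rule that the spoken word is the file ID with its trailing digits removed (`uno01` -> `uno`) are assumptions made only for this digits-only setup.

```python
import re


def build_transcription(file_ids):
    """Return one '<s> word </s> (file_id)' line per utterance ID."""
    lines = []
    for file_id in file_ids:
        # Hypothetical naming rule: 'uno01' -> 'uno' (strip trailing digits).
        word = re.sub(r"\d+$", "", file_id)
        lines.append(f"<s> {word} </s> ({file_id})")
    return lines


if __name__ == "__main__":
    for line in build_transcription(["uno01", "uno02", "dos01", "tres01"]):
        print(line)
```

Keeping the .fileids and .transcription files in the same order, one utterance per line, is what lets each transcription be matched to its recording.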
- Don't use the wave2feat in Sphinx3; it was not released (it is not in the announcement and is not installed). Please use SphinxTrain's wave2feat for training.
However, even though it is not released, the two programs should behave very similarly. Please send us the wave file you used to test the code. Also, please check the command-line arguments of the two. You should use a matched set of values in both cases.
- This warning sounds strange:
"WARNING: This phone (D) occurs in the phonelist (/home/public/test/etc/test.phone), but not in any word in the transcription"
It means that verify_all.pl looked up the dictionary and found that the phone doesn't appear in any word. Did you check whether the values in ./etc/sphinx_train.cfg are correct?
Arthur
Hi Arthur, thanks for your answer.
About the two versions of wave2feat: one of the first visible differences is that the true/false arguments are "yes" or "no" in the wave2feat from SphinxTrain, but "1" or "0" in the wave2feat from Sphinx3. The arguments must be written this way on the command line, which shows that these are not the same program.
Here are the parameters and their values for the wave2feat from Sphinx3:
Current configuration:
[NAME] [DEFLT] [VALUE]
-alpha 0.97 9.700000e-01
-blocksize 200000 200000
-c
-di
-dither 0 0
-do
-doublebw 0 0
-ei
-eo
-feat sphinx sphinx
-frate 100 100
-i /home/public/hola.wav
-input_endian little little
-logfn
-logspec 0 0
-lowerf 133.33334 1.333333e+02
-mach_endian little little
-mswav 0 1
-ncep 13 13
-nchans 1 1
-nfft 256 256
-nfilt 40 40
-nist 0 0
-o /home/public/hola.mfc3
-raw 0 0
-seed -1 -1
-srate 16000.0 1.600000e+04
-upperf 6855.4976 6.855498e+03
-verbose 0 0
-whichchan 1 1
-wlen 0.0256 2.560000e-02
And here are the parameters and their values for the wave2feat from SphinxTrain:
[Switch] [Default] [Value]
-help no no
-example no no
-i /home/public/hola.wav
-o /home/public/hola.mfcT
-c
-di
-ei
-do
-eo
-nist no no
-raw no no
-mswav no yes
-input_endian little little
-nchans 1 1
-whichchan 1 1
-logspec no no
-feat sphinx sphinx
-mach_endian little little
-alpha 0.97 9.700000e-01
-srate 16000.0 1.600000e+04
-frate 100 100
-wlen 0.0256 2.560000e-02
-nfft 512 512
-nfilt 40 40
-lowerf 133.33334 1.333333e+02
-upperf 6855.4976 6.855498e+03
-ncep 13 13
-doublebw no no
-blocksize 200000 200000
-dither no no
-verbose no no
As you can see, the values are the same, except for the seed and logfn arguments, which the wave2feat from SphinxTrain doesn't have.
Here is part of the "cepview" output for the MFC file created by SphinxTrain:
4.885 -0.398 -0.254 -0.404 -0.118 -0.041 0.161 0.120 -0.035 -0.117 -0.096 -0.029 -0.158
4.908 -0.500 -0.217 -0.337 -0.047 -0.058 0.020 0.154 -0.089 -0.070 -0.187 0.072 0.181
4.924 -0.783 -0.084 -0.285 -0.049 -0.131 0.078 0.157 0.056 0.025 -0.050 -0.140 0.110
4.946 -0.625 -0.278 -0.311 -0.043 -0.105 -0.039 0.014 -0.033 -0.048 -0.146 -0.024 0.164
4.866 -0.534 -0.261 -0.337 -0.156 -0.085 -0.036 0.109 -0.152 -0.207 -0.175 0.066 0.168
4.873 -0.500 -0.304 -0.176 -0.074 -0.034 -0.067 -0.013 -0.047 -0.059 -0.133 0.051 0.191
4.725 -0.755 -0.407 -0.293 -0.055 -0.096 0.010 0.002 0.003 -0.065 -0.031 0.073 0.130
5.062 -0.612 -0.332 -0.312 -0.129 -0.163 -0.047 0.052 0.011 0.001 0.072 0.158 0.091
5.361 -0.547 -0.245 -0.345 -0.060 -0.151 -0.052 0.041 0.041 0.050 0.163 0.075 0.021
5.118 -0.468 -0.176 -0.381 -0.054 -0.112 -0.228 -0.056 -0.066 0.031 0.036 0.014 0.095
4.818 -0.514 -0.167 -0.254 -0.103 -0.122 -0.017 0.012 -0.110 -0.055 -0.064 0.203 0.167
4.897 -0.458 -0.194 -0.360 -0.059 -0.109 0.037 -0.048 -0.068 0.029 -0.207 -0.081 0.021
4.853 -0.532 -0.187 -0.284 -0.077 -0.054 -0.091 -0.005 -0.050 -0.106 -0.018 0.030 0.083
5.028 -0.590 -0.345 -0.333 -0.147 -0.021 0.025 0.094 -0.025 -0.067 -0.032 -0.060 0.009
4.900 -0.702 -0.329 -0.244 -0.128 0.022 0.113 0.224 -0.016 -0.098 -0.022 0.056 0.001
...
Here is part of the "cepview" output for the MFC file created by Sphinx3:
-2495.835 -2452.391 -2309.967 -2079.092 -1767.950 -1389.011 -956.595 -487.632 -0.025 487.657 956.659 1388.964 1767.717
-2495.944 -2452.587 -2309.998 -2079.093 -1767.870 -1389.011 -956.703 -487.510 0.020 487.750 956.593 1389.061 1768.049
-2495.824 -2452.771 -2309.791 -2078.958 -1767.838 -1389.092 -956.623 -487.504 0.092 487.726 956.609 1388.772 1767.906
-2495.860 -2452.645 -2309.965 -2078.996 -1767.859 -1389.108 -956.736 -487.611 0.113 487.765 956.541 1388.914 1768.014
-2495.839 -2452.518 -2309.995 -2079.048 -1767.931 -1388.996 -956.716 -487.580 -0.121 487.581 956.578 1389.021 1767.978
-2495.880 -2452.497 -2310.072 -2078.902 -1767.875 -1388.996 -956.803 -487.751 -0.069 487.670 956.580 1389.019 1768.019
-2496.021 -2452.806 -2310.200 -2079.080 -1767.930 -1389.093 -956.719 -487.691 0.045 487.702 956.684 1388.991 1767.871
-2495.645 -2452.602 -2310.049 -2078.978 -1767.887 -1389.093 -956.780 -487.695 0.011 487.760 956.817 1389.106 1767.891
-2495.408 -2452.576 -2310.001 -2079.072 -1767.862 -1389.074 -956.735 -487.615 0.118 487.821 956.896 1389.062 1767.819
-2495.706 -2452.492 -2309.902 -2079.098 -1767.826 -1388.983 -956.902 -487.781 -0.065 487.806 956.775 1388.965 1767.909
-2495.923 -2452.512 -2309.913 -2078.966 -1767.917 -1389.063 -956.661 -487.623 -0.055 487.704 956.690 1389.161 1767.979
-2495.862 -2452.501 -2309.977 -2079.120 -1767.890 -1389.127 -956.707 -487.813 -0.061 487.783 956.555 1388.909 1767.826
-2495.942 -2452.605 -2309.958 -2078.995 -1767.880 -1389.002 -956.766 -487.657 0.045 487.726 956.723 1388.941 1767.803
...
The feature vectors are totally different.
Also, at the end of the wave2feat run from Sphinx3, there is a warning message:
WARNING: "fe.c", line 340: File hola.wav has some frames with zero energy. Consider using dither
but using dither doesn't change the message (it's recursive: I use dither, but the program still invites me to use dither :)
With SphinxTrain this message doesn't appear, but the values are closer to zero. (That's confusing.)
- I haven't changed sphinx_train.cfg; I read it and the info is correct, I think.
That surprises me.
- SphinxTrain's wave2feat is giving the correct result. Again, stick to it.
- Please send us the wave file you are using. Send it to me directly at archan @ cs dot cmu dot edu. Though wave2feat is not released in 3.5, I think this is a bug.
Arthur
Additional info...
The first set of arguments and values is from Sphinx3 and the second is from SphinxTrain; unfortunately, the HTML formatting does not show the columns clearly...
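Since the column layout was lost, a mechanical comparison of the two dumps is easier than reading them side by side. The sketch below is an illustration in Python, not a Sphinx tool. It uses a hand-copied excerpt of the two listings above, and the helper names (`parse_dump`, `normalize`, `differences`) are made up for this example. It treats "yes"/"no" as equivalent to "1"/"0" and compares numeric values as floats.

```python
# Excerpts copied from the two dumps in this thread (name, default, value).
SPHINX3_DUMP = """\
-alpha 0.97 9.700000e-01
-dither 0 0
-mswav 0 1
-nfft 256 256
-seed -1 -1
-srate 16000.0 1.600000e+04
-wlen 0.0256 2.560000e-02
"""

SPHINXTRAIN_DUMP = """\
-alpha 0.97 9.700000e-01
-dither no no
-mswav no yes
-nfft 512 512
-srate 16000.0 1.600000e+04
-wlen 0.0256 2.560000e-02
"""

BOOLEAN = {"yes": "1", "no": "0"}


def normalize(value):
    """Map yes/no to 1/0 and parse numbers so both styles compare equal."""
    value = BOOLEAN.get(value, value)
    try:
        return float(value)
    except ValueError:
        return value


def parse_dump(text):
    """Read 'name default value' lines; lines with other shapes are skipped."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            values[fields[0]] = normalize(fields[2])
    return values


def differences(a, b):
    """Return {name: (value_in_a, value_in_b)} for settings that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


if __name__ == "__main__":
    s3 = parse_dump(SPHINX3_DUMP)
    st = parse_dump(SPHINXTRAIN_DUMP)
    for name, (left, right) in differences(s3, st).items():
        print(name, "Sphinx3:", left, "SphinxTrain:", right)
    # Analysis window length in samples, from -wlen and -srate.
    print("window samples:", s3["-wlen"] * s3["-srate"])
```

On this excerpt the comparison reports `-seed` (absent from the SphinxTrain dump) and also `-nfft`, which is 256 in the Sphinx3 dump and 512 in the SphinxTrain dump. With `-wlen 0.0256` at `-srate 16000` the analysis window is about 410 samples, which is longer than a 256-point FFT but fits in a 512-point one. Whether that mismatch explains the different feature values is not established in this thread; it is only the one setting that visibly differs.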
Something to analyze:
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
Checking that all the "phones" in the "transcript"...? But as I understand it, the transcript file doesn't contain phones but words; is that correct?
The transcript has words; the dictionary has a mapping from words to phones.
Arthur
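The word-to-phone lookup Arthur describes can be imitated in a few lines to see which phones a transcription actually reaches. This is a simplified Python sketch, not the logic of verify_all.pl. The function name `check_transcript` is made up, the data is copied from the .dic, .filler and .phone listings above, and the exact (case-sensitive) word match is an assumption about how the real script behaves.

```python
import re

# Data copied from the .dic, .filler and .phone files shown in this thread.
DICTIONARY = {
    "UNO": ["U", "N", "O"],
    "DOS": ["D", "O", "S"],
    "TRES": ["T", "R", "E", "S"],
}
FILLERS = {"<s>": ["SIL"], "<sil>": ["SIL"], "</s>": ["SIL"]}
PHONES = ["D", "E", "N", "O", "R", "S", "T", "U", "SIL"]


def check_transcript(lines, lexicon, phones):
    """Return (phones never reached, words missing from the lexicon)."""
    used = set()
    unknown = set()
    for line in lines:
        # Drop a trailing '(utterance_id)' if the line has one.
        text = re.sub(r"\s*\([^()]*\)\s*$", "", line)
        for word in text.split():
            if word in lexicon:  # assumption: exact, case-sensitive match
                used.update(lexicon[word])
            else:
                unknown.add(word)
    unused = [p for p in phones if p not in used]
    return unused, sorted(unknown)


if __name__ == "__main__":
    lexicon = {**DICTIONARY, **FILLERS}
    lowercase = ["<s> uno </s> (uno01)", "<s> dos </s> (dos01)", "<s> tres </s> (tres01)"]
    uppercase = [line.replace("uno", "UNO").replace("dos", "DOS").replace("tres", "TRES")
                 for line in lowercase]
    print(check_transcript(lowercase, lexicon, PHONES))
    print(check_transcript(uppercase, lexicon, PHONES))
```

One thing the sketch makes visible: the dictionary entries are uppercase (UNO, DOS, TRES) while the transcription words are lowercase (uno, dos, tres). Under an exact-match lookup no transcription word is found in the dictionary, so none of its phones is ever reached. Whether the real script compares case-sensitively is not confirmed in this thread, so treat this as something to check rather than a diagnosis.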
BTW, I still don't have enough information to debug the phone list problem. However, there are a couple of things you could do:
1. Trace the Perl script yourself.
2. Send me that information in parallel with sending me the waveform.
I will work with you to see what we can do. Good luck to both of us.
Arthur