Hi,
I want to use SphinxTrain on an alphabet data corpus. I trained on it before, but the recognition accuracy was 0%. So before I start training again, I want to know whether my dictionary, phone list, and transcription are correct. Can anyone give me some guidance or sample files for alphabet data training?
alpha.dict
----------
A EY
B B IY
C S IY
D D IY
E IY
F EH F
G JH IY
H EY CH
I AY
J JH EY
K K EY
L EH L
M EH M
N EH N
O OW
P P IY
Q K Y UW
R AA R
S EH S
T T IY
U Y UW
V V IY
W D AH B AX L Y UW
X EH K S
Y W AY
Z Z IY
alpha.phone
------------
AA
AH
AX
AY
B
CH
D
EH
EY
F
IY
JH
K
L
M
N
OW
P
R
S
SIL
T
UW
V
W
Y
Z
alpha.transcription
------------------
<s> A </s> (0AF1SET0)
<s> A </s> (0AF1SET1)
<s> A </s> (0AF1SET2)
<s> A </s> (0AF1SET3)
<s> A </s> (0AF1SET4)
<s> A </s> (0AF1SET5)
.
.
.
<s> Z </s> (0ZF1SET3)
<s> Z </s> (0ZF1SET4)
<s> Z </s> (0ZF1SET5)
<s> Z </s> (0ZF1SET6)
<s> Z </s> (0ZF1SET7)
<s> Z </s> (0ZF1SET8)
<s> Z </s> (0ZF1SET9)
Any help would be much appreciated.
Thanks.
Another issue: when I ran verify_all.pl, it reported that a phone (AA, say) occurs in the dictionary but not in the phone list. But it clearly does occur in my phone list file. Why can't it find that phone, and the others? Is there an error I should fix in the Perl scripts?
Thanks for your time.
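For anyone checking the same thing by hand, here is a minimal Python sketch of a dictionary-versus-phone-list comparison. It is not what verify_all.pl does internally; the function names are made up for illustration. It assumes one possible cause that the post cannot confirm: invisible characters (a Windows carriage return or trailing blanks) on the phone list lines, which would make "AA" in the dictionary differ from "AA\r" in the phone list even though they look identical in an editor.

```python
def phones_in_dict(dict_text):
    """Collect every phone used in a dictionary of 'WORD PH1 PH2 ...' lines."""
    used = set()
    for line in dict_text.splitlines():
        fields = line.split()
        used.update(fields[1:])  # everything after the word is a phone
    return used


def phones_in_list(phone_text, normalize=True):
    """Read a one-phone-per-line list; optionally strip hidden whitespace."""
    entries = phone_text.split("\n")
    if normalize:
        entries = [e.strip() for e in entries]
    return {e for e in entries if e}


def missing_phones(dict_text, phone_text, normalize=True):
    """Phones that occur in the dictionary but not in the phone list."""
    return sorted(phones_in_dict(dict_text) - phones_in_list(phone_text, normalize))


def hidden_char_lines(raw):
    """1-based numbers of lines with a carriage return, trailing blanks, or non-ASCII bytes."""
    flagged = []
    for n, line in enumerate(raw.split(b"\n"), start=1):
        if line.endswith((b"\r", b" ", b"\t")) or any(b > 127 for b in line):
            flagged.append(n)
    return flagged


if __name__ == "__main__":
    # Hypothetical sample: two dictionary entries from the post, and a phone
    # list saved with Windows line endings (an assumption, not a known fact).
    sample_dict = "R AA R\nW D AH B AX L Y UW\n"
    sample_phones = "AA\r\nAH\r\nAX\r\nB\r\nD\r\nL\r\nR\r\nUW\r\nY\r\n"
    print("strict comparison:", missing_phones(sample_dict, sample_phones, normalize=False))
    print("after stripping:  ", missing_phones(sample_dict, sample_phones))
    print("suspicious lines: ", hidden_char_lines(sample_phones.encode("ascii")))
```

If the strict comparison reports phones as missing while the stripped comparison reports none, the files themselves probably need cleaning (for example by re-saving with Unix line endings) rather than the Perl scripts needing a fix.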
Anonymous - 2004-09-16
I've got the same problem: I was training models for digits and two control words (start and end) for Lithuanian.
My files seem to be OK:
transcription file
<s> <sil> VIENAS <sil></s> (a0908_01)
<s> <sil> DU <sil></s> (a0908_02)
<s> <sil> TRYS <sil></s> (a0908_03)
etc.
dictionary
VIENAS V IE N AX S
DU D UH
TRYS T R IY S
etc.
It seems I have enough data to get at least some results (each word is repeated in approx. 120 files),
but the recognition results are 0% :(
I really can't figure out why.
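One check that can be done before retraining is whether every token in the transcription is either a dictionary word or a filler. The Python sketch below is only an illustration: the function names are hypothetical, and the filler set (`<s>`, `</s>`, `<sil>`) is an assumption about what the filler dictionary contains. On the lines as they appear in this post, it would flag `<sil></s>` as a single unknown token because there is no space between the two markers; whether that is in the real file or just a posting artifact cannot be told from here.

```python
import re

# Utterance ID in parentheses at the end of a transcript line, e.g. "(a0908_01)".
UTT_ID = re.compile(r"\(([^()\s]+)\)\s*$")


def transcript_words(line):
    """Split one transcript line into (list of tokens, utterance ID or None)."""
    match = UTT_ID.search(line)
    if match is None:
        return line.split(), None
    return line[:match.start()].split(), match.group(1)


def unknown_tokens(transcript_lines, dictionary_words,
                   fillers=("<s>", "</s>", "<sil>")):
    """List (utterance ID, token) pairs that are neither words nor fillers.

    The default fillers are an assumption; use the entries of your own
    filler dictionary instead.
    """
    known = set(dictionary_words) | set(fillers)
    problems = []
    for line in transcript_lines:
        tokens, utt_id = transcript_words(line)
        for token in tokens:
            if token not in known:
                problems.append((utt_id, token))
    return problems


if __name__ == "__main__":
    lines = [
        "<s> <sil> VIENAS <sil></s> (a0908_01)",
        "<s> <sil> DU <sil></s> (a0908_02)",
        "<s> <sil> TRYS <sil></s> (a0908_03)",
    ]
    words = ["VIENAS", "DU", "TRYS"]
    for utt_id, token in unknown_tokens(lines, words):
        print(f"{utt_id}: unknown token {token!r}")
```

The same idea extends to the phones: every phone used in the dictionary (IE and UH included) also has to appear in the phone list.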
Hi Daniel and Thomas,
I cannot debug either of your problems because there is not enough information. One thing I am not sure about is whether the two of you have the same problem or two different types of problem.
Daniel's problem seems to relate only to verify_all.pl. That kind of problem is usually not difficult to track down.
Thomas' problem is a little different. Thomas, do you also have the same problem as Daniel with verify_all.pl? Or do you have other problems? I am not sure.
I think that makes sense if you are doing training. -Arthur