2005-05-10
Hi,
I am a student trying to build a speech recognition system for Mexican Spanish, but I cannot find the Sphinx 3 front end to generate the feature files.
I tried the Sphinx 4 front end, but with those features the kmeans_init program from SphinxTrain gives warnings (cluster empty...). I want to fix my feature files with the -mach_endian and -dither options of the Sphinx 3 front end, but I cannot find the Sphinx 3 front end. Please help me.
Alternatively, can anyone tell me how to set the -mach_endian and -dither options in the Sphinx 4 front end, or how to fix the kmeans_init warnings?
Omar, México
Omar -- you have not told us exactly what you are trying to do. Are you trying to train an acoustic model, or use an existing model with one of the Sphinx recognizers?
If you are using SphinxTrain to train an acoustic model, then you should use the program wave2feat in that package to produce the feature files.
cheers,
jerry
Thanks for responding, Jerry Wolf.
I am trying to train my own acoustic model for Mexican Spanish, but I have problems with the SphinxTrain applications: kmeans_init (many "cluster empty" warnings) and init_mixw ("Too few source mixing weights, 72, for the # of tied states, 834"), so I cannot continue training. Can you help me, or point me to an updated SphinxTrain manual? The current manual has errors in the parameters, and I do not understand the inputs.
Please help me.
Omar -- OK, now tell us more.
1. Are you training a semi-continuous acoustic model (for Sphinx-2) or a continuous model?
2. Are you using the perl scripts in scripts_pl, or are you running each application manually? The perl scripts are much easier to use, since they specify default parameter values for almost everything.
cheers,
jerry
Also --
3. What kind of system are you using (Unix/Linux, Windows, Mac, ...)?
Hi Jerry
I am training a semi-continuous acoustic model, because I will create an application with Sphinx 4.
I am running each application manually
I am using Linux
Can you tell me how to create and train my acoustic model correctly, please?
These are my primary files:
(I have 70 sound files of 70 words in Mexican Spanish)
fulldic.sic
ADIOS A D I O S
AÑOS A NN O S
AUTO A U T O
BAILAR B A I L A R
BIEN B I E N
CANCIONES K A N S I O N E S
CASA K A S A
CERO S E R O
CINCO S I N K O
COMO K O M O
COMPUTADORAS K O M P U T A D O Rr A S
CORRER K O R E R
CUANTOS K U A N T O S
CUATRO K U A T R O
DOS D O S
EL E L
ENFERMA E N F E R M A
EN E N
ES E S
ESTAS E S T A S
ESTOY E S T O I
GABRIELA G A B R I E L A
GUSTA G U S T A
GUSTAN G U S T A N
GUSTO G U S T O
HASTA A S T A
HERMANA E R M A N A
HERMANO E R M A N O
HOLA O L A
INVIERNO I N B I E R N O
JUGAR X U G A R
JULIETA X U L I E T A
LA L A
LAS L A S
LLUVIA LL U B I A
LUEGO L U E G O
MAMA M A M A
MAÑANA M A NN A N A
ME M E
MI M I
MUCHO M U TS O
NADAR N A D A R
NOMBRE N O M B R E
NOS N O S
NUEVE N U E B E
NUMERO N U M E R O
OAXACA O A X A K A
OCHO O TS O
OMAR O M A R
OSCAR O S K A R
OTOÑO O T O NN O
PAPA P A P A
PINOTEPA P I N O T E P A
PRIMAVERA P R I M B E R A
QUE K E
SEIS S E I S
SIENTO S I E N T O
SIETE S I E T E
SUENO S W E N O
TELEFONO T E L E F O N O
TENGO T E N G O
TIA T I A
TIENES T I E N E S
TIO T I O
TRES T R E S
UNO U N O
VEMOS B E M O S
VERANO B E R A N O
VIVO B I B O
YO I O
filldic.dic
<s> SIL
<sil> SIL
</s> SIL
transcriptfile.tra
ADIOS
AÑOS
AUTO
BAILAR
BIEN
CANCIONES
CASA
CERO
CINCO
COMO
COMPUTADORAS
CORRER
CUANTOS
CUATRO
DOS
EL
ENFERMA
EN
ES
ESTAS
ESTOY
GABRIELA
GUSTA
GUSTAN
GUSTO
HASTA
HERMANA
HERMANO
HOLA
INVIERNO
JUGAR
JULIETA
LA
LAS
LLUVIA
LUEGO
MAMA
MAÑANA
ME
MI
MUCHO
NADAR
NOMBRE
NOS
NUEVE
NUMERO
OAXACA
OCHO
OMAR
OSCAR
OTOÑO
PAPA
PINOTEPA
PRIMAVERA
QUE
SEIS
SIENTO
SIETE
SUENO
TELEFONO
TENGO
TIA
TIENES
TIO
TRES
UNO
VEMOS
VERANO
VIVO
YO
fonemas.fon
P
B
T
D
K
G
F
S
X
TS
M
N
NN
L
LL
Rr
R
W
I
E
A
O
U
SIL
Please check them and tell me what is wrong, or point me to a manual. I am ready to learn.
omar
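A dictionary and a phone list like the ones posted above can be cross-checked before training. The following is only a minimal sketch in Python, not part of SphinxTrain: the function name and the sample data are made up for illustration, and it assumes each dictionary line is a word followed by its space-separated phones.

```python
def check_dictionary(dict_lines, phone_list):
    """Return (unknown, unused) phones.

    unknown: phones used in the dictionary but missing from the phone list.
    unused:  phones in the phone list that no dictionary entry uses.
    Hypothetical helper for illustration; not a SphinxTrain function.
    """
    known = set(phone_list)
    used = set()
    for line in dict_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        used.update(parts[1:])  # everything after the word is a phone
    return sorted(used - known), sorted(known - used)


# Small sample in the same layout as the files in the post above.
dictionary = ["ADIOS A D I O S", "CASA K A S A", "<sil> SIL"]
phones = ["A", "D", "I", "O", "S", "K", "SIL", "TS"]

unknown, unused = check_dictionary(dictionary, phones)
print("used but not in phone list:", unknown)
print("listed but never used:", unused)
```

Both lists should be empty for the full dictionary plus filler dictionary before moving on to training.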
One problem is that the transcription file (transcriptfile.tra) is of the wrong format. Each line must contain the file id in parentheses at the end. For example, if your first two training files are file0001.wav and file0002.wav, the first two lines should look like:
ADIOS (file0001)
ANOS (file0002)
Furthermore, I believe you must also have a control file that lists the training files in the same order as the transcription file, one file per line:
file0001
file0002
file0003, etc.
Unfortunately, there is no complete and up-to-date manual for using SphinxTrain. As you have learned, it's not simple! Using the Perl scripts in the SphinxTrain distribution is the easiest way to make it work.
The amount of training speech you have is rather small if you wish to train a context-dependent triphone model, even if it is to be a speaker-dependent model. It would be better to have 10x as much, with longer, multi-word utterances.
cheers,
jerry
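The transcript and control file formats described in the reply above can be generated from one list, which keeps the two files in the same order. This is a minimal sketch in Python; the function name and the sample utterances are hypothetical, and writing the lines to files is left out.

```python
def build_training_lists(utterances):
    """Build transcript and control lines from (file_id, text) pairs.

    The pairs must already be in training order. Each transcript line ends
    with the file id in parentheses; each control line is the bare file id.
    Hypothetical helper for illustration only.
    """
    transcript = ["%s (%s)" % (text, file_id) for file_id, text in utterances]
    control = [file_id for file_id, _ in utterances]
    return transcript, control


# Sample data following the example in the reply above.
utterances = [("file0001", "ADIOS"), ("file0002", "ANOS")]
transcript, control = build_training_lists(utterances)

print("\n".join(transcript))
print("\n".join(control))
```

Generating both files from a single source avoids the two lists drifting out of step when recordings are added or removed.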
I found the Perl scripts in the scripts_pl directory of SphinxTrain, but I do not know how to begin training because I do not have the sphinx_train.cfg file, and I do not know its format.
Must I run the RunAll script, or run the scripts one by one? Can you tell me how to train my acoustic model using the scripts? I have recorded more utterances (including multi-word ones) and fixed the mistakes in my transcription file.
Please help me, and thanks for the response. Sorry for any trouble.
omar
Hi,
As they said before, the easiest way to train acoustic models is with the Perl scripts. The way to do that is:
mkdir <task>
cd <task>
export SPHINXTRAINDIR=pathtosphinxtrain
$SPHINXTRAINDIR/scripts_pl/setup_SphinxTrain <task>
Running setup_SphinxTrain <task> gives you a directory structure like this:
bin/ etc/ gifs/ model_architecture/ scripts_pl/
bwaccumdir/ feat/ logdir/ model_parameters/ wav/
In etc/ you will find the sphinx_train.cfg that you have to modify.
I hope this helps.
cheers,
Sergio
I have new problems.
The previous error (from my last message) is fixed, but I do not know how I fixed it.
I computed the features again, this time using -mach_endian big, -dither yes and -input_endian big (options of the wave2feat application), and ran 02/slave_convg.pl again. I got past the first iteration of bw, but the first execution of norm gives an error that I do not understand. I opened and checked the file norm.pl, but I cannot find the mistake.
Please help.
This is the run and the error:
[root@localhost modeloaspmx]# perl /root/modeloaspmx/scripts_pl/02.ci_schmm/slave_convg.pl
MODULE: 02 Training Context Independent models
Cleaning up directories: accumulator...logs...models...
Flat initialize
Baum welch starting for iteration: 1 (1 of 1)
Using 20 files: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Finished
Normalization for iteration: 1 ERROR: ERROR: "cmd_ln.c", line 525: Expecting '/root/modeloaspmx/bin/norm -switch_1 <arg_1> -switch_2 <arg_2> ...'
Finished
Current Overall Likelihood Per Frame = -42.6187337485638
Baum welch starting for iteration: 2 (1 of 1)
Using 20 files: 0% ERROR: FATAL_ERROR: "main.c", line 969: initialization failed
ERROR: FATAL_ERROR: "main.c", line 969: initialization failed
Received a fatal error at /root/modeloaspmx/scripts_pl/02.ci_schmm/baum_welch.pl line 143, <PIPE> line 94.
omar
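Since the post above changes the endianness options and recomputes the features, it may help to check which byte order a feature file actually has. The sketch below is in Python and rests on an assumption that this thread does not state: that a feature file starts with a 32-bit integer giving the number of float values, followed by that many 32-bit floats. The function names and the demo file are hypothetical.

```python
import os
import struct
import tempfile


def detect_endianness(path):
    """Guess the byte order of a feature file from its 4-byte header.

    Assumed layout (an assumption, not confirmed in this thread): a 32-bit
    integer holding the number of float values, then that many 32-bit floats.
    Returns "little", "big", or None if neither byte order fits.
    """
    size = os.path.getsize(path)
    if size < 4 or (size - 4) % 4 != 0:
        return None  # file cannot match the assumed layout
    expected = (size - 4) // 4
    with open(path, "rb") as f:
        header = f.read(4)
    for name, fmt in (("little", "<i"), ("big", ">i")):
        if struct.unpack(fmt, header)[0] == expected:
            return name
    return None  # header count matches neither byte order


def write_demo_file(path, values, big_endian):
    """Write values in the assumed layout, for demonstration only."""
    prefix = ">" if big_endian else "<"
    with open(path, "wb") as f:
        f.write(struct.pack(prefix + "i", len(values)))
        f.write(struct.pack(prefix + "%df" % len(values), *values))


demo_path = os.path.join(tempfile.mkdtemp(), "demo.mfc")
write_demo_file(demo_path, [0.5] * 26, big_endian=True)
print(detect_endianness(demo_path))
```

If the detected byte order differs from what the training programs expect on the machine, that mismatch would be one thing to rule out before looking elsewhere.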
Omar -- first of all, Sphinx-4 uses ONLY continuous acoustic models, so building a semi-continuous model is a mistake.
Sergio has already told you the first important step -- except that the shell script "setup_SphinxTrain" has been supplanted by a new Perl script "setup_SphinxTrain.pl". Evandro has been improving it lately, and I think that the way to invoke it is:
$SPHINXTRAINDIR/scripts_pl/setup_SphinxTrain.pl -task <task>
where <task> is the name of the model directory.
Second, you must set up 5 data files in etc/ to describe your modeling data. These are described in doc/tinydoc.txt, and they must be named <task>.fileids, <task>.phone, etc.
Third, you must compute (or move) the feature files into feat/
Fourth, you should read etc/sphinx_train.cfg, which contains a number of configuration variables with names of the form $CFG_XXXX. You will need to set the values of several of these. The most important ones are:
$CFG_HMM_TYPE -- defaults to '.cont.', denoting continuous HMMs; this would be '.semi.' for semi-continuous models (for Sphinx2).
$CFG_FINAL_NUM_DENSITIES = the number of Gaussian densities desired in the final tied-state CHMM model; defaults to 16. For your case (little data) it should probably be 4 or 2.
$CFG_N_TIED_STATES = the number of tied states in the final model; defaults to 6000. I don't know what it should be for your small model, perhaps 2000?
Then you can run scripts_pl "modules" directly, or you can use a "wrapper" script such as scripts_pl/RunAll.pl, but note that RunAll.pl is merely an example -- you must customize it to select the modules that you wish to run. For training continuous models, the modules to run are 00, 02, 03, 04, 05, 06, 07. (Modules 01, 08, and 09 are used only for training semi-continuous models).
Also note the items on the CMU Sphinx "bugs" page http://sourceforge.net/tracker/?group_id=1904&atid=101904, specifically the 4 submitted by me, numbered 1174324-354. These should be fixed in your perl scripts.
Training an acoustic model is a rather complex process, and you will find a lot to learn. I hope this gets you started.
cheers,
jerry
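The module selection described in the reply above can be written down as a small lookup. This is a sketch in Python rather than a change to RunAll.pl; the names are hypothetical, and only the continuous case is covered because the reply does not list the full module set for semi-continuous training.

```python
# Module prefixes "00" .. "09", as mentioned in the reply above.
ALL_MODULES = ["%02d" % n for n in range(10)]

# Modules the reply says are used only for semi-continuous models.
SEMI_ONLY = {"01", "08", "09"}


def modules_to_run(hmm_type):
    """Return the module prefixes to run for a given $CFG_HMM_TYPE value.

    Hypothetical helper: only '.cont.' is handled, since the thread gives
    the complete list for continuous models only.
    """
    if hmm_type != ".cont.":
        raise ValueError("only the continuous case is listed in this thread")
    return [m for m in ALL_MODULES if m not in SEMI_ONLY]


print(modules_to_run(".cont."))
```

A customized wrapper script would then run only the scripts_pl modules whose directory names start with one of these prefixes.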
Jerry,
Thanks for all the information. I will start again to create and train the acoustic model, and I will fix all the mistakes. Can I ask later about any complicated things regarding acoustic models and SphinxTrain, or about a problem with the application?
thanks
omar
I ran the script 00/verify_all.pl and I do not know why my control file has parse errors:
Phase 1: DICT - Checking to see if the dict and filler dict agrees with the phonelist file
Found 354 words using 23 phones
passed
Phase 2: DICT - Checking to make sure there are not duplicate entries in the dictionary
passed
Phase 3: CTL - Check general format; utterance length (must be positive); files exist
WARNING: CTL line does not parse correctly: file001
WARNING: CTL line does not parse correctly: file002
WARNING: CTL line does not parse correctly: file003
WARNING: CTL line does not parse correctly: file004
WARNING: CTL line does not parse correctly: file005
WARNING: CTL line does not parse correctly: file006
WARNING: CTL line does not parse correctly: file007
WARNING: CTL line does not parse correctly: file008
WARNING: CTL line does not parse correctly: file009
WARNING: CTL line does not parse correctly: file010
WARNING: CTL line does not parse correctly: file011
WARNING: CTL line does not parse correctly: file012
WARNING: CTL line does not parse correctly: file013
WARNING: CTL line does not parse correctly: file014
WARNING: CTL line does not parse correctly: file015
WARNING: CTL line does not parse correctly: file016
WARNING: CTL line does not parse correctly: file017
WARNING: CTL line does not parse correctly: file018
WARNING: CTL line does not parse correctly: file019
WARNING: CTL line does not parse correctly: file020
FAILED
Phase 4: CTL - Checking number of lines in the transcript should match lines in control file
passed
Phase 5: CTL - Determine amount of training data, see if n_tied_states seems reasonable.
passed
Phase 6: TRANSCRIPT - Checking that all the words in the transcript are in the dictionary
Words in dictionary: 350
Words in filler dictionary: 2
passed
Phase 7: TRANSCRIPT - Checking that all the phones in the transcript are in the phonelist, and all phones in the phonelist appear at least once
passed
Well, I put two integers and a string after each file name (e.g. file001 0 200 abc) and continued to 02/slave_convg.pl, but now I have problems: I do not know why bw prints errors. I do not understand, please help.
These are my errors:
[root@localhost modeloaspmx]# perl /root/modeloaspmx/scripts_pl/02.ci_schmm/slave_convg.pl
MODULE: 02 Training Context Independent models
Cleaning up directories: accumulator...logs...models...
Flat initialize
Baum welch starting for iteration: 1 (1 of 1)
Using 20 files: 0% ERROR: 460 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file001 ignored
10% ERROR: 448 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file002 ignored
ERROR: 456 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file003 ignored
20% ERROR: 392 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file004 ignored
ERROR: 360 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file005 ignored
30% ERROR: 692 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file006 ignored
ERROR: 752 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file007 ignored
40% ERROR: 712 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file008 ignored
ERROR: 596 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file009 ignored
50% ERROR: 584 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file010 ignored
ERROR: 620 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file011 ignored
60% ERROR: 440 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file012 ignored
ERROR: 692 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file013 ignored
70% ERROR: 656 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file014 ignored
ERROR: 472 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file015 ignored
80% ERROR: 1284 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file016 ignored
ERROR: 1096 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file017 ignored
90% ERROR: 896 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file018 ignored
ERROR: 688 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file019 ignored
100% ERROR: 1020 134 ERROR: "backward.c", line 409: final state not reached
ERROR: ERROR: "baum_welch.c", line 300: file020 ignored
Finished
PLEASE HELP
OMAR
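The thread does not say why the plain control lines such as file001 failed to parse. One thing that could be checked is whether the lines carry invisible characters, for example a carriage return from a file saved with Windows line endings. The Python sketch below is a guess at a diagnostic, not a description of what verify_all.pl does: the function name is hypothetical, and the two accepted line shapes are taken only from the examples in this thread (a bare file id, or a file id followed by two integers and a string).

```python
def classify_ctl_line(raw):
    """Classify one control-file line.

    Returns "stray-whitespace" if the line has leading or trailing blanks,
    tabs, or a carriage return; "file-id" for a single field; "segment" for
    the four-field form from the post above (id, integer, integer, string);
    otherwise "unrecognized". Hypothetical helper for diagnosis only.
    """
    line = raw.rstrip("\n")
    if line != line.strip():
        return "stray-whitespace"  # includes a trailing "\r" from CRLF files
    fields = line.split()
    if len(fields) == 1:
        return "file-id"
    if len(fields) == 4 and fields[1].isdigit() and fields[2].isdigit():
        return "segment"
    return "unrecognized"


samples = ["file001\n", "file001\r\n", "file001 0 200 abc\n", "file001 abc\n"]
for sample in samples:
    print(repr(sample), "->", classify_ctl_line(sample))
```

If every line of the real control file comes back as "file-id", stray characters are probably not the cause and the parse warnings need another explanation.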