Hi,
I'm trying to build the acoustic model for the AN4 database (targeted for PocketSphinx) as per the tutorial provided at "http://cmusphinx.sourceforge.net/html/tutorial.html". Everything seems to be going fine until the point where make_s2_models.pl is run. At that point I'm getting a FATAL ERROR from mk_s2sendump.c.
End part of an4.html is pasted below.
#####################################################################
MODULE: 90 deleted interpolation (2010-07-07 17:36)
Phase 1: Cleaning up directories: logs...
Phase 2: Doing interpolation...
delint Log File
WARNING: This step had 0 ERROR messages and 6 WARNING messages. Please check
the log file for details.
completed
Phase 3: Dumping senones for PocketSphinx...
mk_s2sendump Log File
completed
MODULE: 99 Convert to Sphinx2 format models (2010-07-07 17:36)
Phase 1: Cleaning up old log files...
Phase 2: Copy noise dictionary
Phase 3: Make codebooks
Log File
mk_s2cb Log File
completed
Phase 4: Make chmm files
mk_s2hmm Log File
completed
Phase 5: Make senone file
Log File
mk_s2sendump Log File
FATAL_ERROR: "........\src\programs\mk_s2sendump\mk_s2sendump.c", line 199:
States(3) != 5
FAILED
#######################################################################
Note: My platform is Windows 7, the SphinxTrain and AN4 tarballs were obtained from the links provided in the above-mentioned tutorial, the Little Endian database for AN4 was selected, and Microsoft Visual C++ 2008 Express was used for compiling SphinxTrain.
What could be causing this behavior?
Thanks,
The tutorial needs a little updating, but in short: **you can skip this step.**
Hi NS,
Thanks,
Few more:
After creating the AN4 acoustic model, I'd like to use it to decode utterances provided in the AN4 database with my regular pocketsphinx_batch setup. Which directory should I take that will have all the HMM parameters (i.e. the directory to be used for the "-hmm" argument in pocketsphinx_batch)?
The SphinxTrain tutorial talks of a filler dictionary along with the regular dictionary, for training as well as for decoding. In my experience with pocketsphinx so far, I'm used to giving only one dictionary file (the argument for "-dict" in pocketsphinx_batch). How do I provide the filler dictionary to pocketsphinx_batch?
Thanks,
Should I not run "\99.make_s2_models\make_s2_models.pl" at all, or do I need to mask off parts within this script?

Yes, just don't run it.

After creating the AN4 acoustic model, I'd like to use it to decode utterances provided in the AN4 database with my regular pocketsphinx_batch setup. Which directory should I take that will have all the HMM parameters (i.e. the directory to be used for the "-hmm" argument in pocketsphinx_batch)? Note: I used the an4.cd_semi_1000 hmm directory and was successful in running pocketsphinx_batch with the language model provided with AN4. I tried decoding some of the test files provided. Accuracy wasn't too good, but the flow worked ;-) Am I picking the right model directory? There are two more directories that have the same or later timestamps (an4.cd_semi_1000-delinterp and an4.cd_semi_1000.s2models....).

You picked the right one.

The SphinxTrain tutorial talks of a filler dictionary along with the regular dictionary, for training as well as for decoding. In my experience with pocketsphinx so far, I'm used to giving only one dictionary file (the argument for "-dict" in pocketsphinx_batch). How do I provide the filler dictionary to pocketsphinx_batch? Note: In the above-mentioned decoder run, I didn't bother to use the filler dictionary. I hope that is OK.

You can provide the filler dictionary with the -fdict option, but actually you shouldn't worry about that. The filler dictionary is automatically placed inside the model (an4.cd_semi_1000/noisedict) and automatically loaded when you provide the model with the -hmm option.
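So a batch decoding run needs only the model directory and the regular dictionary. As a sketch only (every file name below is a placeholder from a hypothetical setup, not a canonical AN4 name):

```shell
# Sketch of a pocketsphinx_batch invocation; all paths are placeholders.
# The noisedict inside the -hmm directory supplies the fillers,
# so no separate -fdict argument is needed.
pocketsphinx_batch \
    -hmm an4.cd_semi_1000 \
    -lm an4.lm.DMP \
    -dict an4.dic \
    -ctl an4_test.fileids \
    -cepdir feat \
    -hyp an4.hyp
```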
Thanks so much NS.
I have another related question:
Suppose I want to create an acoustic model which caters to only a "single person". Will it be OK to train the model with, let's say, 100 sentences spoken by that person? These are short command-and-control type sentences, and the assumption is that "only that person" will use the system and that he/she will only use sentences out of these 100. Will SphinxTrain be able to train the models with the small amount of data in the above scenario?

It's better to adapt a generic model to the specific person in that case. You can't train anything good with 100 sentences.

Can I extend the same thing to cater to, say, 4 persons? That means I'll train the model with 100 sentences spoken by all 4 users, and only they will use the system, by speaking any of these sentences.

Again, this is a case where it's better to use generic model adaptation.
Thanks NS,
I was coming more from the model-size point of view ("adapted generic model" vs. "newly trained, user-specific, limited-vocabulary task model"), but from your comments it looks like the generic one will be much better in accuracy. Thanks for the comments again.
Hi NS,
Just to get a feel for creating an acoustic model, I went ahead and started training one for myself (based on the above-mentioned 100 utterances). This went fine up to the point where "Baum Welch" started. Then the .exe stopped with the Windows message "bw.exe has stopped working".
Could this be because of non-convergence of the algorithm due to the small amount of data, or is something else missing here?
The log file is uploaded at "http://www.mediafire.com/file/wrhteeoxzym/an4.html".
PS: Just to make it work, I also tried to inflate the data by increasing the number of utterances, simply by duplicating the files (and making suitable adjustments in the .fileids and .transcription files) to make the program think that the data had increased (there was no good logic in doing this; I just wanted to see if it made any difference....).
Thanks,
Hi NS,
Tried another run, this time with 100 sentences each from 4 speakers (totalling 0.27 hours of recording). "Baum Welch" failed in iteration 1 in exactly the same way as before (log available at http://www.mediafire.com/file/jizn2xjnwmt/an4.html).
My data is recorded at 16 kHz and is mono audio. Is it insufficient data, or something else?
Regards,
In order to find the reason for your problem you need to check the training logs for the corresponding steps and for earlier steps. Training logs are located in the logdir folder.
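A quick way to triage those logs is to search all of them for error markers at once; a tiny helper like the one below (a hypothetical convenience function, not a SphinxTrain tool) saves opening each file by hand:

```shell
# find_failures DIR: print every line mentioning FATAL or ERROR
# in the log files under DIR, with file name and line number.
find_failures() {
    grep -rn -E "FATAL|ERROR" "$1" || echo "no errors found in $1"
}
# Example: find_failures logdir
```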
Hi NS,
Thanks.
I looked into the logdir directory and tried comparing it with a working an4 training run directory. Here is what I'm seeing:
In my run, I see only two directories created, "05.vector_quantize" and "20.ci_hmm". "05.vector_quantize" looks OK, whereas "20.ci_hmm" has only 4 files:
"an4.makeflat_cihmm.log" ------- looks OK
"an4.make_ci_mdef_fromphonelist.log" ------- looks OK
"an4.1.1-1.bw.log" ------- Doesn't look OK. It terminates abruptly.
"an4.1.1.norm.log" ------- Doesn't look OK. Has the error message "Only 0 parts of 1 of Baum Welch were successfully completed. Parts 1 failed to run!"
Is there something wrong with the format of my .dic, .filler, .phone, .fileids or .transcription files, due to which "an4.1.1-1.bw.log" shows an abrupt termination?
I'm putting some relevant directories of my database at "http://www.mediafire.com/file/t54egzdneid/an4.zip".
Regards,
Your source files are crazy. They are full of Windows-style newlines, empty lines in the dictionary (you are the first who did that), and spaces after phones at the ends of lines. You have two ways to solve this problem:
1) Clean up all whitespace and make all input files have the proper format
2) Download and use the latest SphinxTrain from svn/snapshot. This latest version is more tolerant of whitespace.
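For option 1), the cleanup can be scripted. A small sketch (the function name and the .clean suffix are made up here; adjust to taste) that strips carriage returns, trailing whitespace, and blank lines from a dictionary or transcription file:

```shell
# normalize FILE: write a cleaned copy to FILE.clean with
# Windows CR characters removed, trailing whitespace stripped,
# and blank lines deleted (the exact problems named above).
normalize() {
    tr -d '\r' < "$1" | sed -e 's/[[:space:]]*$//' -e '/^$/d' > "$1.clean"
}
# Example: normalize etc/an4.dic && mv etc/an4.dic.clean etc/an4.dic
```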
Hi NS,
Thanks for the pointers.
As far as the empty lines in the dictionary are concerned, I had put them at places where I had changed pronunciations generated by lmtool or added new pronunciations (I didn't have a way of putting comments there). Since this dictionary works perfectly fine with PocketSphinx, I never really suspected that it could be a problem with SphinxTrain!
I'll do the cleanup and try option 1) suggested by you.
Thanks again,
Hi NS,
I did clean up the setup files (DOS to Unix) and am now able to successfully run the training session. Thanks.
For the 4-person case the acoustic model gives excellent average accuracy when the training set is used as the test set; however, when the acoustic model is trained for only one person (100-odd utterances), accuracy takes a beating even when the training set is used as the test set. Insufficient training data, I suppose!
I remember having seen some write-up on rules of thumb for selecting the number of senones according to the amount of training data, but am not able to locate it now. Could you please point me to the relevant link?
Where can I find the most up-to-date write-up on acoustic model adaptation?
Thanks and regards,
Hi,
You should run the Perl scripts for all the directories in scripts_pl. The file named slave*.pl in every directory should be run with Perl. If you don't have that file, you will have a file named .pl......... Run that .pl file; it will create new files in the directories. These are the feature files that SphinxTrain needs to train the acoustic model.
Hi ramsdoe,
I didn't exactly understand the explanation provided by you. As I mentioned in my post, I've been able to use SphinxTrain successfully for training my acoustic models. What I'm looking for is:
A write-up/tutorial on the procedure for acoustic model adaptation.
Any write-up which describes deciding the number of senones based on the amount of training data (I had seen such a document but am not able to locate it now).
Regards,
Hello,
See http://cmusphinx.sourceforge.net/wiki/acousticmodeladaptation, but note that page will be updated soon. The core idea is that you can combine MLLR with MAP to get the best adaptation results.
See http://cmusphinx.sourceforge.net/wiki/tutorialam#configure_model_type_and_model_parameters
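The senone count that page discusses is set in etc/sphinx_train.cfg. As an illustration only (the value below is a made-up example, not a recommendation; pick it from the data-size table on that page):

```perl
# Fragment of etc/sphinx_train.cfg -- illustrative value only;
# choose the senone count from the training-data-size table in the tutorial.
$CFG_N_TIED_STATES = 200;    # number of senones (tied states)
```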
Thanks NS,
Looks like the link for acoustic model training, "http://cmusphinx.sourceforge.net/html/tutorial.html", is now redirecting to "http://cmusphinx.sourceforge.net/wiki/tutorialam". Does the old document still exist somewhere? It had a very nice and informative appendix for starters.
No. There was nothing important in it that is missing from the new document.