Moved here from Speech Recognition as requested. Sorry for the repost.
Hi,
I am trying to use SphinxTrain to train up some English acoustic models for a
project. I have trained models with SphinxTrain on numerous prior occasions,
but this time I have encountered a new error that I'm unfamiliar with.
I have about 200 hours of speech data with transcripts, an appropriate
lexicon, etc. I initially went through a training run with a small fraction of
this data (~2%) just to confirm that things were more or less set up
correctly. This completed successfully, so I started a run with the full
corpus.
The full corpus setup made it through the first several steps,
("$ST::CFG_SCRIPT_DIR/00.verify/verify_all.pl",
"$ST::CFG_SCRIPT_DIR/02.falign_ci_hmm/slave_convg.pl",
"$ST::CFG_SCRIPT_DIR/03.force_align/slave_align.pl",
"$ST::CFG_SCRIPT_DIR/20.ci_hmm/slave_convg.pl",
"$ST::CFG_SCRIPT_DIR/30.cd_hmm_untied/slave_convg.pl",
"$ST::CFG_SCRIPT_DIR/40.buildtrees/slave.treebuilder.pl",
"$ST::CFG_SCRIPT_DIR/45.prunetree/slave.state-tying.pl",
but eventually bombed out on
"$ST::CFG_SCRIPT_DIR/50.cd_hmm_tied/slave_convg.pl",
on 4 Gaussians, giving the following error:
....
INFO: corpus.c(1346): Will process 17768 utts starting at 53304
INFO: main.c(622): Reestimation: Baum-Welch
bw: gauden.c:1377: gauden_scale_densities_bwd: Assertion `finite(den)' failed.
....
I tracked this down in the source code to,
gauden.c:

gauden_scale_densities_bwd(...) {
    ....
    /* BHIKSHA converged g->n_density to g->n_top; possible bugfix, APR 6 98 */
    for (k = 0; k < g->n_top; k++) {
    /* BHIKSHA converged g->n_density to g->n_top; possible bugfix, END */
        den = EXPF(den - scl);
        assert(finite(den));
        ...
}
The assertion above appears to be what is causing the failure; however, I'm
uncertain what this implies about my training data or about the training
process itself.
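As far as I can tell, the assert can only fire if den comes out non-finite:
either the input is already NaN (which propagates straight through the
subtraction and the exponentiation) or the exponent overflows to infinity.
A minimal standalone illustration of both cases (plain C99, separate from the
actual SphinxTrain code):

#include <assert.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* If an upstream density or scale factor is already NaN,
     * the arithmetic simply propagates it. */
    float den = NAN;            /* stand-in for a corrupt density */
    float scl = 10.0f;
    den = expf(den - scl);      /* NaN - 10 = NaN; expf(NaN) = NaN */
    printf("den = %f, finite = %d\n", den, isfinite(den));

    /* A large positive argument overflows to +Inf instead. */
    printf("expf(1000) finite = %d\n", isfinite(expf(1000.0f)));

    assert(isfinite(den));      /* aborts, like the assert in gauden.c */
    return 0;
}

(finite() is the old BSD name for what C99 calls isfinite().)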
The only thing I'm doing differently from previous training runs is that I'm
using Queue::POSIX with npart > 1, but it seems unlikely that this is the
cause of the problem.
Any pointers as to what this error implies would be greatly appreciated!
Since the initial post, I have also taken a closer look at the mixture_weights
files and noted that several mixtures are populated with 'nan' values.
After commenting out the assertion and re-running the training, I was actually
able to decode, but this is obviously a pretty bad idea...
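For what it's worth, here is how I spotted them: the data in these files is
ultimately just 32-bit floats, so a crude scanner that walks a file and flags
non-finite values is enough for a quick look. Something along these lines (a
throwaway helper I'm sketching here, not part of SphinxTrain; it ignores any
file header and assumes the floats are 4-byte aligned from the start of the
file, so a hit near offset 0 may just be header bytes):

#include <math.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    FILE *fp = fopen(argv[1], "rb");
    if (fp == NULL) {
        perror(argv[1]);
        return 1;
    }
    float f;
    long offset = 0, bad = 0;
    while (fread(&f, sizeof f, 1, fp) == 1) {
        if (!isfinite(f) && bad++ < 10)   /* report the first few hits */
            printf("non-finite float32 at byte offset %ld\n", offset);
        offset += (long)sizeof f;
    }
    printf("%ld non-finite float32 values total\n", bad);
    fclose(fp);
    return 0;
}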
The NaN values come from your feature files. It looks like you extracted
features incorrectly, probably from zero-energy data without using dither.
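To make the failure mode concrete: a frame of pure digital silence has zero
energy, the log stage of the MFCC computation turns that into -inf, and
subsequent arithmetic turns -inf into NaN, which then gets trained into the
models. Dither just adds a tiny amount of random noise to the samples so that
no frame is ever exactly zero. Roughly like this (an illustrative sketch, not
the actual sphinx_fe implementation):

#include <stdlib.h>

/* Add +/-1 of random noise to 16-bit PCM samples so that no analysis
 * frame has exactly zero energy. */
static void dither(short *samples, int n)
{
    for (int i = 0; i < n; i++) {
        int s = samples[i] + (rand() % 3) - 1;  /* -1, 0, or +1 */
        if (s > 32767)  s = 32767;              /* clamp to int16 range */
        if (s < -32768) s = -32768;
        samples[i] = (short)s;
    }
}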
I thought about this and checked all the feature files for occurrences of odd
or NaN values. I also used dither during feature extraction, so I don't think
this explains the problem.
It also looks like you are using an outdated SphinxTrain.
I downloaded the latest version from the website just three days ago, so it
shouldn't be out of date. I just double-checked this, and it seems there is
only one version of SphinxTrain available anyway, at least from the
recommended downloads area. I have the latest version of sphinxbase as well.
Anyway, good practice if something fails is to train on each half of the
training data and check which half fails. This way you can localize the
problematic part of the database.
I've run several more iterations of training, trying to narrow down the
problem, but am still not having any luck. The training process makes it to
2 Gaussians for the CD models and then dies at the next split.
In the past, when I have seen training die as a result of bad data, it has
always done so right at the beginning, not halfway through the CD training
stage. Any further insight into what might cause such an issue at such a late
stage would be greatly appreciated.
I eventually resolved this: I had the endianness set wrong. Interestingly
enough, I was able to train models on a small amount of data (<5 hours) and
run tests on them. The results were pretty bad, but bizarrely it actually
worked. Increasing the amount of data caused the likelihoods to bottom out and
the training process to crash. After matching the endianness, everything
worked fine. A pretty silly mistake.
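In case anyone else runs into this: the mismatch is easy to confirm, because
(if I remember the .mfc layout correctly) each cepstral file starts with a
4-byte count of the float32 values that follow. Reading that header both ways
and comparing against the actual file size tells you immediately which byte
order the file was written in. A quick throwaway check (hypothetical helper;
adjust if your feature files use a different layout):

#include <stdint.h>
#include <stdio.h>

static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0xff00u) |
           ((x << 8) & 0xff0000u) | (x << 24);
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file.mfc>\n", argv[0]);
        return 1;
    }
    FILE *fp = fopen(argv[1], "rb");
    if (fp == NULL) {
        perror(argv[1]);
        return 1;
    }
    uint32_t n;
    if (fread(&n, sizeof n, 1, fp) != 1) {
        fprintf(stderr, "short read\n");
        return 1;
    }
    fseek(fp, 0, SEEK_END);
    long nfloats = (ftell(fp) - 4) / 4;  /* floats actually present */
    fclose(fp);

    /* Whichever reading of the header matches the file size is the
     * byte order the file was written in. */
    printf("header as-is:   %u\n", n);
    printf("header swapped: %u\n", swap32(n));
    printf("file contains:  %ld float32 values\n", nfloats);
    return 0;
}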