I'm using Kaldi to train on the data I have, and the process stopped right after "train_mono.sh: Aligning data". Looking at the log file, I see the following:
gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=6 --retry-beam=24 'gmm-boost-silence --boost=1.25 1 exp/mono/1.mdl - |' 'ark:gunzip -c exp/mono/fsts.10.gz|' 'ark,s,cs:apply-cmvn --utt2spk=ark:data/train/split20/10/utt2spk scp:data/train/split20/10/cmvn.scp scp:data/train/split20/10/feats.scp ark:- | add-deltas ark:- ark:- |' 'ark,t:|gzip -c >exp/mono/ali.10.gz'
gmm-boost-silence --boost=1.25 1 exp/mono/1.mdl -
WARNING (gmm-boost-silence:main():gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
LOG (gmm-boost-silence:main():gmm-boost-silence.cc:93) Boosted weights for 5 pdfs, by factor of 1.25
LOG (gmm-boost-silence:main():gmm-boost-silence.cc:103) Wrote model to -
add-deltas ark:- ark:-
apply-cmvn --utt2spk=ark:data/train/split20/10/utt2spk scp:data/train/split20/10/cmvn.scp scp:data/train/split20/10/feats.scp ark:-
WARNING (gmm-align-compiled:main():gmm-align-compiled.cc:143) Retrying utterance aa-ve070829 with beam 24
WARNING (gmm-align-compiled:main():gmm-align-compiled.cc:172) Did not successfully decode file aa-ve070829, len = 4638
....
All audio files (1-2 minutes per utterance) failed at this step. Could you please help me figure out why? Thanks a lot.
Hi,
Maybe the bad/retried wav files don't contain all the words in the transcript.
In my experience, the bad/retried wavs often stop early, so some words at the end of the transcripts were never recorded.
Feiteng
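Feiteng's hypothesis can be checked roughly by comparing each utterance's duration with its transcript length and flagging implausibly high speaking rates. This is a minimal sketch, assuming two Kaldi-style tables as text: the data directory's `text` file (`<utt-id> <words...>`) and a `<utt-id> <seconds>` duration table (recent Kaldi versions can write one as `utt2dur` via `utils/data/get_utt2dur.sh`; check your checkout). The function names and the 4 words/second threshold are hypothetical choices, not Kaldi defaults.

```python
from typing import Dict, List


def parse_kaldi_table(content: str) -> Dict[str, str]:
    """Parse Kaldi-style '<utt-id> <rest of line>' lines into a dict."""
    table: Dict[str, str] = {}
    for line in content.splitlines():
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        table[parts[0]] = parts[1] if len(parts) > 1 else ""
    return table


def flag_fast_utterances(text_content: str, utt2dur_content: str,
                         max_words_per_sec: float = 4.0) -> List[str]:
    """Flag utterances whose transcript looks too long for the audio.

    max_words_per_sec is a hypothetical threshold; tune it for your data.
    """
    transcripts = parse_kaldi_table(text_content)
    durations = parse_kaldi_table(utt2dur_content)
    flagged = []
    for utt, words in transcripts.items():
        if utt not in durations:
            continue  # no duration available for this utterance
        seconds = float(durations[utt])
        n_words = len(words.split())
        if seconds <= 0 or n_words / seconds > max_words_per_sec:
            flagged.append(utt)
    return sorted(flagged)


# Made-up data: utt2 has 50 words in 5 seconds (10 words/s).
text = "utt1 hello world this is fine\nutt2 " + " ".join(["word"] * 50)
utt2dur = "utt1 2.0\nutt2 5.0"
print(flag_fast_utterances(text, utt2dur))  # prints ['utt2']
```

This only catches gross mismatches; a 1-2 minute recording that loses just the last few words will not stand out by speaking rate alone.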
You can try either increasing the beam sizes (--beam and --retry-beam) or
splitting the audio into smaller chunks (if feasible).
y.
On Fri, Dec 19, 2014 at 3:51 PM, Lawrence vjdtao@users.sf.net wrote:
Thanks, Jan. What beam size values (--beam and --retry-beam) do you recommend?
It's a great idea to split the long audio, but it'll take time to implement. I saw that the LibriSpeech ICASSP paper uses audio segmentation for long speech files, but the librispeech recipe in the examples directory doesn't seem to include that work.
You will have to experiment with it. It is OK if not all of the utterances
align at this stage. You can also find some utterances that are short
and align well; for monophone training, not much data is necessary.
Yes, splitting the audio without any additional information would take
some time to implement; that's why I was saying "if feasible" :)
y.
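Since the beam values have to be found by experiment, it helps to count, after each run, how many utterances were rescued by the retry beam and how many still failed. This is a minimal sketch, assuming the alignment logs contain the two gmm-align-compiled WARNING lines quoted in the original post, and assuming a 10 ms frame shift when converting `len` (frames) to seconds. The function and pattern names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Patterns for the two gmm-align-compiled warnings quoted in the log.
RETRY_RE = re.compile(r"Retrying utterance (\S+) with beam (\S+)")
FAIL_RE = re.compile(r"Did not successfully decode file ([^,\s]+), len = (\d+)")


def summarize_align_log(log_text: str, frame_shift_sec: float = 0.01
                        ) -> Tuple[List[str], Dict[str, float]]:
    """Return (utterances rescued by the retry beam, {failed utt: seconds}).

    frame_shift_sec assumes a 10 ms frame shift; adjust it if your
    features use a different one.
    """
    failed = {utt: int(frames) * frame_shift_sec
              for utt, frames in FAIL_RE.findall(log_text)}
    retried = [utt for utt, _beam in RETRY_RE.findall(log_text)]
    # An utterance that was retried but never reported as failed aligned
    # with the larger beam.
    rescued = [utt for utt in retried if utt not in failed]
    return rescued, failed


log = (
    "WARNING (gmm-align-compiled:main():gmm-align-compiled.cc:143) "
    "Retrying utterance aa-ve070829 with beam 24\n"
    "WARNING (gmm-align-compiled:main():gmm-align-compiled.cc:172) "
    "Did not successfully decode file aa-ve070829, len = 4638\n"
)
rescued, failed = summarize_align_log(log)
print(len(rescued), {u: round(s, 2) for u, s in failed.items()})
# prints 0 {'aa-ve070829': 46.38}
```

To use it on a real run, concatenate the alignment log files (for train_mono.sh these are assumed to be under exp/mono/log/) and pass the text in. Under the 10 ms assumption, the failing utterance in the post is about 46 seconds long.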
Another solution is to select a subset of the shortest audio files for the first few steps of training (which do not require a lot of data).
Gilles
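Gilles's suggestion can be sketched as follows, assuming a table of per-utterance lengths in frames, such as the output of `feat-to-len scp:data/train/feats.scp ark,t:-` (one `<utt-id> <num-frames>` line per utterance). The function name and the choice of N are hypothetical. Kaldi's `utils/subset_data_dir.sh` also has a `--shortest` option that may do this directly; check its usage message in your checkout.

```python
from typing import List


def shortest_utterances(utt2len_text: str, n: int) -> List[str]:
    """Return the IDs of the n shortest utterances.

    utt2len_text holds lines of '<utt-id> <num-frames>'. Ties in length
    are broken by utterance ID so the selection is deterministic.
    """
    entries = []
    for line in utt2len_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        utt, frames = parts
        entries.append((int(frames), utt))
    entries.sort()
    # Kaldi data-directory files must be sorted by ID, so re-sort the picks.
    return sorted(utt for _frames, utt in entries[:n])


lens = "utt-a 300\nutt-b 100\nutt-c 4638\nutt-d 200"
print(shortest_utterances(lens, 2))  # prints ['utt-b', 'utt-d']
```

Taking the shortest files first also sidesteps the long 1-2 minute recordings that fail with the default beams.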
On 2014-12-19 at 09:58, Jan Trmal jtrmal@users.sf.net wrote:
I'm running some experiments on this right now; they should finish pretty soon.
Guoguo