From: Daniel P. <dp...@gm...> - 2015-07-07 19:31:28
> Does that mean that train_more2.sh trains on 9733 - 300 = 9433 utterances,
> where the 300 excluded utterances are those from the valid set?

Yes.

> Further, the decode.sh script accepts the graph directory for the trained
> model, and not the model itself. However, this graph directory hasn't been
> modified since the tri6b model. Even the nnet_a_online model and my new
> model, which was fine-tuned on Blizzard, use the same graph directory.
> Does that mean that the decoding will be identical on all of these models?
> Is the graph directory the only deciding factor for the decoding's output?

The graph does not contain the acoustic model; the decoding output also
depends on the .mdl file. Please read
http://mi.eng.cam.ac.uk/~mjfg/mjfg_NOW.pdf to understand the basics of
speech recognition.

Dan
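
For reference, a decode run in this setup takes both the graph directory and
a decode directory whose parent holds the model, so two models can share
exp/tri6b/graph and still produce different output because each applies its
own final.mdl. A minimal sketch, with illustrative paths (the directory
names below are assumptions, not taken from this thread):

  # The HCLG graph is reused from tri6b; the acoustic model actually
  # applied is <decode-dir>/../final.mdl.
  # --online-ivector-dir is needed here because the model was trained
  # with online iVectors.
  steps/nnet2/decode.sh --nj 8 --cmd "run.pl" \
    --online-ivector-dir exp/nnet2_online/ivectors_test \
    exp/tri6b/graph data/test exp/nnet_a_blizzard/decode_test
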
> On Mon, Jul 6, 2015 at 2:27 PM, Daniel Povey <dp...@gm...> wrote:
>>
>> It ignores the valid data, but the train_subset is a subset of the data
>> trained on. You could reduce the number a bit, e.g. to 150, if you are
>> concerned about losing data.
>> Dan
>>
>>
>> On Mon, Jul 6, 2015 at 8:58 AM, Mate Andre <ele...@gm...> wrote:
>> > I noticed that the egs dumped for the Blizzard corpus contain 300
>> > utterances for each of the valid and train subsets, out of a total of
>> > 9733 utterances. Does the train_more2.sh script train on all 9733
>> > utterances of the dataset, or does it ignore the utterances included in
>> > the valid and train subsets when training?
>> >
>> > On Fri, Jul 3, 2015 at 2:59 PM, Daniel Povey <dp...@gm...> wrote:
>> >>
>> >> It definitely supports that; if you set num-threads to 1 it will train
>> >> with the GPU, but read this page:
>> >> http://kaldi.sourceforge.net/dnn2.html
>> >>
>> >> Dan
>> >>
>> >> On Fri, Jul 3, 2015 at 6:48 AM, Mate Andre <ele...@gm...> wrote:
>> >> > The train_more2.sh script has been running for 19 hours and is
>> >> > currently at pass 42/60. Since the script is training on the 19-hour
>> >> > subset of Blizzard, I imagine it'll take quite a while longer to
>> >> > train on the full 300 hours.
>> >> >
>> >> > Is there an option to run the train_more2.sh script on a GPU?
>> >> >
>> >> > On Thu, Jul 2, 2015 at 2:25 PM, Daniel Povey <dp...@gm...> wrote:
>> >> >>
>> >> >> > Back to training on the Blizzard dataset, I was able to dump the
>> >> >> > iVectors for Blizzard's 19-hour subset. Where are they needed,
>> >> >> > though? Neither train_more2.sh nor get_egs2.sh seem to accept
>> >> >> > dumped iVectors as input.
>> >> >>
>> >> >> It's the --online-ivector-dir option.
>> >> >>
>> >> >> > Regardless, I ran the train_more2.sh script on Blizzard's data/
>> >> >> > and egs/ folders (generated with get_egs2.sh), and I get the
>> >> >> > following errors in train.*.*.log:
>> >> >> >
>> >> >> > KALDI_ASSERT: at nnet-train-parallel:FormatNnetInput:nnet-update.cc:212,
>> >> >> > failed: data[0].input_frames.NumRows() >= num_splice
>> >> >> > [...]
>> >> >> > LOG (nnet-train-parallel:DoBackprop():nnet-update.cc:275) Error
>> >> >> > doing backprop, nnet info is: num-components 17
>> >> >> > num-updatable-components 5
>> >> >> > left-context 7
>> >> >> > right-context 7
>> >> >> > input-dim 140
>> >> >> > output-dim 5816
>> >> >> > parameter-dim 10351000
>> >> >> > [...]
>> >> >> >
>> >> >> > The logs tell me that the left and right contexts were set to 7.
>> >> >> > However, I specified them both as 3 when running get_egs2.sh. The
>> >> >> > egs/info/{left,right}_context files even confirm that they are
>> >> >> > set to 3. Is it possible that train_more2.sh is using the
>> >> >> > contexts from another directory?
>> >> >>
>> >> >> The problem is that 3 < 7. The neural net requires a certain amount
>> >> >> of temporal context (7 frames left and right, here), and if you
>> >> >> dump less than that in the egs it will crash. So you need to set
>> >> >> them to 7 when dumping the egs.
>> >> >>
>> >> >> Dan
>> >> >>
>> >> >> > On Tue, Jun 30, 2015 at 2:07 PM, Daniel Povey <dp...@gm...> wrote:
>> >> >> >>
>> >> >> >> Check the script that generated it; probably the graph directory
>> >> >> >> was in a different location, e.g. in tri6 or something like
>> >> >> >> that. Hopefully we would have uploaded that too.
>> >> >> >> We only need to regenerate the graph when the tree changes.
>> >> >> >> Dan
>> >> >> >>
>> >> >> >> On Tue, Jun 30, 2015 at 2:05 PM, Mate Andre <ele...@gm...> wrote:
>> >> >> >> > To ensure that the nnet_a_online model is performing well on
>> >> >> >> > the 19-hour Blizzard dataset and that it is producing correct
>> >> >> >> > alignments, I want to run the decoding script on the Blizzard
>> >> >> >> > data. However, the nnet_a_online model on kaldi-asr.org
>> >> >> >> > doesn't seem to have a graph directory needed for decoding.
>> >> >> >> > Is there any way I can get a hold of this directory without
>> >> >> >> > training the entire model?
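
Putting the answers above together: the egs must be dumped with at least as
much context as the network needs (7 frames on each side here, per the nnet
info in the log), and the extracted iVectors are passed in at egs-dumping
time via --online-ivector-dir. A sketch with illustrative paths (directory
and data names are assumptions, not from this thread):

  # Dump egs with enough temporal context for the nnet, passing in the
  # previously extracted online iVectors.
  steps/nnet2/get_egs2.sh --cmd "run.pl" \
    --left-context 7 --right-context 7 \
    --online-ivector-dir exp/nnet2_online/ivectors_blizzard_19h \
    data/blizzard_19h exp/nnet_a_ali_blizzard exp/nnet_a_blizzard/egs

  # The graph only needs to be regenerated when the tree changes; otherwise
  # an existing one (e.g. tri6b's) can be reused for decoding.
  utils/mkgraph.sh data/lang_test exp/tri6b exp/tri6b/graph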