From: Joan P. <joa...@gm...> - 2015-04-01 14:44:51
Hi,

I know the Kaldi developers have been advocating Viterbi training over Baum-Welch training for a long time. At least, that is what I gather from the documentation and slides I found on the Internet. However, I need EM training for a couple of reasons, and I think it may still be useful for others in some cases. I work on handwritten text recognition, and our HMMs are much simpler than those used in ASR (we don't use context-dependent models, for instance).

Until now, I was training my models with HTK and then converting the HTK models to Kaldi's format. But this is a pain in the ass, because:

1. My script makes strong assumptions that work for me, but may not hold if I change the HMM topology.
2. It requires having both HTK and Kaldi installed, and it would be much nicer to have a single tool.
3. HTK does not support a feature that I need during training: using an FST as the "transcription". Kaldi seems to support this (see compile-train-graphs-fsts), but it does not support EM training, and I think EM would make an important difference for my particular application (*).

I have been playing with the Kaldi source code for a while, and I thought I could implement EM training for Kaldi. But before starting to code anything, I wanted to know whether somebody else has worked, or is working, on this, share my thoughts on how to do it, and hear some ideas or advice from the Kaldi pro developers.

First of all, the traditional Baum-Welch recipe for HMM EM training has to be adapted to work with transition-ids. I have not derived this formally, but the only quantity I need is the expected number of times each transition-id is traversed. This is easy in the Viterbi case, since we only consider the 1-best path and simply count how many times each transition-id is visited. If we could "expand" all possible paths, we would average the count of each transition-id over the paths, weighted by the posterior probability of each path. The Forward-Backward algorithm computes exactly this without having to "expand" the transition-id paths. Once the expected transition-id counts have been computed, updating the parameters of the model should be easy, since the pdf-id, HMM state, etc. can be recovered from the transition-id. If I'm not wrong, the code in TransitionModel::MleUpdate and MleDiagGmmUpdate should work without any change.

(*) Why do I (think I) need EM? In my application the supervised data is scarce and noisy, but for each input line I have transcriptions from several different humans. My idea is to encode these multiple transcriptions as an FST and run EM training on that FST. I could use Viterbi training as well, but initialization is critical for Viterbi, and align-equal-compiled is likely to produce a bad alignment in this case. Using EM, I can take full advantage of the labeling (at least, that is what I hope).

In summary:
- Has anyone tried to implement EM in Kaldi? If so, is the code publicly available?
- What are your thoughts on this?

Cheers,
Joan Puigcerver.
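
As a rough illustration of the forward-backward accumulation described above, here is a minimal standalone C++ sketch. It is not Kaldi code: the Arc struct, the likelihoods indexed directly by transition-id (real code would map tid -> pdf-id), and the assumption that every arc consumes exactly one frame are simplifications made up for this example. It computes the expected number of times each transition-id is traversed in a small acyclic training graph.

    // forward_backward_counts.cc -- hypothetical standalone sketch, not Kaldi code.
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <limits>
    #include <vector>

    const double kLogZero = -std::numeric_limits<double>::infinity();

    // Numerically stable log(exp(a) + exp(b)).
    static double LogAdd(double a, double b) {
      if (a == kLogZero) return b;
      if (b == kLogZero) return a;
      double m = std::max(a, b);
      return m + std::log(std::exp(a - m) + std::exp(b - m));
    }

    // One arc of the (already compiled) training graph.  Every arc is assumed
    // to consume exactly one frame and to carry a single transition-id.
    struct Arc {
      int src, dst;     // graph states
      int tid;          // transition-id attached to this arc
      double log_prob;  // log transition probability
    };

    // Returns gamma[tid] = expected number of times each transition-id is
    // traversed, given per-frame acoustic log-likelihoods loglike[t][tid].
    // (In real Kaldi code the likelihood would be looked up through the
    // pdf-id that the transition-id maps to; indexing by tid is a shortcut.)
    std::vector<double> ExpectedTransitionCounts(
        int num_states, int start, int final_state,
        const std::vector<Arc> &arcs,
        const std::vector<std::vector<double>> &loglike,
        int num_tids) {
      int T = loglike.size();
      std::vector<std::vector<double>> alpha(T + 1, std::vector<double>(num_states, kLogZero));
      std::vector<std::vector<double>> beta(T + 1, std::vector<double>(num_states, kLogZero));
      alpha[0][start] = 0.0;
      beta[T][final_state] = 0.0;

      // Forward pass over frames.
      for (int t = 0; t < T; ++t)
        for (const Arc &a : arcs)
          if (alpha[t][a.src] != kLogZero)
            alpha[t + 1][a.dst] = LogAdd(alpha[t + 1][a.dst],
                alpha[t][a.src] + a.log_prob + loglike[t][a.tid]);

      // Backward pass.
      for (int t = T - 1; t >= 0; --t)
        for (const Arc &a : arcs)
          if (beta[t + 1][a.dst] != kLogZero)
            beta[t][a.src] = LogAdd(beta[t][a.src],
                a.log_prob + loglike[t][a.tid] + beta[t + 1][a.dst]);

      double total = alpha[T][final_state];  // log-likelihood of the utterance

      // Arc posteriors, summed over frames, give the expected tid counts.
      std::vector<double> gamma(num_tids, 0.0);
      for (int t = 0; t < T; ++t)
        for (const Arc &a : arcs) {
          double lp = alpha[t][a.src] + a.log_prob + loglike[t][a.tid] +
                      beta[t + 1][a.dst] - total;
          if (lp != kLogZero) gamma[a.tid] += std::exp(lp);
        }
      return gamma;
    }

    int main() {
      // Toy graph: self-loop 0 --(tid 0)--> 0 and exit arc 0 --(tid 1)--> 1.
      std::vector<Arc> arcs = {{0, 0, 0, std::log(0.5)}, {0, 1, 1, std::log(0.5)}};
      // Three frames of fake acoustic log-likelihoods, one entry per tid.
      std::vector<std::vector<double>> loglike(3, {std::log(0.6), std::log(0.4)});
      std::vector<double> gamma = ExpectedTransitionCounts(2, 0, 1, arcs, loglike, 2);
      for (size_t tid = 0; tid < gamma.size(); ++tid)
        std::printf("tid %zu: expected count %.3f\n", tid, gamma[tid]);
      return 0;
    }

The gamma vector returned per utterance is the kind of statistic that would then be fed to the transition-model and GMM accumulators.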
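
And a rough sketch of encoding several alternative human transcriptions of one line as a single acceptor, as described in (*), using plain OpenFst calls. The label ids and output file name are made up for illustration, and the resulting FST would still have to go through something like compile-train-graphs-fsts.

    // make_union_fst.cc -- hypothetical sketch, assumes each transcription is
    // already a sequence of integer symbol ids from the lexicon/symbol table.
    #include <vector>
    #include <fst/fstlib.h>

    // Builds a linear acceptor for one transcription.
    fst::StdVectorFst LinearAcceptor(const std::vector<int> &labels) {
      fst::StdVectorFst f;
      int cur = f.AddState();
      f.SetStart(cur);
      for (int label : labels) {
        int next = f.AddState();
        f.AddArc(cur, fst::StdArc(label, label, fst::TropicalWeight::One(), next));
        cur = next;
      }
      f.SetFinal(cur, fst::TropicalWeight::One());
      return f;
    }

    // Unions all alternative transcriptions into one acceptor.
    fst::StdVectorFst TranscriptionUnion(const std::vector<std::vector<int>> &alts) {
      fst::StdVectorFst all = LinearAcceptor(alts.at(0));
      for (size_t i = 1; i < alts.size(); ++i) {
        fst::StdVectorFst one = LinearAcceptor(alts[i]);
        fst::Union(&all, one);      // all := all | one
      }
      fst::RmEpsilon(&all);         // Union() introduces epsilon arcs
      fst::StdVectorFst out;
      fst::Determinize(all, &out);
      fst::Minimize(&out);
      return out;
    }

    int main() {
      // Two hypothetical human transcriptions of the same line, as label ids.
      std::vector<std::vector<int>> alts = {{5, 12, 7}, {5, 13, 7}};
      TranscriptionUnion(alts).Write("line001_alts.fst");
      return 0;
    }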