From: Daniel P. <dp...@gm...> - 2015-04-01 17:19:59
> I know Kaldi developers have been advocating for Viterbi training
> instead of Baum-Welch training for a long time. At least, that is what
> I get from the documentation and slides that I found on the Internet.

Kaldi training is based on Viterbi because it's more efficient, no worse
than E-M (for speech applications), and much easier to integrate with
FSTs. Personally I suspect that you would not get any improvement from
doing E-M as opposed to Viterbi-- the posteriors tend to be pretty peaky
anyway. If you are concerned about the randomness of initialization you
could always duplicate your training examples several times, so several
different random paths will be taken. But I think this will make no
difference. Also, E-M training would be at least ten times slower,
probably closer to 100 times slower, depending on what tricks like
pruning you know how to implement.

If you really did want to do E-M training, the way to do this would
probably be to implement, instead of Viterbi, some kind of
forward-backward algorithm that would directly output posteriors over
transition-ids. This would create a difficulty for converting alignments,
though (this happens when bootstrapping later systems, e.g. starting
tri2 from tri1); you would probably have to just do Viterbi for that one
stage. You'd want to either store the posteriors on disk (maybe pruned a
bit), or pipe them into stats-accumulation programs. I'm not aware that
anyone has done this.

Dan

> However, I need EM training for a couple of reasons and I think it may
> still be useful for others in some cases.
>
> I work on handwriting text recognition and our HMMs are much simpler
> than those used in ASR (we don't use context-dependent models, for
> instance). Until today, what I was doing was training my models using
> HTK, and then converting the HTK models to Kaldi's format. But this is
> a pain in the ass, because:
>
> 1. My script makes strong assumptions that work for me, but might not
> hold if I change the HMM's topology.
> 2. It requires having both HTK and Kaldi installed, and it would be
> much nicer to have a single tool.
> 3. HTK does not support a feature that I need during training: using
> an FST as a "transcription" during training. Kaldi seems to support
> this (see compile-train-graphs-fsts), although it does not support EM
> training, and I think that EM would make an important difference for
> my particular application (*).
>
> I have been playing with the Kaldi source code for a while, and I
> thought I could implement EM training for Kaldi. But before starting
> to code anything, I wanted to know if somebody else has worked/is
> working on that, share my thoughts on how to do this, and hear some
> ideas or advice from the Kaldi pro developers.
>
> First of all, the traditional Baum-Welch recipe for HMM EM training
> has to be adapted to work with transition-ids. I have not derived this
> formally, but the only thing I need is to compute the average number
> of times each transition-id is traversed. Of course, this is quite
> easy to do in the case of Viterbi training, since we only consider the
> 1-best path and we just need to count the number of times each
> transition-id is visited. If we could "expand" all possible paths, we
> would just need to average the count of a particular transition-id in
> each path, weighted by the posterior probability of that path. The
> forward-backward algorithm can do this without the need of "expanding"
> the transition-id paths.
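For concreteness, here is a toy, self-contained sketch of the expected-count
computation described above. It is not Kaldi code: the HMM sizes, transition
matrix, and observation likelihoods are made up, and the counts are indexed
by state pair (i, j) rather than by transition-id as they would be in Kaldi.

// Toy sketch (not Kaldi code): expected transition counts via forward-backward
// for a small HMM. 'a' is the transition matrix and 'b[t][j]' the observation
// likelihood of state j at frame t; all values are invented for illustration.
#include <cstdio>

int main() {
  const int S = 2, T = 3;                      // states, frames (toy sizes)
  double pi[S] = {1.0, 0.0};                   // initial distribution
  double a[S][S] = {{0.6, 0.4}, {0.0, 1.0}};   // transition probabilities
  double b[T][S] = {{0.9, 0.1}, {0.5, 0.5}, {0.2, 0.8}};  // likelihoods

  // Forward and backward passes.
  double alpha[T][S], beta[T][S];
  for (int j = 0; j < S; ++j) alpha[0][j] = pi[j] * b[0][j];
  for (int t = 1; t < T; ++t)
    for (int j = 0; j < S; ++j) {
      alpha[t][j] = 0.0;
      for (int i = 0; i < S; ++i) alpha[t][j] += alpha[t-1][i] * a[i][j];
      alpha[t][j] *= b[t][j];
    }
  for (int j = 0; j < S; ++j) beta[T-1][j] = 1.0;
  for (int t = T - 2; t >= 0; --t)
    for (int i = 0; i < S; ++i) {
      beta[t][i] = 0.0;
      for (int j = 0; j < S; ++j)
        beta[t][i] += a[i][j] * b[t+1][j] * beta[t+1][j];
    }
  double total = 0.0;                          // P(O), from the last frame
  for (int j = 0; j < S; ++j) total += alpha[T-1][j];

  // Expected number of times each transition (i -> j) is taken; in Kaldi
  // terms this is the quantity that would be accumulated per transition-id.
  double count[S][S] = {{0.0}};
  for (int t = 0; t + 1 < T; ++t)
    for (int i = 0; i < S; ++i)
      for (int j = 0; j < S; ++j)
        count[i][j] += alpha[t][i] * a[i][j] * b[t+1][j] * beta[t+1][j] / total;

  for (int i = 0; i < S; ++i)
    for (int j = 0; j < S; ++j)
      printf("E[count %d->%d] = %.4f\n", i, j, count[i][j]);
  return 0;
}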
> Once the transition-id average counts have been computed, updating the
> parameters of the model should be easy, since the pdf-id, state-id, etc.
> can be recovered from the transition-id. If I'm not wrong, the code in
> TransitionModel::MleUpdate and MleDiagGmmUpdate should work without
> any change.
>
> (*) Why do I (think I) need EM? Well, in my application the supervised
> data is scarce and noisy. However, for each input line I have
> transcriptions coming from different humans. My idea is to encode
> these multiple transcriptions using an FST, and use EM training on
> this FST. I could use Viterbi training as well, but initialization is
> critical for Viterbi, and align-equal-compiled is likely to produce a
> bad alignment in this case. Using EM, I can take full advantage of the
> labeling (at least, that is what I hope).
>
> In summary:
> - Has anyone tried to implement EM in Kaldi? If so, is the code
> publicly available?
> - What are your thoughts on this?
>
> Cheers,
> Joan Puigcerver.
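As a rough illustration of the use case in (*), the alternative human
transcriptions of one line could be encoded as a single acceptor with
OpenFst along the following lines. This is only a sketch: the word ids and
output filename are made up, a real script would map words through the
system's words.txt, and the per-utterance FSTs would be collected into a
table/archive before being passed to compile-train-graphs-fsts.

// Sketch: build one acceptor that accepts any of several alternative
// transcriptions of the same handwritten line.
#include <fst/fstlib.h>
#include <vector>

// Build a linear acceptor over a word-id sequence.
static fst::StdVectorFst LinearAcceptor(const std::vector<int> &words) {
  fst::StdVectorFst f;
  int cur = f.AddState();
  f.SetStart(cur);
  for (int w : words) {
    int next = f.AddState();
    f.AddArc(cur, fst::StdArc(w, w, fst::TropicalWeight::One(), next));
    cur = next;
  }
  f.SetFinal(cur, fst::TropicalWeight::One());
  return f;
}

int main() {
  // Three hypothetical transcriptions of the same line (invented word ids).
  std::vector<std::vector<int> > transcripts = {
    {12, 7, 33}, {12, 8, 33}, {12, 7, 34}
  };
  fst::StdVectorFst all = LinearAcceptor(transcripts[0]);
  for (size_t i = 1; i < transcripts.size(); ++i) {
    fst::StdVectorFst alt = LinearAcceptor(transcripts[i]);
    fst::Union(&all, alt);                 // all := all | alt
  }
  fst::RmEpsilon(&all);                    // Union introduces epsilon arcs
  fst::StdVectorFst out;
  fst::Determinize(all, &out);
  fst::ArcSort(&out, fst::ILabelCompare<fst::StdArc>());
  // In practice, collect these per-utterance FSTs into a table keyed by
  // utterance id for use with compile-train-graphs-fsts.
  out.Write("line1.fst");
  return 0;
}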