From: Joan P. <joa...@gm...> - 2015-04-01 14:44:51
Hi,

I know the Kaldi developers have been advocating Viterbi training over Baum-Welch training for a long time. At least, that is what I gather from the documentation and slides I found on the Internet. However, I need EM training for a couple of reasons, and I think it may still be useful for others in some cases. I work on handwritten text recognition, and our HMMs are much simpler than those used in ASR (we don't use context-dependent models, for instance).

Until now, I was training my models with HTK and then converting the HTK models to Kaldi's format. But this is a pain in the ass, because:

1. My script makes strong assumptions that work for me, but may not hold if I change the HMM topology.
2. It requires having both HTK and Kaldi installed, and it would be much nicer to have a single tool.
3. HTK does not support a feature that I need during training: using an FST as the "transcription". Kaldi seems to support this (see compile-train-graphs-fsts), but it does not support EM training, and I think EM would make an important difference for my particular application (*).

I have been playing with the Kaldi source code for a while, and I thought I could implement EM training for Kaldi. But before starting to code anything, I wanted to know whether somebody else has worked, or is working, on this, share my thoughts on how to do it, and hear some ideas or advice from the Kaldi pro developers.

First of all, the traditional Baum-Welch recipe for HMM EM training has to be adapted to work with transition-ids. I have not derived this formally, but the only quantity I need is the expected number of times each transition-id is traversed. This is easy in the Viterbi case, since we only consider the 1-best path and simply count how many times each transition-id is visited. If we could "expand" all possible paths, we would average the count of each transition-id over the paths, weighted by the posterior probability of each path. The Forward-Backward algorithm computes exactly this without having to "expand" the transition-id paths. Once the expected transition-id counts have been computed, updating the parameters of the model should be easy, since the pdf-id, HMM state, etc. can be recovered from the transition-id. If I'm not wrong, the code in TransitionModel::MleUpdate and MleDiagGmmUpdate should work without any change.

(*) Why do I (think I) need EM? In my application the supervised data is scarce and noisy, but for each input line I have transcriptions from several different humans. My idea is to encode these multiple transcriptions as an FST and run EM training on that FST. I could use Viterbi training as well, but initialization is critical for Viterbi, and align-equal-compiled is likely to produce a bad alignment in this case. Using EM, I can take full advantage of the labeling (at least, that is what I hope).

In summary:
- Has anyone tried to implement EM in Kaldi? If so, is the code publicly available?
- What are your thoughts on this?

Cheers,
Joan Puigcerver.
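
As a rough illustration of the forward-backward accumulation described above, here is a minimal standalone C++ sketch. It is not Kaldi code: the Arc struct, the likelihoods indexed directly by transition-id (real code would map tid -> pdf-id), and the assumption that every arc consumes exactly one frame are simplifications made up for this example. It computes the expected number of times each transition-id is traversed in a small acyclic training graph.

    // forward_backward_counts.cc -- hypothetical standalone sketch, not Kaldi code.
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <limits>
    #include <vector>

    const double kLogZero = -std::numeric_limits<double>::infinity();

    // Numerically stable log(exp(a) + exp(b)).
    static double LogAdd(double a, double b) {
      if (a == kLogZero) return b;
      if (b == kLogZero) return a;
      double m = std::max(a, b);
      return m + std::log(std::exp(a - m) + std::exp(b - m));
    }

    // One arc of the (already compiled) training graph.  Every arc is assumed
    // to consume exactly one frame and to carry a single transition-id.
    struct Arc {
      int src, dst;     // graph states
      int tid;          // transition-id attached to this arc
      double log_prob;  // log transition probability
    };

    // Returns gamma[tid] = expected number of times each transition-id is
    // traversed, given per-frame acoustic log-likelihoods loglike[t][tid].
    // (In real Kaldi code the likelihood would be looked up through the
    // pdf-id that the transition-id maps to; indexing by tid is a shortcut.)
    std::vector<double> ExpectedTransitionCounts(
        int num_states, int start, int final_state,
        const std::vector<Arc> &arcs,
        const std::vector<std::vector<double>> &loglike,
        int num_tids) {
      int T = loglike.size();
      std::vector<std::vector<double>> alpha(T + 1, std::vector<double>(num_states, kLogZero));
      std::vector<std::vector<double>> beta(T + 1, std::vector<double>(num_states, kLogZero));
      alpha[0][start] = 0.0;
      beta[T][final_state] = 0.0;

      // Forward pass over frames.
      for (int t = 0; t < T; ++t)
        for (const Arc &a : arcs)
          if (alpha[t][a.src] != kLogZero)
            alpha[t + 1][a.dst] = LogAdd(alpha[t + 1][a.dst],
                alpha[t][a.src] + a.log_prob + loglike[t][a.tid]);

      // Backward pass.
      for (int t = T - 1; t >= 0; --t)
        for (const Arc &a : arcs)
          if (beta[t + 1][a.dst] != kLogZero)
            beta[t][a.src] = LogAdd(beta[t][a.src],
                a.log_prob + loglike[t][a.tid] + beta[t + 1][a.dst]);

      double total = alpha[T][final_state];  // log-likelihood of the utterance

      // Arc posteriors, summed over frames, give the expected tid counts.
      std::vector<double> gamma(num_tids, 0.0);
      for (int t = 0; t < T; ++t)
        for (const Arc &a : arcs) {
          double lp = alpha[t][a.src] + a.log_prob + loglike[t][a.tid] +
                      beta[t + 1][a.dst] - total;
          if (lp != kLogZero) gamma[a.tid] += std::exp(lp);
        }
      return gamma;
    }

    int main() {
      // Toy graph: self-loop 0 --(tid 0)--> 0 and exit arc 0 --(tid 1)--> 1.
      std::vector<Arc> arcs = {{0, 0, 0, std::log(0.5)}, {0, 1, 1, std::log(0.5)}};
      // Three frames of fake acoustic log-likelihoods, one entry per tid.
      std::vector<std::vector<double>> loglike(3, {std::log(0.6), std::log(0.4)});
      std::vector<double> gamma = ExpectedTransitionCounts(2, 0, 1, arcs, loglike, 2);
      for (size_t tid = 0; tid < gamma.size(); ++tid)
        std::printf("tid %zu: expected count %.3f\n", tid, gamma[tid]);
      return 0;
    }

The gamma vector returned per utterance is the kind of statistic that would then be fed to the transition-model and GMM accumulators.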
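
And a rough sketch of encoding several alternative human transcriptions of one line as a single acceptor, as described in (*), using plain OpenFst calls. The label ids and output file name are made up for illustration, and the resulting FST would still have to go through something like compile-train-graphs-fsts.

    // make_union_fst.cc -- hypothetical sketch, assumes each transcription is
    // already a sequence of integer symbol ids from the lexicon/symbol table.
    #include <vector>
    #include <fst/fstlib.h>

    // Builds a linear acceptor for one transcription.
    fst::StdVectorFst LinearAcceptor(const std::vector<int> &labels) {
      fst::StdVectorFst f;
      int cur = f.AddState();
      f.SetStart(cur);
      for (int label : labels) {
        int next = f.AddState();
        f.AddArc(cur, fst::StdArc(label, label, fst::TropicalWeight::One(), next));
        cur = next;
      }
      f.SetFinal(cur, fst::TropicalWeight::One());
      return f;
    }

    // Unions all alternative transcriptions into one acceptor.
    fst::StdVectorFst TranscriptionUnion(const std::vector<std::vector<int>> &alts) {
      fst::StdVectorFst all = LinearAcceptor(alts.at(0));
      for (size_t i = 1; i < alts.size(); ++i) {
        fst::StdVectorFst one = LinearAcceptor(alts[i]);
        fst::Union(&all, one);      // all := all | one
      }
      fst::RmEpsilon(&all);         // Union() introduces epsilon arcs
      fst::StdVectorFst out;
      fst::Determinize(all, &out);
      fst::Minimize(&out);
      return out;
    }

    int main() {
      // Two hypothetical human transcriptions of the same line, as label ids.
      std::vector<std::vector<int>> alts = {{5, 12, 7}, {5, 13, 7}};
      TranscriptionUnion(alts).Write("line001_alts.fst");
      return 0;
    }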