
Parallelisation of RNNLM

Developers
JHL
2014-10-31
  • JHL

    JHL - 2014-10-31

    Hi,

    I have a question about the parallel implementation of RNNLM distributed with Kaldi (http://svn.code.sf.net/p/kaldi/code/trunk/tools/rnnlm-hs-0.1b/).

    I read the code, and based on my understanding, each thread processes its own subset of the sentences (selected by its thread id), and each thread keeps its own neuron activation and error values. Am I right to say that the synapse weights are shared across the threads? Since these weights are updated concurrently by different threads, wouldn't there be update conflicts? Sorry if I have misunderstood the whole parallelisation algorithm completely.

     
    • Daniel Povey

      Daniel Povey - 2014-10-31

      This is not really part of Kaldi per se, but an external tool.
      I think Ilya (cc'd) might be able to answer your question.
      Dan

       
    • Ilya Edrenkin

      Ilya Edrenkin - 2014-10-31

      Hi JHL,

      You are totally right: the parallelization method is asynchronous and lock-free, in the style of Hogwild ( http://www.eecs.berkeley.edu/~brecht/papers/hogwildTR.pdf ). This can of course lead to problems (in fact, it is a constant data race), and some updates will inevitably be lost. But if the learning rate is not too high, training will still converge.
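
      For illustration, here is a minimal Hogwild-style sketch in C (my own toy example, not the rnnlm-hs source; names such as train_thread, weights and alpha are made up). The weight matrix is shared by all threads and updated with plain, unlocked read-modify-write operations, while activations and error vectors stay thread-local:

          /* Toy Hogwild-style update: shared weights, no locks, thread-local
             activations and errors. Lost updates are accepted as a benign race. */
          #include <pthread.h>
          #include <stdlib.h>

          #define NUM_THREADS 4
          #define HIDDEN 100
          #define VOCAB  10000

          static float *weights;       /* shared, updated lock-free by every thread */
          static float alpha = 0.1f;   /* learning rate; keep it small so that lost
                                          updates do not hurt convergence too much  */

          static void *train_thread(void *arg) {
              long thread_id = (long)arg;
              float hidden_act[HIDDEN] = {0};   /* thread-local activations */
              float hidden_err[HIDDEN] = {0};   /* thread-local error terms */
              (void)thread_id;   /* each thread would pick the sentences where
                                    sentence_index % NUM_THREADS == thread_id */
              /* ... forward pass and backpropagation would fill these arrays ... */
              for (int i = 0; i < HIDDEN; i++) {
                  /* plain read-modify-write: threads may overwrite each other */
                  weights[i] += alpha * hidden_err[i] * hidden_act[i];
              }
              return NULL;
          }

          int main(void) {
              pthread_t threads[NUM_THREADS];
              weights = calloc((size_t)HIDDEN * VOCAB, sizeof(float));
              for (long t = 0; t < NUM_THREADS; t++)
                  pthread_create(&threads[t], NULL, train_thread, (void *)t);
              for (long t = 0; t < NUM_THREADS; t++)
                  pthread_join(threads[t], NULL);
              free(weights);
              return 0;
          }

      The point is simply that no mutex protects weights[]; convergence in the statistical sense relies on sparse updates and a sufficiently small learning rate, as argued in the Hogwild paper linked above.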

      Regarding performance: this version indeed performs worse than Tomas Mikolov's RNNLM in terms of both WER and entropy. However, with RNNLM-HS you can afford, to a certain extent, a larger model size and a larger amount of training data.
      Tomas gave me two hints on how I could make it better:
      1) Improve the hashing mechanism in the ME part.
      2) Rework the output layer decomposition.

      One of the problems with the ME is that high-order features that were never really learned are still used at test time. I will soon send a patch that deals with this; it does help, but only marginally.

      The output layer can be rethought as well. As you mentioned in your letter, Tomas uses a class-based decomposition, while RNNLM-HS uses a binary tree, so the depths of the trees are 2 and log_2(|V|) respectively. In fact, something between these two extremes should provide a compromise between speed and accuracy. For example, we could try a k-ary Huffman decomposition (k == 3 would give a depth of log_3(|V|) with still-inexpensive 3-way softmaxes at each node), or even use several output layers (2-ary Huffman, 3-ary Huffman, ..., up to arbitrary k) and interpolate between them; see the rough cost sketch below.
      However, this is a considerable body of work; I'm afraid I will only have time for it in January.
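
      As a rough back-of-the-envelope illustration (my own numbers, assuming a balanced k-ary tree over a vocabulary of |V| = 100,000), the per-word output cost is roughly k * log_k(|V|), while a class-based decomposition with about sqrt(|V|) classes costs about 2 * sqrt(|V|) outputs per word:

          /* Illustrative cost comparison of output-layer decompositions.
             A balanced k-ary tree over |V| words has depth ~ log_k(|V|) and
             evaluates a k-way softmax at each node, so the per-word cost is
             roughly k * log_k(|V|).                                          */
          #include <math.h>
          #include <stdio.h>

          int main(void) {
              const double vocab = 100000.0;   /* assumed vocabulary size */
              for (int k = 2; k <= 5; k++) {
                  double depth = log(vocab) / log((double)k);
                  printf("k = %d: depth ~ %4.1f, outputs per word ~ %5.1f\n",
                         k, depth, k * depth);
              }
              /* class-based decomposition: two "levels" of about sqrt(|V|) each */
              printf("class-based: ~ %.0f outputs per word\n", 2.0 * sqrt(vocab));
              return 0;
          }

      Increasing k makes each node's softmax larger and the tree shallower, moving the cost gradually from the binary-tree extreme toward the class-based one, which is the compromise described above.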

      Another problem that you may have noticed if you used sampling with ME is a bug in the sample() function: if ME is used, the indices are off by one. I will fix it in the coming patch.

       

      Last edit: Ilya Edrenkin 2014-10-31
  • JHL

    JHL - 2014-10-31

    Hi Ilya,

    Thanks for the detailed explanation; it is certainly very helpful. I might actually just extend Tomas' rnnlm using the Hogwild trick, seeing that it is rather simple to implement. That way I'll have two parallelised versions of rnnlm (rnnlm-hs and rnnlm+hogwild) and the option of trading accuracy against speed depending on the dataset.