
Parallelisation of RNNLM

Developers
JHL
2014-10-31
  • JHL

    JHL - 2014-10-31

    Hi,

    I have a question about the parallel implementation of RNNLM distributed with Kaldi (http://svn.code.sf.net/p/kaldi/code/trunk/tools/rnnlm-hs-0.1b/).

    I read the code, and based on my understanding, each thread processes its own subset of the sentences (selected by its thread id), and each thread keeps its own neuron activation and error values. Am I right to say that the synapse weights are shared across the threads? Since these weights are updated concurrently by different threads, wouldn't there be update conflicts? Sorry if I have misunderstood the whole parallelisation algorithm completely.

     
    • Daniel Povey

      Daniel Povey - 2014-10-31

      This is not really part of Kaldi per se, but an external tool.
      I think Ilya (cc'd) might be able to answer your question.
      Dan

       
    • Ilya Edrenkin

      Ilya Edrenkin - 2014-10-31

      Hi JHL,

      You are totally right: the parallelization method is asynchronous and lock-free, in the style of Hogwild ( http://www.eecs.berkeley.edu/~brecht/papers/hogwildTR.pdf ). This can of course lead to problems (in fact, it is a constant data race), and some updates will inevitably be lost. But if the learning rate is not too high, training will still converge.
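
      For illustration, here is a minimal Hogwild-style sketch in C (my own toy example, not the rnnlm-hs source; names such as train_thread, weights and alpha are made up). The weight matrix is shared by all threads and updated with plain, unlocked read-modify-write operations, while activations and error vectors stay thread-local:

          /* Toy Hogwild-style update: shared weights, no locks, thread-local
             activations and errors. Lost updates are accepted as a benign race. */
          #include <pthread.h>
          #include <stdlib.h>

          #define NUM_THREADS 4
          #define HIDDEN 100
          #define VOCAB  10000

          static float *weights;       /* shared, updated lock-free by every thread */
          static float alpha = 0.1f;   /* learning rate; keep it small so that lost
                                          updates do not hurt convergence too much  */

          static void *train_thread(void *arg) {
              long thread_id = (long)arg;
              float hidden_act[HIDDEN] = {0};   /* thread-local activations */
              float hidden_err[HIDDEN] = {0};   /* thread-local error terms */
              (void)thread_id;   /* each thread would pick the sentences where
                                    sentence_index % NUM_THREADS == thread_id */
              /* ... forward pass and backpropagation would fill these arrays ... */
              for (int i = 0; i < HIDDEN; i++) {
                  /* plain read-modify-write: threads may overwrite each other */
                  weights[i] += alpha * hidden_err[i] * hidden_act[i];
              }
              return NULL;
          }

          int main(void) {
              pthread_t threads[NUM_THREADS];
              weights = calloc((size_t)HIDDEN * VOCAB, sizeof(float));
              for (long t = 0; t < NUM_THREADS; t++)
                  pthread_create(&threads[t], NULL, train_thread, (void *)t);
              for (long t = 0; t < NUM_THREADS; t++)
                  pthread_join(threads[t], NULL);
              free(weights);
              return 0;
          }

      The point is simply that no mutex protects weights[]; convergence in the statistical sense relies on sparse updates and a sufficiently small learning rate, as argued in the Hogwild paper linked above.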

      Regarding performance: this version indeed performs worse than Tomas Mikolov's RNNLM in terms of both WER and entropy. However, with RNNLM-HS you can afford, to a certain extent, a larger model size and a larger amount of training data.
      Tomas gave me two hints on how I could make it better:
      1) Improve the hashing mechanism in the ME part.
      2) Rework the output layer decomposition.

      One of the problems with the ME is that high-order features that were never really learned are still used at test time. I will soon send a patch that deals with this; it does help, but only marginally.

      The output layer can be rethought as well. As you mentioned in your letter, Tomas uses a class-based decomposition, while RNNLM-HS uses a binary tree, so the depths of the trees are 2 and log_2(|V|) respectively. In fact, something between these two extremes should provide a compromise between speed and accuracy. For example, we could try a k-ary Huffman decomposition (k == 3 would give a depth of log_3(|V|) with still-inexpensive 3-way softmaxes at each node), or even use several output layers (2-ary Huffman, 3-ary Huffman, ..., up to arbitrary k) and interpolate between them; see the rough cost sketch below.
      However, this is a considerable body of work; I'm afraid I will only have time for it in January.
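
      As a rough back-of-the-envelope illustration (my own numbers, assuming a balanced k-ary tree over a vocabulary of |V| = 100,000), the per-word output cost is roughly k * log_k(|V|), while a class-based decomposition with about sqrt(|V|) classes costs about 2 * sqrt(|V|) outputs per word:

          /* Illustrative cost comparison of output-layer decompositions.
             A balanced k-ary tree over |V| words has depth ~ log_k(|V|) and
             evaluates a k-way softmax at each node, so the per-word cost is
             roughly k * log_k(|V|).                                          */
          #include <math.h>
          #include <stdio.h>

          int main(void) {
              const double vocab = 100000.0;   /* assumed vocabulary size */
              for (int k = 2; k <= 5; k++) {
                  double depth = log(vocab) / log((double)k);
                  printf("k = %d: depth ~ %4.1f, outputs per word ~ %5.1f\n",
                         k, depth, k * depth);
              }
              /* class-based decomposition: two "levels" of about sqrt(|V|) each */
              printf("class-based: ~ %.0f outputs per word\n", 2.0 * sqrt(vocab));
              return 0;
          }

      Increasing k makes each node's softmax larger and the tree shallower, moving the cost gradually from the binary-tree extreme toward the class-based one, which is the compromise described above.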

      Another problem that you may have noticed if you used sampling with ME is a bug in the sample() function: if ME is used, the indices are off by one. I will fix it in the coming patch.

       

      Last edit: Ilya Edrenkin 2014-10-31
  • JHL

    JHL - 2014-10-31

    Hi Ilya,

    Thanks for the detailed explanation; it is certainly very helpful. I might actually just extend Tomas' rnnlm using the Hogwild trick, seeing that it is rather simple to implement. That way I'll have two parallelised versions of rnnlm (rnnlm-hs and rnnlm+hogwild) and the option of trading accuracy against speed depending on the dataset.