From: Daniel P. <dp...@gm...> - 2013-09-05 14:58:46
>> I'm not too concerned about this as it
>> seems to me that the matrix-multiply will be much slower than the
>> softmax (as it's O(n^3) not O(n^2)), and therefore the small penalty
>> from doing them separately does not matter relative to the possibly
>> large performance gain from the faster matrix multiply.
>
> Certainly the problem goes away when n is large. But do we have large n?
> This implies large minibatch sizes which may slow down the weight updates
> and certainly take quite a lot of GPU memory. Assuming about 2GB RAM there
> is a limit to the size you can make minibatches.

The factor by which matmul is slower than softmax is anyway not the
minibatch size, it's the size of the nonlinear layers, I think.

> I guess we can ask the question in the other way: does anyone have any
> profile information to share? That is, what GPU utilisation does Kaldi
> achieve? Clearly if it's currently getting over (say) 50% then there is no
> point in thinking about this any more. As it is, my main concern is
> satisfied, I was just looking in the wrong place.

I'm not sure. Karel would know (for his codebase); mine is still in the
early stages (I'm still fixing issues with it).

Dan

> Tony
>
> --
> Dr A J Robinson, Founder and Director of Cantab Research Limited.
> St Johns Innovation Centre, Cowley Road, Cambridge, CB4 0WS, UK.
> Company reg no 05697423 (England and Wales), VAT reg no 925606030.
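The ratio argument in the thread above can be sketched with a rough FLOP count: for a minibatch of m rows through an n-wide layer, the matmul costs O(m * n^2) while the softmax costs O(m * n), so their ratio grows with the layer width n and is independent of the minibatch size m. The per-element op constants below (2 FLOPs per multiply-add, ~5 ops per softmax element) are illustrative assumptions, not measurements of any Kaldi code.

```python
def matmul_flops(m, n):
    """Multiply-add cost of an (m x n) activation times an (n x n) weight matrix."""
    return 2 * m * n * n

def softmax_flops(m, n, ops_per_element=5):
    """Approximate elementwise cost (exp, sum, divide) over m * n outputs."""
    return ops_per_element * m * n

# The ratio depends only on the layer width n, not the minibatch size m.
for m in (128, 1024):   # minibatch sizes
    n = 2000            # hidden-layer width
    ratio = matmul_flops(m, n) / softmax_flops(m, n)
    print(f"minibatch={m:5d}  matmul/softmax cost ratio = {ratio:.0f}")
```

For both minibatch sizes this prints a ratio of 800 (i.e. 2n/5), which is the point being made: growing the minibatch does not change how much the matmul dominates, only growing the layer does.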