Re: [ViennaCL-devel] Fwd: Re: Computing A += prod(B,C)

Hi Philippe,

thanks for the investigations.

> kernel 1, device 1: C(0,0) = A(0,0) * B(0,0)
> kernel 2, device 2: C(0,1) = A(0,0) * B(0,1)
> Both the AMD and the NVIDIA SDK are unable to multicast A(0,0) from the
> host to the two GPUs. Even if the two kernels are enqueued in parallel,
> the execution is serialized, because the 2nd device has to wait for
> A(0,0) to be available. This is exactly the behavior I feared.
> It does not happen with a simple matrix addition, where all the handles
> are independent.

Okay, I see, so the const-qualifiers on the kernel handles are ignored 
(or at least not exploited for a more efficient implementation). Thus, it 
seems we have to use separate memory handles in such cases, and we had better
attach some meta-information ('current device') to each memory handle.
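
To make the idea of separate per-device handles with 'current device' meta-information a bit more concrete, here is a minimal host-only C++ sketch. It is only an assumption of how such bookkeeping could look, not ViennaCL's API: `replicated_handle`, `on_device`, `mark_written` and `device_buffer` are hypothetical names, and a `std::vector<float>` stands in for a `cl_mem` buffer. Each device gets its own copy of, e.g., A(0,0), so no two command queues share one buffer; keeping stale copies in sync after a write is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device-side buffer; in real code this would be
// a cl_mem allocated for one particular device.
using device_buffer = std::vector<float>;

// Hypothetical handle that gives every device its own copy of the data
// instead of letting several command queues share one buffer, and records
// which device last wrote to it (the "current device").
class replicated_handle {
public:
  explicit replicated_handle(std::vector<float> host_data)
      : host_(std::move(host_data)) {}

  // Returns this device's private copy, creating it from the host data on
  // first use (with OpenCL, the host-to-device write would happen here).
  device_buffer& on_device(int device) {
    auto it = copies_.find(device);
    if (it == copies_.end()) it = copies_.emplace(device, host_).first;
    return it->second;
  }

  // Records that a kernel on `device` produced newer data. Only the
  // bookkeeping is done; other copies are not refreshed.
  void mark_written(int device) {
    assert(copies_.count(device) == 1);  // the device must already hold a copy
    current_device_ = device;
  }

  int current_device() const { return current_device_; }
  std::size_t num_copies() const { return copies_.size(); }

private:
  std::vector<float> host_;
  std::map<int, device_buffer> copies_;  // one independent copy per device id
  int current_device_ = -1;              // -1: the host copy is authoritative
};
```

With this layout, kernel 1 on device 1 and kernel 2 on device 2 would each read their own copy of A(0,0), at the price of one extra allocation per device.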

> I'm desperately looking for a low-memory way to multicast a handle. I might
> give the Khronos forum a try, even though enqueuing the same handle on
> different queues is left implementation-defined by the standard!

Oh dear, 'implementation-defined' is nothing I want to see at this point 
:-( It seems we should perhaps reconsider using one context per device 
and benchmark memory transfers for both options (i.e. one context for 
all devices vs. one context per device).

> But well, the good news is that the kernels are executing!

Yep, some good news :-)

Best regards,
Karli

>
> 2012/7/31 Karl Rupp <ru...@iu...>
>
>       Hello again,
>
>       I've just pushed the following changes to the SourceForge repository:
>       * operator+= and operator-= no longer create temporaries
>       * A = prod(B,C) does not fail if there is garbage in A
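
One common way for an assignment like `A = prod(B,C)` to be disturbed by garbage in A is to evaluate it as the GEMM-style update A = 1*B*C + 0*A with a kernel that always reads the old A: under IEEE 754 arithmetic 0 * NaN is NaN, so NaN (or Inf) bit patterns left in uninitialized memory leak into the result. This is why the reference BLAS documentation states that C need not be set on input when beta is zero. The plain C++ sketch below (hypothetical function names, square row-major matrices, no ViennaCL) contrasts such a naive scaled update with a pure assignment; it is only an illustration, and the thread does not say which bug the change above actually fixed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using matrix = std::vector<double>;  // n x n, stored row-major

// Hypothetical helper: simulates an uninitialized matrix whose memory
// happens to contain NaNs.
matrix garbage(std::size_t n) {
  return matrix(n * n, std::numeric_limits<double>::quiet_NaN());
}

// Naive GEMM-style update C = alpha * A * B + beta * C. It always reads the
// old C, so even with beta == 0 a NaN in C survives (0 * NaN is NaN).
void gemm_scaled(std::size_t n, double alpha, const matrix& A, const matrix& B,
                 double beta, matrix& C) {
  assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j) {
      double sum = 0.0;
      for (std::size_t k = 0; k < n; ++k) sum += A[i * n + k] * B[k * n + j];
      C[i * n + j] = alpha * sum + beta * C[i * n + j];
    }
}

// Plain assignment C = A * B: C is only written, never read, so whatever it
// contained beforehand cannot affect the result.
void gemm_assign(std::size_t n, const matrix& A, const matrix& B, matrix& C) {
  assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j) {
      double sum = 0.0;
      for (std::size_t k = 0; k < n; ++k) sum += A[i * n + k] * B[k * n + j];
      C[i * n + j] = sum;
    }
}
```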
>
>       Best regards,
>       Karli
>
>
>
>       On 07/29/2012 03:59 PM, Philippe Tillet wrote:
>
>           Hello everybody!
>
>           I'll inaugurate this mailing list with a little question.
>           I have not seen any kernel for computing the operation
>           A += prod(B,C).
>           Does this mean that this operation is carried out as
>
>           tmp = prod(B,C)
>           A += tmp
>
>           ?
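
The two ways of evaluating A += prod(B,C) asked about here can be compared in a small, host-only C++ sketch. `add_prod_with_temporary` and `add_prod_fused` are hypothetical names, not ViennaCL functions, and the matrices are assumed square and stored row-major in a `std::vector<double>`. The fused version adds each dot product directly into A and so avoids the extra n*n buffer, but it is only valid when A aliases neither B nor C.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using matrix = std::vector<double>;  // n x n, stored row-major

// Two-step evaluation: materialize tmp = prod(B, C), then A += tmp.
// Costs an extra n*n buffer.
void add_prod_with_temporary(std::size_t n, matrix& A, const matrix& B,
                             const matrix& C) {
  matrix tmp(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t k = 0; k < n; ++k)
        tmp[i * n + j] += B[i * n + k] * C[k * n + j];
  for (std::size_t idx = 0; idx < n * n; ++idx) A[idx] += tmp[idx];
}

// Fused evaluation: each entry's dot product is added straight into A, so no
// temporary matrix is allocated. A must alias neither B nor C, because A is
// modified while the product is still being formed.
void add_prod_fused(std::size_t n, matrix& A, const matrix& B,
                    const matrix& C) {
  assert(&A != &B && &A != &C);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j) {
      double sum = 0.0;
      for (std::size_t k = 0; k < n; ++k) sum += B[i * n + k] * C[k * n + j];
      A[i * n + j] += sum;
    }
}
```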
>
>           For computing the multi_matrix (a project I'm working on: a matrix
>           composed of multiple handles, to work around the
>           CL_DEVICE_MAX_MEM_ALLOC_SIZE limit and the multi-device issue), I
>           need to do several updates of this kind in a block layout. For a
>           2*2 block layout:
>
>           C(0,0).clear();
>           =>
>           C(0,0) += prod( A(0,0), B(0,0) )
>           =>
>           C(0,0) += prod( A(0,1), B(1,0) )
>
>           C(0,1).clear();
>           =>
>           C(0,1) += prod( A(0,0), B(0,1) )
>           =>
>           C(0,1) += prod( A(0,1), B(1,1) )
>
>           ...
>           ...
>
>           This "sort-of-rank-1-update approach" is a special case of the
>           SUMMA algorithm (with OpenCL doing the memory transfers in the
>           background, for now at least) and seems to be efficient from a
>           memory point of view. Using any other approach would lead to both
>           huge memory consumption and
>           significant memory transfers...
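
A host-only C++ sketch of the clear-then-accumulate block update listed above, assuming square matrices split into nb x nb equal blocks. `block_add_prod` and `blocked_prod` are hypothetical helpers, not ViennaCL functions; each `block_add_prod` call corresponds to one `C(i,j) += prod(A(i,k), B(k,j))` step, which in the multi-device setting would be one kernel launch. The data distribution and broadcasting that make up SUMMA proper are not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using matrix = std::vector<double>;  // n x n, stored row-major

// C(bi,bj) += A(bi,bk) * B(bk,bj), where blocks are bs x bs sub-matrices.
void block_add_prod(std::size_t n, std::size_t bs, matrix& C, const matrix& A,
                    const matrix& B, std::size_t bi, std::size_t bj,
                    std::size_t bk) {
  for (std::size_t i = 0; i < bs; ++i)
    for (std::size_t j = 0; j < bs; ++j) {
      double sum = 0.0;
      for (std::size_t k = 0; k < bs; ++k)
        sum += A[(bi * bs + i) * n + (bk * bs + k)] *
               B[(bk * bs + k) * n + (bj * bs + j)];
      C[(bi * bs + i) * n + (bj * bs + j)] += sum;
    }
}

// C = A * B computed block by block: for every block C(bi,bj), clear it,
// then accumulate prod(A(bi,bk), B(bk,bj)) for bk = 0 .. nb-1.
void blocked_prod(std::size_t n, std::size_t nb, matrix& C, const matrix& A,
                  const matrix& B) {
  assert(nb > 0 && n % nb == 0);
  assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
  const std::size_t bs = n / nb;  // block size
  for (std::size_t bi = 0; bi < nb; ++bi)
    for (std::size_t bj = 0; bj < nb; ++bj) {
      // C(bi,bj).clear()
      for (std::size_t i = 0; i < bs; ++i)
        for (std::size_t j = 0; j < bs; ++j)
          C[(bi * bs + i) * n + (bj * bs + j)] = 0.0;
      // C(bi,bj) += prod(A(bi,bk), B(bk,bj)) for each bk
      for (std::size_t bk = 0; bk < nb; ++bk)
        block_add_prod(n, bs, C, A, B, bi, bj, bk);
    }
}
```

With nb = 2 the loop order reproduces the sequence from the message: C(0,0).clear(), C(0,0) += prod(A(0,0), B(0,0)), C(0,0) += prod(A(0,1), B(1,0)), then C(0,1).clear(), and so on.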
>
>           Is there any way of doing so in ViennaCL ?
>
>           Best regards!
>           Phil
>
>           _______________________________________________
>           ViennaCL-devel mailing list
>           ViennaCL-devel@lists.sourceforge.net
>           https://lists.sourceforge.net/lists/listinfo/viennacl-devel
