Re: [ViennaCL-devel] Other BLAS backends

Hi,

My first thought was that OpenCL on the CPU would be *the* cool thing.
However, running some benchmarks soon showed that OpenCL won't be the
best choice for everything. The most annoying issue is the kernel launch
overhead, which is in the range of 10 us even on the CPU. Even for a
moderate 1 GHz CPU, this translates to 10k 'wasted' CPU cycles. Thus,
for 'small' operations OpenCL won't give good performance... :-(
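The 10k-cycle figure is simply the overhead multiplied by the clock rate. The small C++ sketch below spells out that arithmetic, plus the share of the total runtime that a fixed launch cost takes for a given amount of work. The function names are made up for illustration, and real launch overheads vary by driver and device.

```cpp
#include <cassert>
#include <cstdint>

// Cycles consumed by a fixed launch overhead. Microseconds times MHz gives
// cycles directly: 1 us at 1 MHz is exactly one cycle.
constexpr std::uint64_t launch_overhead_cycles(std::uint64_t overhead_us,
                                               std::uint64_t clock_mhz) {
  return overhead_us * clock_mhz;
}

// Share of the total runtime lost to the launch when the operation itself
// needs `work_cycles` and the launch adds `overhead_cycles` on top.
inline double overhead_share(std::uint64_t work_cycles,
                             std::uint64_t overhead_cycles) {
  assert(work_cycles + overhead_cycles > 0);
  return static_cast<double>(overhead_cycles) /
         static_cast<double>(work_cycles + overhead_cycles);
}

// The figures from the text: 10 us launch overhead on a 1 GHz (1000 MHz) CPU.
static_assert(launch_overhead_cycles(10, 1000) == 10000,
              "10 us at 1 GHz is 10k cycles");
```

For example, an operation whose own work is also about 10,000 cycles loses half of its runtime to the launch, while one with roughly a million cycles of work loses only about 1%.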

I agree that the current OpenCL backend requires extensions in order to 
handle device-specific implementations. As a (recent) prominent example, 
the super-fast matrix-matrix multiplication kernels for NVIDIA GPUs are 
not optimal for AMD GPUs and even for some older NVIDIA GPUs. The OpenCL 
backend, however, is essentially independent of the user types, so the 
ideal design in my view is the following:
  Layer 1:   User API types (viennacl::vector<>, etc.)
  Layer 2:   BLAS calling code (OpenCL, SSE, etc. Maybe hybrid?)
  Layer 3:   BLAS backend details (OpenCL kernel management, etc.)
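A minimal C++ sketch of the three layers, assuming a single host backend. The names layer1, layer2_host, layer3, add_scaled and the EXAMPLE_USE_OPENCL_BLAS define are all hypothetical and not ViennaCL code; the point is only that the user type forwards to whichever namespace the preprocessor selected as Layer 2, which in turn calls into Layer 3.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names below are hypothetical; this only illustrates the layering.

// Layer 3: backend details. A real backend would manage OpenCL programs
// and kernels here; this one just provides a host loop.
namespace layer3 {
inline void host_axpy(double alpha, const double* x, double* y, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) y[i] += alpha * x[i];
}
}  // namespace layer3

// Layer 2: BLAS calling code, one namespace per backend.
namespace layer2_host {
template <typename VectorT>
void axpy(double alpha, const VectorT& x, VectorT& y) {
  assert(x.size() == y.size());
  layer3::host_axpy(alpha, x.data(), y.data(), x.size());
}
}  // namespace layer2_host

// Static selection of Layer 2 via a preprocessor define.
#if defined(EXAMPLE_USE_OPENCL_BLAS)
#error "the OpenCL Layer 2 is not part of this sketch"
#else
namespace layer2 = layer2_host;
#endif

// Layer 1: user API type. It only stores data and forwards to layer2.
namespace layer1 {
class vector {
 public:
  explicit vector(std::size_t n, double value = 0.0) : buf_(n, value) {}
  std::size_t size() const { return buf_.size(); }
  const double* data() const { return buf_.data(); }
  double* data() { return buf_.data(); }
  double operator[](std::size_t i) const { return buf_[i]; }

  // *this += alpha * x
  vector& add_scaled(double alpha, const vector& x) {
    layer2::axpy(alpha, x, *this);
    return *this;
  }

 private:
  std::vector<double> buf_;
};
}  // namespace layer1
```

Because the selection is a namespace alias, switching backends in this sketch would not touch the Layer 1 type at all.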
The VIENNACL_USE_XYZ_BLAS defines would basically select the Layer 2
implementation to be used, while the improved OpenCL kernel management,
possibly with several tuned kernels, resides entirely in Layer 3. As a
first step, I think it's better to select Layer 2 statically via
preprocessor defines. As a second step, we might even use hybrid
approaches, e.g. using SSE for small operations on OpenCL handles on
APUs, thus avoiding the ugly kernel launch overhead in such cases. (Just
as a remark: the scientific community is currently *not* interested in
APUs due to their lack of double precision support. Due to thermal
limitations, I don't think an APU will beat a standalone GPU anytime
soon, even though the latency problems introduced by PCI-Express are
obvious.)
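The hybrid idea can be sketched as a size-based dispatch. This is an assumption-laden illustration: the threshold of 10000 elements is a placeholder that would have to be measured per device, the OpenCL branch is stubbed out with the host loop so the example stays self-contained, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

namespace hybrid {

// Illustrative cut-over size; a real value would have to be measured per
// device, e.g. by timing both paths.
constexpr std::size_t default_threshold = 10000;

enum class path { host_loop, opencl_kernel };

// Small problems go to the host path, large ones to an OpenCL kernel.
inline path choose_path(std::size_t n, std::size_t threshold = default_threshold) {
  return n < threshold ? path::host_loop : path::opencl_kernel;
}

// Host path: plain loop (an SSE implementation would slot in here).
inline double inner_prod_host(const double* x, const double* y, std::size_t n) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) sum += x[i] * y[i];
  return sum;
}

inline double inner_prod(const double* x, const double* y, std::size_t n) {
  switch (choose_path(n)) {
    case path::host_loop:
      return inner_prod_host(x, y, n);
    case path::opencl_kernel:
      // Placeholder: a real backend would enqueue an OpenCL kernel on the
      // shared buffer here. The sketch reuses the host loop instead.
      return inner_prod_host(x, y, n);
  }
  return 0.0;  // not reached
}

}  // namespace hybrid
```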

In other words: Since there does not seem to be a single 'best 
programming approach', let's combine the best of different approaches. :-)

Best regards,
Karli

On 08/01/2012 12:33 PM, Philippe Tillet wrote:
> Hi everybody !
>
> My personal opinion is that the capabilities of OpenCL on the
> CPU should not be overlooked. Intel is also putting a lot of effort
> into strong OpenCL tools!
> I wonder whether CPU-optimized kernels could beat an SSE
> implementation.
> Plus, in my opinion, we are heading toward a future where everybody will
> have an OpenCL-capable GPU (as of today, the Intel HD Graphics of
> Ivy Bridge is OpenCL-capable on Windows, and AMD APUs are also
> recognized as two devices on both Windows and Linux).
>
> Therefore, wouldn't it be better to focus on optimizing some kernels for
> CPUs, and letting the implementation redirect the computation to the
> proper kernel?
> Plus, it would be a chance to write an API for dealing with
> platform-specific kernels, which would give us the possibility in the
> future to optimize kernels for AMD CPUs, AMD GPUs, Intel,
> NVIDIA GPUs...!
>
> Best regards,
> Philippe
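One possible shape for such an API for platform-specific kernels is a registry keyed by vendor and device type, with a generic kernel as the fallback. The sketch below is hypothetical throughout: kernels are represented by name strings only, and in a real backend the key could be filled from clGetDeviceInfo queries such as CL_DEVICE_VENDOR and CL_DEVICE_TYPE.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical device classification; a real backend would derive it from
// the OpenCL device type.
enum class device_kind { cpu, gpu };

// Registry of tuned kernels keyed by (vendor, device kind). Lookups that
// find no tuned variant fall back to the generic kernel.
class kernel_registry {
 public:
  void add(const std::string& vendor, device_kind kind, std::string kernel) {
    tuned_[{vendor, kind}] = std::move(kernel);
  }
  void set_generic(std::string kernel) { generic_ = std::move(kernel); }

  const std::string& lookup(const std::string& vendor, device_kind kind) const {
    auto it = tuned_.find({vendor, kind});
    return it != tuned_.end() ? it->second : generic_;
  }

 private:
  std::map<std::pair<std::string, device_kind>, std::string> tuned_;
  std::string generic_;
};
```

Adding a tuned kernel for a new device would then be a single add() call, without touching the code that launches the operation.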
>
>
> 2012/8/1 Karl Rupp <ru...@iu...>
>
>     Hi Alex,
>
>     I've given some more thought to how to separate the linear algebra
>     backends suitably. Currently, some OpenCL statements are mixed into the
>     vector<> and matrix<> classes, while the operations are clearly
>     separated via calls to externally defined functions (e.g. prod_impl()),
>     cf. vector_operations.hpp and matrix_operations.hpp.
>
>     To simplify your development efforts, I could continue this separation
>     and also move the initialization routines to separate header files. In
>     the best case, all that is necessary for a CPU-only fallback is
>     something like the following in vector.hpp:
>
>     #ifdef VIENNACL_NO_OPENCL
>        #include "viennacl/linalg/vector-operations-cpu.hpp"
>     #else
>        #include "viennacl/linalg/vector-operations-opencl.hpp"
>     #endif
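The thread does not show what viennacl/linalg/vector-operations-cpu.hpp would contain. As a guess, collapsed into a single file for brevity, the CPU-only branch could be a plain loop over host memory; the function add_impl and its signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Normally set on the command line (-DVIENNACL_NO_OPENCL); defined here so
// this sketch compiles on its own.
#ifndef VIENNACL_NO_OPENCL
#define VIENNACL_NO_OPENCL
#endif

namespace linalg {
#ifdef VIENNACL_NO_OPENCL
// CPU-only fallback (hypothetical name and signature): result = x + y on
// host memory. Needs neither OpenCL headers nor the OpenCL library.
template <typename T>
void add_impl(const std::vector<T>& x, const std::vector<T>& y,
              std::vector<T>& result) {
  assert(x.size() == y.size());
  result.resize(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) result[i] = x[i] + y[i];
}
#else
// The OpenCL version would enqueue a kernel here (omitted in this sketch).
#endif
}  // namespace linalg
```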
>
>     Going one step further, we could even separate the convenience types
>     from the BLAS backend and support something like
>
>     #if defined VIENNACL_USE_SSE_BLAS
>        #include "viennacl/linalg/vector-operations-sse.hpp"
>     #elif defined VIENNACL_USE_OPENCL_BLAS
>        #include "viennacl/linalg/vector-operations-opencl.hpp"
>     #elif defined VIENNACL_USE_OPENMP_BLAS
>        #include "viennacl/linalg/vector-operations-openmp.hpp"
>     ...
>     #else
>        #include "viennacl/linalg/vector-operations-fallback.hpp"
>     #endif
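As an illustration of one branch of that chain, a hypothetical vector-operations-openmp.hpp might contain little more than OpenMP-annotated loops. The namespace and function names below are assumptions, not existing ViennaCL code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical contents of an OpenMP backend header. Compiled with
// -fopenmp the loop runs in parallel; without it the pragma is ignored
// and the same code runs serially.
namespace linalg_openmp {

inline double inner_prod_impl(const std::vector<double>& x,
                              const std::vector<double>& y) {
  assert(x.size() == y.size());
  // Signed loop index for compatibility with older OpenMP versions.
  const long n = static_cast<long>(x.size());
  double sum = 0.0;
#pragma omp parallel for reduction(+ : sum)
  for (long i = 0; i < n; ++i) sum += x[i] * y[i];
  return sum;
}

}  // namespace linalg_openmp
```

Since the reduction changes the summation order, results for general inputs can differ from a serial loop in the last bits.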
>
>     We probably won't have the development resources for supporting a whole
>     zoo of different backends, yet I like the idea of a clean separation.
>     What do you think?
>
>     Best regards,
>     Karli
>
>     PS: cc'ed to viennacl-devel
>
>
>
>
>     On 07/29/2012 06:49 AM, Alex Christensen wrote:
>      > I changed tred2 so it no longer copies memory, and it works with
>      > ublas matrices. My goal is to make a backend so that defining
>      > VIENNACL_NO_OPENCL makes existing code work without a GPU (or even
>      > without linking to an OpenCL library). I'll let you know if I run
>      > into any problems. Hopefully the existing QR code will work with that.
>      >
>      > Since the LU routines don't do partial pivoting, should I include my
>      > CPU LU function with partial pivoting? Should I also include my
>      > Cholesky function, maybe as a separate header? The only Cholesky
>      > function I have found in ViennaCL is in spai.
>      >
>      > Alex
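For reference, textbook versions of the two factorizations mentioned above could look as follows. This is neither Alex's code nor any ViennaCL routine: it is a sketch on a dense row-major std::vector<double>, with the hypothetical names lu_partial_pivot and cholesky.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Dense n x n matrix stored row-major (hypothetical helper type).
using dense = std::vector<double>;

// In-place LU factorization with partial pivoting (P*A = L*U).
// Afterwards the strict lower triangle holds L (unit diagonal implied),
// the upper triangle holds U, and perm[i] is the original index of row i.
// This sketch throws on an exactly zero pivot column.
inline std::vector<std::size_t> lu_partial_pivot(dense& a, std::size_t n) {
  assert(a.size() == n * n);
  std::vector<std::size_t> perm(n);
  for (std::size_t i = 0; i < n; ++i) perm[i] = i;
  for (std::size_t k = 0; k < n; ++k) {
    // Pivot: row with the largest |a(i,k)| at or below the diagonal.
    std::size_t p = k;
    for (std::size_t i = k + 1; i < n; ++i)
      if (std::fabs(a[i * n + k]) > std::fabs(a[p * n + k])) p = i;
    if (a[p * n + k] == 0.0) throw std::runtime_error("matrix is singular");
    if (p != k) {
      for (std::size_t j = 0; j < n; ++j) std::swap(a[k * n + j], a[p * n + j]);
      std::swap(perm[k], perm[p]);
    }
    for (std::size_t i = k + 1; i < n; ++i) {
      a[i * n + k] /= a[k * n + k];  // multiplier l(i,k)
      for (std::size_t j = k + 1; j < n; ++j)
        a[i * n + j] -= a[i * n + k] * a[k * n + j];
    }
  }
  return perm;
}

// In-place Cholesky factorization A = L * L^T for a symmetric positive
// definite A. The lower triangle is overwritten with L and the strict
// upper triangle is set to zero.
inline void cholesky(dense& a, std::size_t n) {
  assert(a.size() == n * n);
  for (std::size_t j = 0; j < n; ++j) {
    double d = a[j * n + j];
    for (std::size_t k = 0; k < j; ++k) d -= a[j * n + k] * a[j * n + k];
    if (d <= 0.0) throw std::runtime_error("matrix is not positive definite");
    a[j * n + j] = std::sqrt(d);
    for (std::size_t i = j + 1; i < n; ++i) {
      double s = a[i * n + j];
      for (std::size_t k = 0; k < j; ++k) s -= a[i * n + k] * a[j * n + k];
      a[i * n + j] = s / a[j * n + j];
    }
    for (std::size_t i = 0; i < j; ++i) a[i * n + j] = 0.0;  // clear upper part
  }
}
```

Keeping both in a separate header, as suggested, would leave the existing LU routines untouched.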
>      >
>
>
>     _______________________________________________
>     ViennaCL-devel mailing list
>     Vie...@li...
>     <mailto:Vie...@li...>
>     https://lists.sourceforge.net/lists/listinfo/viennacl-devel
>
