From: Karl R. <ru...@iu...> - 2012-08-02 12:48:06
Hi,

> I think that's a good way to go, except that I think the SSE and OpenMP
> BLAS implementations shouldn't be separate. I'm a little bit
> intimidated because all the OpenCL code would have to be translated to
> CPU code in order for a CPU backend to have full functionality. This
> might not be done anytime soon.

That's a good point, keeping SSE and OpenMP together makes sense. Not
all OpenCL kernels need to be translated to SSE/OpenMP right away. Most
of the operations can be handled with simple loops, possibly even in a
generative way (e.g. templates). It's sufficient if you focus on the
'interesting' kernels; I can add the simpler kernels/operations as well.

> Also, do you think my sse blas and tred2 will be included in the next
> release? When is the next release?

The next release, version 1.3.1, is expected next week. It is going to
be a bugfix release that further stabilizes some of the new experimental
features. I hope to include your SSE contributions in 1.4.0, which will
also include the developments from the Google Summer of Code
(generalized eigenvalue problems) and is expected in the second half of
September. This is, however, not set in stone: as university courses
usually start at that time, we had better bring all summer developments
to a stable state. :-)

Best regards,
Karli

> PS: reply cc'ed to viennacl-devel :-)
>
> On Wed, Aug 1, 2012 at 3:38 AM, Karl Rupp <ru...@iu...
> <mailto:ru...@iu...>> wrote:
>
>     Hi Alex,
>
>     I've spent some more thoughts on how to separate the linear algebra
>     backends suitably. Currently, some OpenCL statements are mixed into
>     the vector<> and matrix<> classes, while the operations are clearly
>     separated via calls to externally defined functions (e.g.
>     prod_impl()), cf. vector_operations.hpp and matrix_operations.hpp.
>
>     To simplify your development efforts I could continue this
>     separation and also move initialization routines to separate header
>     files. In the best case, all that is necessary for a CPU-only
>     fallback is to have e.g. in vector.hpp something like
>
>     #ifdef VIENNACL_NO_OPENCL
>       #include "viennacl/linalg/vector-__operations-cpu.hpp"
>     #else
>       #include "viennacl/linalg/vector-__operations-opencl.hpp"
>     #endif
>
>     Going one step further, we could even separate the convenience types
>     from the BLAS backend and support something like
>
>     #if defined VIENNACL_USE_SSE_BLAS
>       #include "viennacl/linalg/vector-__operations-sse.hpp"
>     #elif defined VIENNACL_USE_OPENCL_BLAS
>       #include "viennacl/linalg/vector-__operations-opencl.hpp"
>     #elif defined VIENNACL_USE_OPENMP_BLAS
>       #include "viennacl/linalg/vector-__operations-openmp.hpp"
>     ...
>     #else
>       #include "viennacl/linalg/vector-__operations-fallback.hpp"
>     #endif
>
>     We probably won't have the development resources for supporting a
>     whole zoo of different backends, yet I like the idea of a clean
>     separation. What do you think?
>
>     Best regards,
>     Karli
>
>     PS: cc'ed to viennacl-devel
>
>
>     On 07/29/2012 06:49 AM, Alex Christensen wrote:
>
>         I made tred2 not copy memory, and it works with ublas matrices.
>         My goal is to make a backend so that defining VIENNACL_NO_OPENCL
>         makes existing code work without a gpu (or even linking to an
>         OpenCL library). I'll let you know if I run into any problems.
>         Hopefully the existing QR code will work with that.
>
>         Since the LU routines don't do partial pivoting, should I
>         include my cpu LU function with partial pivoting? Should I
>         include my cholesky function also, maybe as a separate header?
>         The only cholesky function I have found in ViennaCL is in spai.
>
>         Alex
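
The compile-time backend selection discussed in this thread, combined with the "simple loops, possibly templates" idea for the easy kernels, could look roughly like the sketch below. It is only an illustration and not ViennaCL's actual code: the `sketch` namespace, `backend_name()` and `axpy()` are hypothetical names, and only the macro `VIENNACL_USE_OPENMP_BLAS` is taken from the email. It assumes that a CPU kernel can share one templated loop between the OpenMP and the plain fallback variants.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical backend tag chosen at compile time. Only the macro name
// VIENNACL_USE_OPENMP_BLAS comes from the thread; everything else is
// made up for illustration.
#if defined(VIENNACL_USE_OPENMP_BLAS)
  #define SKETCH_BACKEND_NAME "openmp"
#else
  #define SKETCH_BACKEND_NAME "fallback"
#endif

namespace sketch {

// Reports which backend this translation unit was compiled for.
inline const char* backend_name() { return SKETCH_BACKEND_NAME; }

// Generic kernel y <- alpha * x + y, written once as a template.
// The same loop serves as the plain fallback and, if the macro is
// defined and OpenMP is enabled in the compiler, as the parallel version.
template <typename T>
void axpy(T alpha, const std::vector<T>& x, std::vector<T>& y) {
  assert(x.size() == y.size());
  // Signed loop index, since older OpenMP versions require one.
  const long n = static_cast<long>(x.size());
#if defined(VIENNACL_USE_OPENMP_BLAS)
  #pragma omp parallel for
#endif
  for (long i = 0; i < n; ++i)
    y[i] += alpha * x[i];
}

} // namespace sketch
```

Calling `sketch::axpy(2.0, x, y)` works unchanged whichever macro is defined, which is the point of keeping the user-facing types separate from the backend.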