From: Pekka J. <pek...@tu...> - 2012-01-20 16:41:42
OK, I implemented the non-strided versions in r159 using a trivial for-loop. I haven't tested them yet: the spmv book example, which uses these functions, currently fails because of the bug with __local pointers. Feel free to modify or comment.

The easiest optimization for the implementation is to define versions that use vector loads and stores for the vector gentypes on architectures with SIMD loads/stores (e.g. x86_64 with AVX/SSE).

Have a nice weekend!

On 01/20/2012 05:14 PM, Erik Schnetter wrote:
> Pekka
>
> I thought about using include files to instantiate macros, but so far,
> I've kept things to using macros only. Instead, there is a different
> macro for each "kind" of prototype. We can change this to using
> #include instead (which would probably reduce the amount of code), but
> this would also make it more complex to instantiate macros -- this
> would then require a #define, an #include, and an #undef for each
> function.
>
> I would instead #define a new macro just for async_work_group_copy in
> _kernel.h. There would probably be another specific macro in
> templates.h to help instantiate the function definitions. This would
> be similar to the vload/vstore functions.
>
> This is similar to what you suggest except it doesn't use an #include
> file. There is no particular reason not to use an #include file,
> except that we currently don't. If you think it significantly
> simplifies the code, then do it.
>
> -erik
>
> 2012/1/20 Pekka Jääskeläinen <pek...@tu...>:
>> Erik,
>>
>> The function prototype for the async copy includes a "gentype"
>> in a position not supported by the current "generator macros" of
>> yours.
>>
>> event_t async_work_group_copy (
>>     __local gentype *dst,
>>     const __global gentype *src,
>>     size_t num_gentypes,
>>     event_t event);
>>
>> What do you think is the best way to generate the declarations and
>> definitions for such functions?
>>
>> Something like:
>>
>> #define __FUNC_PROTO(gentype)             \
>>     __attribute__ ((overloadable))        \
>>     event_t async_work_group_copy (       \
>>         __local gentype *dst,             \
>>         const __global gentype *src,      \
>>         size_t num_gentypes,              \
>>         event_t event)
>>
>> #include "gentype_func_decl.inc"
>>
>> Then that .inc would have the macro instantiated
>> with all the different value types for gentype. E.g.:
>>
>> __FUNC_PROTO(float);
>> __FUNC_PROTO(float2);
>> __FUNC_PROTO(float4);
>> ...
>>
>> Similarly for the definitions. Here I think we
>> can assume both of the gentypes in the function
>> are always the same, so we do not have to generate
>> all combinations.
>>
>> What do you think?
>>
>> --
>> Pekka
>>
>> _______________________________________________
>> Pocl-devel mailing list
>> Poc...@li...
>> https://lists.sourceforge.net/lists/listinfo/pocl-devel
>

--
Pekka
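As an illustration of the generator-macro approach discussed above, here is a minimal plain-C sketch. It keeps the gentype list in one list macro (standing in for the proposed gentype_func_decl.inc) and expands a declaration macro and a definition macro over it; the definition is a synchronous for-loop copy in the spirit of the r159 implementation. Everything here is an assumption for illustration: FOR_EACH_GENTYPE, DECLARE_ASYNC_COPY, DEFINE_ASYNC_COPY and the async_copy_<type> names are hypothetical, float2/float4 are stand-in structs, and event_t is an int placeholder. Plain C (without Clang's overloadable attribute) has no overloading and no __local/__global qualifiers, so the type name is pasted into the function name instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for OpenCL C types so the sketch builds as plain C. */
typedef struct { float s[2]; } float2;
typedef struct { float s[4]; } float4;
typedef int event_t; /* OpenCL's event_t is opaque; an int placeholder here */

/* The gentype list, written once. In the .inc proposal this body would live
   in gentype_func_decl.inc and be #included after #defining the per-function
   macro; here the per-function macro is passed in as an argument instead. */
#define FOR_EACH_GENTYPE(M) \
  M(float)                  \
  M(float2)                 \
  M(float4)

/* Declaration generator (the part a header such as _kernel.h would expand).
   A real kernel library would use __attribute__((overloadable)) and the
   __local/__global address-space qualifiers instead of name pasting. */
#define DECLARE_ASYNC_COPY(gentype)                              \
  event_t async_copy_##gentype(gentype *dst, const gentype *src, \
                               size_t num_gentypes, event_t event);

/* Definition generator: a trivial, synchronous element-by-element copy.
   Both pointers share one gentype, so one instantiation per type suffices. */
#define DEFINE_ASYNC_COPY(gentype)                                 \
  event_t async_copy_##gentype(gentype *dst, const gentype *src,   \
                               size_t num_gentypes, event_t event) \
  {                                                                \
    assert(num_gentypes == 0 || (dst != NULL && src != NULL));     \
    for (size_t i = 0; i < num_gentypes; ++i)                      \
      dst[i] = src[i];                                             \
    return event;                                                  \
  }

FOR_EACH_GENTYPE(DECLARE_ASYNC_COPY)
FOR_EACH_GENTYPE(DEFINE_ASYNC_COPY)
```

Adding a gentype means editing only the list macro. Moving that list into a separate .inc file instead would bring back the #define/#include/#undef sequence per function that Erik mentions.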