From: Erik S. <esc...@pe...> - 2012-01-20 17:06:00
Pekka

3-element vectors can't be accessed directly in memory, because they
need to be aligned the same way as 4-vectors. One needs to use
vload/vstore for this, which doesn't require alignment.

I assume that 3-element vectors would be stored packed in global
memory. I don't know whether this is what async_copy actually
expects... However, unpacked storage can be handled by passing the
corresponding 4-element vector as gentype.

-erik

2012/1/20 Pekka Jääskeläinen <pek...@tu...>:
> OK,
>
> I implemented the non-strided versions in r159 using a trivial for-loop.
> I didn't test them yet (the spmv book example case that uses the functions
> fails with the bug regarding the __local pointers). Feel free to
> modify or comment.
>
> The easiest optimization for the implementation is to define
> versions that use vector loads and stores for the vector gentypes
> for the architectures with SIMD loads/stores (e.g. x86_64 with avx/sse).
>
> Have a nice weekend!
>
>
> On 01/20/2012 05:14 PM, Erik Schnetter wrote:
>>
>> Pekka
>>
>> I thought about using include files to instantiate macros, but so far,
>> I've kept things to using macros only. Instead, there is a different
>> macro for each "kind" of prototype. We can change this to using
>> #include instead (which would probably reduce the amount of code), but
>> this would also make it more complex to instantiate macros -- this
>> would then require a #define, an #include, and an #undef for each
>> function.
>>
>> I would instead #define a new macro just for async_work_group_copy in
>> _kernel.h. There would probably be another specific macro in
>> templates.h to help instantiating the function definitions. This would
>> be similar to the vload/vstore functions.
>>
>> This is similar to what you suggest except it doesn't use an #include
>> file. There is no particular reason not to use an #include file,
>> except that we currently don't. If you think it significantly
>> simplifies the code, then do it.
>>
>> -erik
>>
>> 2012/1/20 Pekka Jääskeläinen<pek...@tu...>:
>>>
>>> Erik,
>>>
>>> The function prototype for the async copy includes a "gentype"
>>> in a position not supported by the current "generator macros" of
>>> yours.
>>>
>>> event_t async_work_group_copy (
>>>     __local gentype *dst,
>>>     const __global gentype *src,
>>>     size_t num_gentypes,
>>>     event_t event);
>>>
>>> What do you think is the best way to generate the declarations and
>>> definitions for such functions?
>>>
>>> Something like:
>>>
>>> #define __FUNC_PROTO(gentype) \
>>>     __attribute__ ((overloadable)) \
>>>     event_t async_work_group_copy ( \
>>>         __local gentype *dst, \
>>>         const __global gentype *src, \
>>>         size_t num_gentypes, \
>>>         event_t event) \
>>>
>>> #include "gentype_func_decl.inc"
>>>
>>> Then that .inc would have the macro instantiated
>>> with all the different value types for gentype. E.g.:
>>>
>>> __FUNC_PROTO(float);
>>> __FUNC_PROTO(float2);
>>> __FUNC_PROTO(float4);
>>> ...
>>>
>>> Similarly for the definitions. Here I think we
>>> can assume both of the gentypes in the function
>>> are always the same so we do not have to generate
>>> all combinations.
>>>
>>> What do you think?
>>>
>>> --
>>> Pekka
>>>
>>> _______________________________________________
>>> Pocl-devel mailing list
>>> Poc...@li...
>>> https://lists.sourceforge.net/lists/listinfo/pocl-devel
>>
>>
>>
>>
>
>
> --
> Pekka

--
Erik Schnetter <esc...@pe...>
http://www.cct.lsu.edu/~eschnett/
AIM: eschnett247, Skype: eschnett, Google Talk: sch...@gm...
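
For readers following the thread, here is a minimal sketch in plain C of the macro-only route discussed above: one list macro names the value types, and a second macro is expanded once per type to produce a declaration and a trivial for-loop definition. It is not the pocl code. The names FOR_EACH_GENTYPE, DECLARE_ASYNC_COPY, DEFINE_ASYNC_COPY and async_copy_<type> are hypothetical, event_t, float2 and float4 are stand-in typedefs so that the sketch compiles outside OpenCL, and the type name is pasted into the function name because plain C has no overloadable attribute and no __local/__global qualifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the OpenCL C types, so this compiles as plain C. */
typedef int event_t;
typedef struct { float s[2]; } float2;
typedef struct { float s[4]; } float4;

/* Single list of value types; X is the macro to expand for each entry. */
#define FOR_EACH_GENTYPE(X) \
    X(float)                \
    X(float2)               \
    X(float4)

/* Declaration: the type name becomes part of the function name,
   since plain C cannot overload on the argument type. */
#define DECLARE_ASYNC_COPY(gentype)                   \
    event_t async_copy_##gentype(gentype *dst,        \
                                 const gentype *src,  \
                                 size_t num_gentypes, \
                                 event_t event);

/* Definition: an element-by-element copy that hands back the event
   it was given (an assumption made for this sketch only). */
#define DEFINE_ASYNC_COPY(gentype)                    \
    event_t async_copy_##gentype(gentype *dst,        \
                                 const gentype *src,  \
                                 size_t num_gentypes, \
                                 event_t event)       \
    {                                                 \
        assert(dst != NULL && src != NULL);           \
        for (size_t i = 0; i < num_gentypes; ++i)     \
            dst[i] = src[i];                          \
        return event;                                 \
    }

/* Instantiate every declaration, then every definition. */
FOR_EACH_GENTYPE(DECLARE_ASYNC_COPY)
FOR_EACH_GENTYPE(DEFINE_ASYNC_COPY)
```

A caller would then write, for example, async_copy_float4(dst, src, n, ev). Adding a type means adding one line to the list macro, which avoids the #define/#include/#undef sequence per function mentioned in the thread, at the cost of keeping the type list in a macro rather than in a separate .inc file.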