From: Carlos S. de La L. <car...@ur...> - 2011-10-24 10:46:00
Hi all,

I have been thinking about how to implement the kernel library for different devices, and about some related issues. Right now, the pocl flow goes like this:

    Compilation: .cl to .bc (bytecode)
        |
        V
    Linking with kernel library
        |
        V
    Full inlining
        |
        V
    Workgroup creation (replicate workitems)
        |
        V
    Device-dependent driver

Workgroup creation needs to detect barriers, and that is why it has to be done after full inlining (there can be barriers in a function called by the kernel, not just in the kernel itself).

One desirable property is for the bytecode to stay device-independent for as long as possible, up to the device driver if possible, so that we do not have to store several binaries on the host (there might be some unavoidable dependencies, but I think that, given OpenCL's restricted C support, those will be minor). Then there are two possibilities:

1) Make the kernel library runtime-compatible with all devices. This was the planned approach. It can be done by selecting the implementation for a device with runtime conditionals (C-level ifs) instead of preprocessor ones (#if/#ifdef). LLVM should then eliminate the dead code when generating the final binary.

2) Perform inlining and replication before linking. Only a minor part of the kernel library (get_xxx_id() and friends) needs to be linked before WG creation, and that part is going to be common to all devices because it depends on the replication passes. The big "functional" kernel runtime library could be linked later, even in device-dependent binary form instead of bytecode form, allowing the use of different kernel libraries for different devices. This would have the additional advantage of smaller bytecode and faster code generation.

Thoughts?

Carlos
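
To make possibility 1 a bit more concrete, here is a minimal sketch in plain C. Everything in it is an assumption made for illustration: the device enumeration, the function names and the choice of a popcount routine are hypothetical and are not existing pocl code. The idea is only that the library function takes the device as an ordinary value, so the same bytecode serves every device, and the branch is expected to fold away once a driver inlines the call with a constant device.

```c
#include <stdint.h>

/* Hypothetical device identifiers, for illustration only. */
enum device_kind { DEV_GENERIC = 0, DEV_BITTRICK = 1 };

/* Portable reference implementation: clear one set bit per iteration. */
static inline uint32_t popcount_loop(uint32_t x)
{
    uint32_t n = 0;
    while (x != 0) {
        x &= x - 1;   /* clears the lowest set bit */
        n++;
    }
    return n;
}

/* Stand-in for a "device-tuned" implementation: branch-free bit trick. */
static inline uint32_t popcount_bittrick(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;
}

/* Library entry point: the implementation is chosen with a C-level if,
 * not with #if/#ifdef, so the compiled bytecode stays device-independent. */
static inline uint32_t lib_popcount(enum device_kind dev, uint32_t x)
{
    if (dev == DEV_BITTRICK)
        return popcount_bittrick(x);
    return popcount_loop(x);
}

/* What a device driver might do late in the flow: call the library with a
 * constant device. After inlining, the condition is a compile-time constant
 * and the unused implementation becomes dead code. */
static inline uint32_t popcount_for_bittrick_device(uint32_t x)
{
    return lib_popcount(DEV_BITTRICK, x);
}
```

With an #ifdef the choice would instead be fixed when the library is compiled to bytecode, which means one bytecode library per device. Whether the optimizer really removes the unused branch depends on the call being inlined with a constant device value, so this is worth checking on the generated code rather than assuming.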