From: Carlos S. de La L. <car...@ur...> - 2011-12-19 08:48:51
Hi, moved this from the bug reporting comments (better to discuss it on
the list, I think).

>> How many kernels are cached? E.g. in pthread.c, there is an if statement
>> "if (d->current_kernel != kernel)", as if only one kernel was cached. If
>> that is so, would that be the right place to introduce a larger cache?

Only one right now. Larger cache is needed, I agree.

> Yes, Carlos. The cached result depends on the dimensions so saving
> multiple versions could work. Although I'd like to cache the final
> binary, not only the bitcode, to save all the compilation costs. This
> will be useful especially in the future when TCE is used in a proper
> host-device configuration, in the embedded/mobile systems that really
> want to save all useless work, and also in my planned research wrt.
> OpenCL to FPGA. So we might need to create a new simple binary format
> with multiple target binaries inside + some metadata (for example to
> save the dimensions).

I think that, instead of defining a binary format (I was originally
thinking of an ELF with different binaries in different sections, like
the AMD SDK does), it is probably better just to use the BC. The
workgroup function would be created with a different name (probably
including the dimensions, so the info is there), and the OpenCL-related
metadata would not be touched, so it still points to the original kernel.
The passes need slight modifications to handle this, but if we update the
"binary image" of the kernel at that point, then we would have what we
want.

Drawbacks would be:

1) A small unneeded delay from the caching code when no caching is wanted.
2) The binary code might grow quite big.

It would be nice if there was a way to enable/disable the caching while
more or less complying with the standard. Is there a way to define
host-side extensions?

Carlos
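
A minimal sketch of the multi-entry cache discussed above, assuming a small fixed-size table keyed by kernel name plus local dimensions, with round-robin replacement. Every name here (wg_cache, wg_cache_lookup, wg_cache_insert, the "_wg_" naming scheme, the slot count) is hypothetical and only illustrates the idea of encoding the dimensions in the workgroup function name; it is not the existing pthread.c code.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define WG_CACHE_SLOTS 8   /* hypothetical capacity, not a project value */
#define WG_NAME_MAX 160

/* One cached specialisation: a kernel compiled for one local size. */
struct wg_cache_entry {
  char kernel_name[64];
  size_t local[3];
  char wg_function[WG_NAME_MAX]; /* e.g. "dot_product_wg_8_8_1" */
  int used;
};

struct wg_cache {
  struct wg_cache_entry slots[WG_CACHE_SLOTS];
  unsigned next; /* round-robin replacement index */
};

/* Build a workgroup function name that carries the dimensions. */
static void wg_function_name(char *out, size_t n, const char *kernel,
                             const size_t local[3])
{
  snprintf(out, n, "%s_wg_%zu_%zu_%zu", kernel,
           local[0], local[1], local[2]);
}

/* Return the entry for (kernel, local size), or NULL on a miss. */
static struct wg_cache_entry *wg_cache_lookup(struct wg_cache *c,
                                              const char *kernel,
                                              const size_t local[3])
{
  for (unsigned i = 0; i < WG_CACHE_SLOTS; ++i) {
    struct wg_cache_entry *e = &c->slots[i];
    if (e->used && strcmp(e->kernel_name, kernel) == 0 &&
        memcmp(e->local, local, sizeof e->local) == 0)
      return e;
  }
  return NULL;
}

/* Record a new specialisation, overwriting the next slot in turn. */
static struct wg_cache_entry *wg_cache_insert(struct wg_cache *c,
                                              const char *kernel,
                                              const size_t local[3])
{
  struct wg_cache_entry *e = &c->slots[c->next];
  c->next = (c->next + 1) % WG_CACHE_SLOTS;
  snprintf(e->kernel_name, sizeof e->kernel_name, "%s", kernel);
  memcpy(e->local, local, sizeof e->local);
  wg_function_name(e->wg_function, sizeof e->wg_function, kernel, local);
  e->used = 1;
  return e;
}
```

On a miss, the caller would run the workgroup passes for the requested local size and then call wg_cache_insert; on a hit, it would reuse the function named in wg_function. Setting the slot count to 1 would reproduce the current single-kernel behaviour, which is one possible way to switch the caching off.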