From: Pekka J. <pek...@tu...> - 2011-12-19 11:19:37
On 12/19/2011 12:59 PM, Carlos Sánchez de La Lama wrote:
> Also the final binary, optionally.

OK. In our case the kernel might just have multiple versions for the
multiple dimensions in the .text section. Should work...

>> The OpenCL API for fetching and loading the program binaries is
>> multi-device. Thus the format should not be tied to an architecture
>> as it can contain the same kernels compiled for multiple devices.
>
> What does this mean?

I think it means that, for example, in the case of AMD you could have the
CPU and the GPU (device) versions of the program in the same (OpenCL)
binary. From the specs it looks like they do not support this and store
either the GPU or the CPU bits, but not both:

"By default, OpenCL generates a binary that has LLVM IR, AMD IL, and the
executable for the GPU (.llvmir, .amdil, and .text sections), as well as
LLVM IR and the executable for the CPU (.llvmir and .text sections)."?

ELF has only one architecture-specific .text section, IIUC, so it would
not work for this. Anyway, we can add a separate wrapper for the
multi-device case on top of this (or use FatELF) later, if we see the need.

http://icculus.org/fatelf/

"FatELF lets you pack binaries into one file, seperated by OS ABI, OS ABI
version, byte order and word size, and most importantly, CPU architecture."

> Any other option (tar/zip/ELF/whatever) would do the same, but as this
> is documented and used on an OpenCL SDK I would suggest doing the same.

I do not consider the main advantage to be that it's used by AMD. But if
it can be used as a directly dlopenable program binary, then that is a
real advantage (BTW, on MacOS or at least Windows we might need something
else then?). It would avoid the objcopy step when the binary contains a
kernel version suitable for launching directly for the given
dimensions... probably a small saving, but still a nifty thing to have.

--
Pekka
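
A minimal sketch of the "separate wrapper for the multi-device case" idea from the message, assuming the wrapper is nothing more than a table that maps a device name to one single-architecture binary image. The struct names, the `find_device_binary` function and the device-name strings are all hypothetical; none of them come from pocl, the AMD SDK or FatELF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper entry: one compiled binary per device. */
struct device_binary {
    const char *device_name;    /* e.g. "cpu" or "gpu"; the naming is an assumption */
    const unsigned char *data;  /* per-device image, e.g. a single-architecture ELF */
    size_t size;                /* length of data in bytes */
};

/* Hypothetical multi-device wrapper: a flat table of per-device binaries. */
struct multi_device_binary {
    const struct device_binary *entries;
    size_t count;
};

/* Return the entry built for device_name, or NULL if the wrapper has none. */
static const struct device_binary *
find_device_binary(const struct multi_device_binary *wrapper,
                   const char *device_name)
{
    assert(wrapper != NULL && device_name != NULL);
    for (size_t i = 0; i < wrapper->count; i++) {
        if (strcmp(wrapper->entries[i].device_name, device_name) == 0)
            return &wrapper->entries[i];
    }
    return NULL;
}
```

With a layout like this, each entry could still be a plain ELF image with its own single .text section, and a loader would pick the entry for the target device before handing the image to the existing single-device path.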