From: Pekka J. <pek...@tu...> - 2011-12-15 08:06:13
|
On 12/15/2011 01:09 AM, Erik Schnetter wrote:
> Erik Schnetter has proposed merging lp:~schnetter/pocl/main into lp:pocl.
>
> Requested reviews: pocl maintainers (pocl)
>
> For more details, see:
> https://code.launchpad.net/~schnetter/pocl/main/+merge/85761
>
> I added support for the half datatype, protected by #ifdef cl_khr_fp16,
> analogous to cl_khr_fp64. I don't know which targets support this datatype
> (presumably all, since llvm supports them?), so I enabled this for all
> targets -- this will break things if this is wrong.

Just curious...

How does LLVM/Clang support half by default nowadays? I've heard that on NVIDIA GPUs, for example, half is supported only as a storage format. That is, you have the float in 16-bit format in memory, but whenever you compute something with halfs, they are converted to single-precision floats to avoid the need for separate floating-point units for halfs.

Just curious to hear what happens when you use half floats in LLVM/Clang now -- do they convert them to single-precision fp whenever computation occurs? The last time I checked, 'half' was not a datatype in the LLVM IR, so operations on halfs could not be selected (to be implemented with the target ISA) nicely.

It seems there are only two intrinsics for halfs available:
http://llvm.org/docs/LangRef.html#int_fp16

Does Clang generate those automatically for halfs in OpenCL C now? For example, if you perform a basic operation halfA + halfB, what happens?

I'm interested in proper half support because for embedded/mobile targets it offers more than memory bandwidth savings: if half floats are enough for your computations, you can also reduce the FPU area, improve the speed, lower the energy consumption, etc. But I think they will not accept it as a proper datatype in LLVM before there is a real (read: off-the-shelf) target in LLVM that supports it natively.

--
--Pekka
From: Pekka J. <pek...@tu...> - 2011-12-15 19:17:54
|
On 12/15/2011 09:04 PM, Erik Schnetter wrote:
> supports the half datatype

What makes you think so?
http://llvm.org/docs/LangRef.html#t_floating

As I wrote, I think it's supported only via those two conversion intrinsics:
http://llvm.org/docs/LangRef.html#int_fp16

My question was who implements those intrinsics (fp32 to fp16 and fp16 to fp32), as they require some bit manipulation of the fp fields, AFAIK (extract the mantissa, exponent and sign, and pack them back into the destination), and it's unlikely the hardware has direct instructions for such a conversion. Do you mean that the default lowering of those intrinsics produces the conversion code too?

Using native halfs would mean that one can use smaller adders, multipliers, shifters etc. in the FPUs, which means energy savings in low-power designs (less switching activity). Too bad it seems not to be supported yet in LLVM, AFAIU.

--
--Pekka
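To make the bit manipulation discussed in the message above concrete, here is a minimal C sketch of a half-to-float conversion done purely with integer operations. It is not code from pocl or LLVM; the function name `half_bits_to_float` is made up for illustration, and the sketch assumes the 16-bit value uses the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits) and that `float` is IEEE 754 binary32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a pocl or LLVM function): decode an IEEE 754
 * binary16 bit pattern into a float using only integer operations. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16; /* move sign to bit 31 */
    uint32_t exp  = (h >> 10) & 0x1Fu;             /* 5-bit exponent, bias 15 */
    uint32_t man  = h & 0x3FFu;                    /* 10-bit mantissa */
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent, keep the mantissa payload. */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127 (add 112). */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {
        /* Signed zero. */
        bits = sign;
    } else {
        /* Subnormal half: shift left until the implicit leading 1 appears,
         * lowering the exponent by one for every shift. */
        uint32_t shift = 0;
        while ((man & 0x400u) == 0) {
            man <<= 1;
            shift++;
        }
        man &= 0x3FFu; /* drop the now-explicit leading 1 */
        bits = sign | ((113u - shift) << 23) | (man << 13);
    }

    memcpy(&f, &bits, sizeof f); /* reinterpret the bits as a float */
    return f;
}
```

Every half value is exactly representable as a float, so this direction needs no rounding; only the float-to-half direction has to round.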
From: Erik S. <esc...@pe...> - 2011-12-15 19:32:09
|
The conversion intrinsics exist in LLVM, and are implemented in some of its backends. To my knowledge, currently only the ARM backend supports them, presumably via a machine instruction (or maybe via a sequence of machine instructions). Other backends will report an error when the LLVM code is lowered to machine code -- that is (as I had to find out), libkernel.a will build fine on all architectures, but the respective functions cannot be used.

As you say, it should not be difficult to implement this generically for all other platforms, either in pocl or (better) in LLVM. This may be slow, but the memory savings (in particular also if the data is stored in a file) may make the slow conversion worthwhile for some applications.

-erik

--
Erik Schnetter <esc...@pe...>
http://www.cct.lsu.edu/~eschnett/
AIM: eschnett247, Skype: eschnett, Google Talk: sch...@gm...
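As an illustration of what a generic, target-independent fallback of the kind mentioned above might look like, here is a C sketch of the float-to-half direction. It is an assumption-laden example, not pocl or LLVM code: the name `float_to_half_bits` is hypothetical, the formats are assumed to be IEEE 754 binary32 and binary16, and the rounding mode is fixed to round-to-nearest-even.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical software fallback (not a pocl or LLVM function): encode a
 * float as an IEEE 754 binary16 bit pattern, rounding to nearest even. */
static uint16_t float_to_half_bits(float f)
{
    uint32_t x, exp, man, h, rem;
    uint16_t sign;
    int32_t e;

    memcpy(&x, &f, sizeof x);               /* reinterpret the float's bits */
    sign = (uint16_t)((x >> 16) & 0x8000u);
    exp  = (x >> 23) & 0xFFu;               /* 8-bit exponent, bias 127 */
    man  = x & 0x7FFFFFu;                   /* 23-bit mantissa */

    if (exp == 0xFFu)                       /* infinity or NaN */
        return (uint16_t)(sign | 0x7C00u | (man ? 0x200u : 0u));

    e = (int32_t)exp - 127 + 15;            /* rebias for half */
    if (e >= 0x1F)
        return (uint16_t)(sign | 0x7C00u);  /* too large: infinity */

    if (e <= 0) {
        /* Result is a subnormal half or zero. */
        uint32_t shift, halfway;
        if (e < -10)
            return sign;                    /* too small: signed zero */
        man |= 0x800000u;                   /* make the implicit 1 explicit */
        shift = (uint32_t)(14 - e);         /* between 14 and 24 */
        h = man >> shift;
        rem = man & ((1u << shift) - 1u);   /* bits that were shifted out */
        halfway = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (h & 1u)))
            h++;                            /* round to nearest, ties to even */
        return (uint16_t)(sign | h);
    }

    /* Normal half: keep the top 10 mantissa bits and round on the rest. */
    h = ((uint32_t)e << 10) | (man >> 13);
    rem = man & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
        h++;                                /* a carry may reach the exponent */
    return (uint16_t)(sign | h);
}
```

The increment in the last branch is allowed to carry into the exponent field on purpose: rounding up from the largest mantissa yields the next power of two, and rounding up from the largest finite half yields the bit pattern for infinity.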
From: Carlos S. de La L. <car...@ur...> - 2011-12-16 11:59:57
|
Just for clarification:

There is no fp16 type in LLVM at all, neither for computation nor for storage. It is not defined in the LLVM assembly language.

What clang does is generate i16 (integer) values for halfs and convert them to floats before operating on them. The LLVM intrinsics convert i16 <-> float; there is no fp16 type. I would therefore expect those intrinsics to work on all the LLVM codegen targets (int to float works, so this should too).

From the pocl kernel library side, I think the current way is correct: if halfs are not mandatory in OpenCL, check for "half" support in the compiler and activate the extension if it is there. Support here means not only that the compiler "eats" the keyword, but also that the type has the size the standard defines. So, as it is done now.

To map halfs to real half-operating hardware, LLVM-side changes would be needed to add half as a real type, as right now it is not possible (without a lot of def-use chain analysis) to determine whether a float comes from a "half/i16" or is a real float.

BR

Carlos

_______________________________________________
Pocl-devel mailing list
Poc...@li...
https://lists.sourceforge.net/lists/listinfo/pocl-devel
From: Erik S. <esc...@pe...> - 2011-12-16 15:18:57
|
On Fri, Dec 16, 2011 at 7:03 AM, Carlos Sánchez de La Lama <car...@ur...> wrote:
> Just for clarification:
>
> There is no fp16 type in LLVM, at all, neither for computation nor
> storage. It is not defined in LLVM assembly language.
>
> What clang does is generates i16 (integer) values for halfs, and
> converts i16 to floats before operating with them. The LLVM intrinsics
> convert i16 <-> float, there is no fp16 type.
> I would expect therefore those intrinsics to work on all the LLVM
> codegen targets (int to float works, so this should also).

The intrinsics do not work -- I tried 3.0 and trunk. From looking at the code, I believe that this conversion intrinsic is only defined for ARM.

The i16 contains a bit pattern representing the fp16 value; it cannot be interpreted as an integer value. It seems to me that using i16 is purely a hack to avoid introducing a new (and very limited) LLVM datatype, because using an i16 ensures that load/store etc. work correctly.

-erik
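The difference between converting the i16 as an integer and decoding it as an fp16 bit pattern, as described in the message above, can be shown with a small C sketch. Both function names are invented for this example, and the decoder deliberately handles only normal binary16 numbers (no zeros, subnormals, infinities or NaNs) to stay short; it assumes IEEE 754 layouts for both formats.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: what an ordinary integer-to-float conversion does with
 * the 16 bits, i.e. it treats them as the number 0..65535. */
static float as_integer_value(uint16_t bits)
{
    return (float)bits;
}

/* Hypothetical: reinterpret the same 16 bits as an IEEE 754 binary16
 * value. Simplification: only normal numbers are handled. */
static float as_half_value(uint16_t bits)
{
    uint32_t sign = (uint32_t)(bits & 0x8000u) << 16; /* sign to bit 31 */
    uint32_t exp  = (bits >> 10) & 0x1Fu;             /* 5-bit exponent */
    uint32_t man  = bits & 0x3FFu;                    /* 10-bit mantissa */
    uint32_t f32  = sign | ((exp + 112u) << 23) | (man << 13);
    float out;

    memcpy(&out, &f32, sizeof out);
    return out;
}
```

For the bit pattern 0x3C00, `as_integer_value` returns 15360.0f while `as_half_value` returns 1.0f, which is why a working integer-to-float conversion on a target does not by itself provide the fp16 conversion.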
From: Carlos S. de La L. <car...@ur...> - 2011-12-16 15:43:09
|
Yep, the item is stored in fp16 format inside the i16, of course... I thought it would work because you can have a target-independent f16 to f32 conversion, but that requires assuming the storage format is IEEE FP.

Anyway, pocl-wide it is enough as it is now, IMHO: if LLVM supports codegen for halfs on a target, then the kernel library uses it; otherwise it does not.

Carlos
From: Pekka J. <pek...@tu...> - 2011-12-20 08:41:32
|
For the record, LLVM now has half support in the IR too:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20111219/133712.html

--
--PJ
From: Erik S. <esc...@pe...> - 2011-12-15 16:24:51
|
OpenCL supports only two operations for halfs: vload_half, converting it to a float, and vstore_half, converting from a float. Nothing else exists explicitly, not even vectors of halfs. Essentially the only thing one can do with the half type is to pass a half* to these load/store routines.

There are routines such as float half_sin(float) that are only required to have the precision offered by the half datatype (allowing optimisations), but the API is via float. There is text in the standard presumably allowing this to be optimised to use operations that act directly on half values, but this is not required.

I added code to detect whether clang supports half (called __fp16 in C), and if so, these vload_half/vstore_half routines are available. half_sin and friends are always available, forwarding to their float counterparts by default -- I assume that target-specific optimisations can do better.

-erik
From: Pekka J. <pek...@tu...> - 2011-12-15 16:44:55
|
On 12/15/2011 06:24 PM, Erik Schnetter wrote:
> OpenCL supports only two operations for halfs: vload_half, converting it
> to a float, and vstore_half, converting from a float. Nothing else
> exists explicitly, not even vectors of halfs. Essentially the only thing
> one can do with the half type is to pass a half* to these load/store
> routines.

OK, interesting. Who then generates the float <-> half conversion code for the LLVM intrinsics? Does LLVM generate it automatically, or do we need to provide conversion routines in pocl?

--
Pekka
From: Erik S. <esc...@pe...> - 2011-12-15 19:05:02
|
Since LLVM supports the half datatype (I believe it was added about nine months ago, especially for OpenCL), it generates these conversions itself.

I just updated (read: corrected) the autoconf rules to determine whether half is supported. It seems that it is currently only supported on ARM. Other platforms will have this code disabled. I will push this to my branch soon.

To my knowledge, the "traditional" implementation of half would be to perform all arithmetic operations in float precision, except possibly expensive iterative operations (divide, sqrt), where fewer iterations may be used than for float.

-erik