Thread: Re: [Pocl-devel] [Merge] lp:~schnetter/pocl/main into lp:pocl

Re: [Pocl-devel] [Merge] lp:~schnetter/pocl/main into lp:pocl

From: Pekka J. <pek...@tu...> - 2011-12-15 08:06:13

On 12/15/2011 01:09 AM, Erik Schnetter wrote:
> Erik Schnetter has proposed merging lp:~schnetter/pocl/main into lp:pocl.
>
> Requested reviews: pocl maintainers (pocl)
>
> For more details, see:
> https://code.launchpad.net/~schnetter/pocl/main/+merge/85761
>
> I added support for the half datatype, protected by #ifdef cl_khr_fp16,
> analogous to cl_khr_fp64. I don't know which targets support this datatype
> (presumably all, since llvm supports them?), so I enabled this for all
> targets -- this will break things if this is wrong.

Just curious...

How does LLVM/Clang support the half by default nowadays? I've heard that for
NVIDIA GPUs, for example, the half is supported only as a storage format. That
is, you have the float in 16bit format in memory but whenever you compute
something with halfs, they are converted to single precision floats to avoid
the need for separate floating point units for halfs.

Just curious to hear what happens when you use half floats in LLVM/Clang
now -- are they converted to single precision fp whenever computation occurs?
The last time I checked, 'half' was not a datatype in the LLVM IR, so half
operations could not be selected (i.e., implemented with the target ISA) nicely.

It seems there are only two intrinsics for halfs available:
http://llvm.org/docs/LangRef.html#int_fp16

Does Clang generate those automatically for halfs in OpenCL C now? For example
if you perform a basic operation halfA + halfB, what happens?

I'm interested in proper half support because for embedded/mobile targets
the benefit goes beyond saving memory bandwidth: if half floats are enough
for your computations, you can also reduce the area of the FPU, improve the
speed, lower the energy consumption, etc. But I think half will not be
accepted as a proper datatype in LLVM before there is a real (read:
off-the-shelf) target in LLVM that supports it natively.

-- 
--Pekka

Re: [Pocl-devel] [Merge] lp:~schnetter/pocl/main into lp:pocl

From: Pekka J. <pek...@tu...> - 2011-12-15 19:17:54

On 12/15/2011 09:04 PM, Erik Schnetter wrote:
> supports the half datatype

Why do you think so?
http://llvm.org/docs/LangRef.html#t_floating

As I wrote, I think it's supported only via those two conversion
intrinsics:

http://llvm.org/docs/LangRef.html#int_fp16

My question was who implements those intrinsics (fp32 to fp16 and
fp16 to fp32), as they require some bit manipulation of the fp
fields, AFAIK (extract the mantissa, exponent, and sign and pack
them into the destination), and it's unlikely the hardware has
direct instructions for such a conversion. Do you mean that
the default lowering of those intrinsics produces the conversion
code too?
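
To make the bit manipulation concrete, here is a minimal sketch in plain C
of the fp16-to-fp32 direction. It assumes the 16-bit value holds an IEEE 754
binary16 pattern (1 sign bit, 5 exponent bits, 10 mantissa bits) and that
float is IEEE 754 binary32. The name half_bits_to_float is made up for
illustration; it is not an LLVM or pocl function, and this is not a claim
about what any LLVM backend emits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode a binary16 bit pattern into a float using
   only integer operations on the sign, exponent and mantissa fields. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16; /* sign moves to bit 31 */
    uint32_t exp  = (h >> 10) & 0x1Fu;             /* 5-bit exponent, bias 15 */
    uint32_t man  = h & 0x3FFu;                    /* 10-bit mantissa */
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent, mantissa payload is kept. */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man != 0) {
        /* Subnormal half: shift left until the implicit leading 1 appears.
           113 is the float exponent field for 2^-14. */
        uint32_t e = 113u;
        while ((man & 0x400u) == 0) {
            man <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((man & 0x3FFu) << 13);
    } else {
        bits = sign; /* signed zero */
    }

    memcpy(&f, &bits, sizeof f); /* reinterpret the 32 bits as a float */
    return f;
}
```

For example, the pattern 0x3C00 decodes to 1.0f and 0xC000 to -2.0f.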

Using native halfs would mean that one can use smaller adders,
multipliers, shifters etc. in the FPUs which means energy savings in
low power designs (less switching activity). Too bad it seems not to be
supported yet in LLVM, AFAIU.

-- 
--Pekka

Re: [Pocl-devel] [Merge] lp:~schnetter/pocl/main into lp:pocl

From: Erik S. <esc...@pe...> - 2011-12-15 19:32:09

The conversion intrinsics exist in LLVM and are implemented in some of its
backends. To my knowledge, currently only the ARM backend supports them,
presumably via a machine instruction (or maybe via a sequence of machine
instructions). Other backends will report an error when the LLVM code is
lowered to machine code -- that is (as I had to find out), libkernel.a will
build fine on all architectures, but the respective functions cannot be
used.

As you say, it should not be difficult to implement this generically for
all other platforms, either in pocl or (better) in LLVM. The conversion may
be slow, but the memory savings (particularly if the data are also stored in
a file) may make the slow conversion worthwhile for some applications.
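
A generic software fallback for the fp32-to-fp16 direction could look
roughly like the sketch below. It assumes IEEE 754 binary32 floats and
binary16 storage, rounds to nearest even, and uses the hypothetical name
float_to_half_bits; it is not code from pocl or LLVM.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: encode a float as a binary16 bit pattern,
   rounding to nearest with ties to even. */
static uint16_t float_to_half_bits(float f)
{
    uint32_t x, sign, absx, expfield, mant, half, rem, halfway;

    memcpy(&x, &f, sizeof x);          /* read the raw float bits */
    sign = (x >> 16) & 0x8000u;        /* sign moves to bit 15 */
    absx = x & 0x7FFFFFFFu;
    expfield = absx >> 23;             /* 8-bit exponent, bias 127 */
    mant = absx & 0x007FFFFFu;         /* 23-bit mantissa */

    if (absx > 0x7F800000u)            /* NaN -> a quiet NaN */
        return (uint16_t)(sign | 0x7E00u);
    if (absx >= 0x47800000u)           /* >= 65536 or infinity -> infinity */
        return (uint16_t)(sign | 0x7C00u);
    if (absx < 0x33000000u)            /* below 2^-25 -> signed zero */
        return (uint16_t)sign;

    if (expfield >= 113u) {
        /* Result is a normal half: rebias and drop 13 mantissa bits. */
        half = ((expfield - 112u) << 10) | (mant >> 13);
        rem = mant & 0x1FFFu;
        halfway = 0x1000u;
    } else {
        /* Result is a subnormal half: shift the full 24-bit significand. */
        uint32_t shift = 126u - expfield;   /* between 14 and 24 */
        mant |= 0x00800000u;                /* restore the implicit 1 */
        half = mant >> shift;
        rem = mant & ((1u << shift) - 1u);
        halfway = 1u << (shift - 1u);
    }

    /* Round to nearest even; a carry may ripple into the exponent, which
       also turns values just below 65536 into infinity as required. */
    if (rem > halfway || (rem == halfway && (half & 1u)))
        half++;

    return (uint16_t)(sign | half);
}
```

All of this is integer work on the float's bit fields, so it runs on any
target, at the cost of a handful of branches per value.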

-erik


-- 
Erik Schnetter <esc...@pe...>
http://www.cct.lsu.edu/~eschnett/
AIM: eschnett247, Skype: eschnett, Google Talk: sch...@gm...

Re: [Pocl-devel] [Merge] lp:~schnetter/pocl/main into lp:pocl

From: Carlos S. de La L. <car...@ur...> - 2011-12-16 11:59:57

Just for clarification:

There is no fp16 type in LLVM, at all, neither for computation nor
storage. It is not defined in LLVM assembly language.

What clang does is generate i16 (integer) values for halfs and convert
them from i16 to float before operating on them. The LLVM intrinsics
convert i16 <-> float; there is no fp16 type.
I would therefore expect those intrinsics to work on all the LLVM
codegen targets (int to float works, so this should also).

From the pocl kernel library's point of view, I think the current way is
correct: since halfs are not mandatory in OpenCL, check for "half" support
in the compiler and activate the extension if it is there. Support means
not only that the compiler "eats" the keyword but also that the type has
the size the standard defines. So, as it is done now.

To map halfs to hardware that really operates on halfs, LLVM-side changes
would be needed to add the half as a real type, as right now it is not
possible (without a lot of def-use chain analysis) to determine whether a
float comes from a "half/i16" or is a real float.

BR

Carlos


Re: [Pocl-devel] [Merge] lp:~schnetter/pocl/main into lp:pocl

From: Erik S. <esc...@pe...> - 2011-12-16 15:18:57

On Fri, Dec 16, 2011 at 7:03 AM, Carlos Sánchez de La Lama <
car...@ur...> wrote:

> Just for clarification:
>
> There is no fp16 type in LLVM, at all, neither for computation nor
> storage. It is not defined in LLVM assembly language.
>
> What clang does is generates i16 (integer) values for halfs, and
> converts i16 to floats before operating with them. The LLVM intrinsics
> convert i16 <-> float, there is no fp16 type.
> I would expect therefore those intrinsics to work on all the LLVM
> codegen targets (int to float works, so this should also).
>

The intrinsics do not work -- tried 3.0 and trunk. By looking at the code,
I believe that this conversion intrinsic is only defined for ARM.

The i16 contains a bit pattern representing the fp16 value; it cannot be
interpreted as an integer value. It seems to me that using i16 is purely a
hack to avoid introducing a new (and very limited) LLVM datatype, because
by using an i16 one ensures that load/store etc. work correctly.
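
A small C sketch of why the i16 is only a container: adding two fp16 bit
patterns as integers gives a different answer from decoding them and adding
the values. The helper names are hypothetical, and the decoder assumes
finite IEEE 754 binary16 inputs and IEEE float arithmetic with subnormals
enabled (no flush-to-zero).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical decoder for finite binary16 patterns. The exponent and
   mantissa are placed in the float's fields unchanged, and the bias
   difference (127 - 15 = 112) is fixed up by one multiplication. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)(h & 0x7FFFu) << 13;
    float f;

    memcpy(&f, &bits, sizeof f);
    f *= 0x1p112f;                   /* scale by 2^112 to rebias */
    return (h & 0x8000u) ? -f : f;
}

/* Wrong: treats the storage bits as if they were an integer quantity. */
static float add_bits_as_integers(uint16_t a, uint16_t b)
{
    return half_bits_to_float((uint16_t)(a + b));
}

/* Intended meaning: decode both operands, then add the values. */
static float add_decoded_values(uint16_t a, uint16_t b)
{
    return half_bits_to_float(a) + half_bits_to_float(b);
}
```

With a = b = 0x3C00 (the pattern for 1.0), the decoded sum is 2.0, while
the integer sum of the patterns is 0x7800, which decodes to 32768.0.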

-erik


Re: [Pocl-devel] [Merge] lp:~schnetter/pocl/main into lp:pocl

From: Carlos S. de La L. <car...@ur...> - 2011-12-16 15:43:09

Yep, the item is stored in fp16 format inside the i16 of course... I
thought it would work, as you can have a target-independent f16 to f32
conversion, but that requires assuming the storage format is IEEE FP.

Anyways, pocl-wide it is enough as it is now IMHO: if LLVM supports
codegen for halfs in a target then the kernel library uses it, otherwise
it does not.

Carlos


Re: [Pocl-devel] [Merge] lp:~schnetter/pocl/main into lp:pocl

From: Pekka J. <pek...@tu...> - 2011-12-20 08:41:32

For the record, LLVM now has half support in the IR too:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20111219/133712.html



-- 
--PJ

Re: [Pocl-devel] [Merge] lp:~schnetter/pocl/main into lp:pocl

From: Erik S. <esc...@pe...> - 2011-12-15 16:24:51

OpenCL supports only two operations for halfs: vload_half, converting it to
a float, and vstore_half, converting from a float. Nothing else exists
explicitly, not even vectors of halfs. Essentially the only thing one can
do with the half type is to pass a half* to these load/store routines.

There are routines such as float half_sin(float) that are only required to
have the precision offered by datatype half (allowing optimisations), but
the API is via float. There is text in the standard presumably allowing
this to be optimised to use operations that act directly on half values,
but this is not required.

I added code to detect whether clang supports half (called __fp16 in C),
and if so, these vload_half/vstore_half routines are available. half_sin
and friends are always available, forwarding to their float counterparts by
default -- I assume that target-specific optimisations can do better.
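
As a rough illustration of that layout in plain C (not the actual pocl
kernel library source): the load/store routines sit behind a feature macro,
while the reduced-precision routines are always present and simply forward
to float arithmetic. Every sketch_ name is hypothetical, and treating
cl_khr_fp16 as the macro the build defines after detecting __fp16 support is
an assumption made for this sketch.

```c
/* Assumption: the build system defines cl_khr_fp16 only when the compiler
   was found to support a 16-bit half type (__fp16 in clang's C mode). */
#ifdef cl_khr_fp16
typedef __fp16 sketch_half;

/* Rough analogue of vload_half: read a half and widen it to float. */
static float sketch_vload_half(const sketch_half *p)
{
    return (float)*p;
}

/* Rough analogue of vstore_half: narrow a float and store it as a half. */
static void sketch_vstore_half(float x, sketch_half *p)
{
    *p = (sketch_half)x;
}
#define SKETCH_HAVE_HALF 1
#else
#define SKETCH_HAVE_HALF 0
#endif

/* Always available: the half_* routines take and return float, so the
   default version can forward to ordinary float arithmetic. A target
   could replace these with cheaper, lower-precision versions. */
static float sketch_half_divide(float x, float y)
{
    return x / y;
}

static float sketch_half_recip(float x)
{
    return 1.0f / x;
}
```

On a compiler without half support the guarded part simply disappears, and
code that needs the load/store routines can test SKETCH_HAVE_HALF first.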

-erik





Re: [Pocl-devel] [Merge] lp:~schnetter/pocl/main into lp:pocl

From: Pekka J. <pek...@tu...> - 2011-12-15 16:44:55

On 12/15/2011 06:24 PM, Erik Schnetter wrote:
 > OpenCL supports only two operations for halfs: vload_half, converting it
 > to a float, and vstore_half, converting from a float. Nothing else
 > exists explicitly, not even vectors of halfs. Essentially the only thing
 > one can do with the half type is to pass a half* to these load/store
 > routines.

OK, interesting. Who then generates the float <-> half conversion code
for the LLVM intrinsics? Does LLVM generate it automatically or
do we need to provide conversion routines in pocl?

-- 
Pekka

Re: [Pocl-devel] [Merge] lp:~schnetter/pocl/main into lp:pocl

From: Erik S. <esc...@pe...> - 2011-12-15 19:05:02

Since LLVM supports the half datatype (I believe it was added about nine
months ago, especially for OpenCL), it generates these conversions itself.

I just updated (read: corrected) the autoconf rules to determine whether
half is supported. It seems that it is currently only supported on ARM.
Other platforms will have this code disabled. I will push this to my branch
soon.

To my knowledge, the "traditional" implementation of half would be to
perform all arithmetic operations in float precision, except possibly
expensive iterative operations (divide, sqrt), where fewer iterations may
be used than for float.
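
A minimal C sketch of that "traditional" scheme for a single addition:
decode both 16-bit patterns to float, add in float precision, and round the
result back to 16 bits. It assumes IEEE 754 binary16 storage and binary32
floats with subnormals enabled; the function names are hypothetical and
this is not what clang or pocl generates.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical decoder: binary16 bit pattern -> float. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t bits = (uint32_t)(h & 0x7FFFu) << 13;
    float f;

    if ((h & 0x7C00u) == 0x7C00u) {      /* infinity or NaN */
        bits |= sign | 0x7F800000u;
        memcpy(&f, &bits, sizeof f);
        return f;
    }
    memcpy(&f, &bits, sizeof f);
    f *= 0x1p112f;                       /* rebias exponent: 127 - 15 = 112 */
    return sign ? -f : f;
}

/* Hypothetical encoder: float -> binary16, round to nearest even. */
static uint16_t float_to_half(float f)
{
    uint32_t x, sign, absx, expfield, mant, half, rem, halfway;

    memcpy(&x, &f, sizeof x);
    sign = (x >> 16) & 0x8000u;
    absx = x & 0x7FFFFFFFu;
    expfield = absx >> 23;
    mant = absx & 0x007FFFFFu;

    if (absx > 0x7F800000u) return (uint16_t)(sign | 0x7E00u); /* NaN */
    if (absx >= 0x47800000u) return (uint16_t)(sign | 0x7C00u); /* overflow */
    if (absx < 0x33000000u) return (uint16_t)sign;              /* underflow */

    if (expfield >= 113u) {              /* normal half */
        half = ((expfield - 112u) << 10) | (mant >> 13);
        rem = mant & 0x1FFFu;
        halfway = 0x1000u;
    } else {                             /* subnormal half */
        uint32_t shift = 126u - expfield;
        mant |= 0x00800000u;
        half = mant >> shift;
        rem = mant & ((1u << shift) - 1u);
        halfway = 1u << (shift - 1u);
    }
    if (rem > halfway || (rem == halfway && (half & 1u)))
        half++;
    return (uint16_t)(sign | half);
}

/* halfA + halfB with the arithmetic done entirely in float. */
static uint16_t half_add_via_float(uint16_t a, uint16_t b)
{
    return float_to_half(half_to_float(a) + half_to_float(b));
}
```

Adding the patterns for 1.0 (0x3C00) and 2.0 (0x4000) yields 0x4200, the
pattern for 3.0. Adding 1.0 and 2^-11 (0x1000) yields 0x3C00 again, because
the exact sum is not representable in 16 bits and the tie rounds to even.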

-erik

