Thread: [Pocl-devel] Get rid of libm dependency

Hi,

Please check:
https://blueprints.launchpad.net/pocl/+spec/no-libm-dep

Comments, thoughts?
-- 
--Pekka

I put some thoughts on the blueprint whiteboard in Launchpad.

Carlos

On 09/11/2011, at 09:54, Pekka Jääskeläinen wrote:

> Hi,
> 
> Please check:
> https://blueprints.launchpad.net/pocl/+spec/no-libm-dep
> 
> Comments, thoughts?
> -- 
> --Pekka

On 11/09/2011 01:38 PM, Carlos Sánchez de La Lama wrote:
> I put some thoughts on the blueprint whiteboard in Launchpad.

Thanks. I added some more. The Blueprint system of Launchpad doesn't
seem to be designed for discussions, so it's better to discuss it
further here.

Unless you agree with the current proposal and I can start implementing it,
of course! :)

-- 
--Pekka

OK, I'll reply inline to your "post" in the blueprint here:

> Requiring Newlib to be in device/host produces more trouble. A point in embedding the required functions inside pocl is to make pocl self-contained (aside from the LLVM/Clang dependency), that is, to make it easily portable to various platforms (hosts and devices) + the inlining benefits. I just got rid of the gcc dependency in the 'ld' branch and I'd like to get rid of the libm dependency (were it the Newlib or the native) too.

There is no need to include newlib on the host. I was already thinking about this when I proposed it. Newlib would be needed only at pocl compile time; (some of) the kernel libraries then link against it, in bytecode, producing self-contained kernel libraries with no external requirements.

> Newlib is quite big and contains the whole C library which pocl does not need. It would require porting the whole newlib to the target under question whenever one wants to use pocl on a host/device. I see that quite a bit more "overkill" than just copying the functions we need from some BSD/MIT licensed library, if one is found.

Not really. newlib works using stubs, so you need to port nothing; some of the functions will have unresolved stubs, but we won't be using those. Only the needed functions (say sin/cos/whatever) will get linked (library behaviour). Basically it is the same as "borrowing" the sources, but at bytecode level.

> Anyways, it seems the math lib of Newlib is nicely separable so we could include it in pocl to avoid the requirement of Newlib installed? The license is a bit unclear but I think it's a BSD license for the libm part. We could just copy the 'math' dir from the Newlib to the source tree and then the various kernel lib implementations can cherry pick the codes they need from there (at source code level due to the different bitcode targets).

I do not like the idea of taking code out of there, for several reasons:
1) It is kind of "wrong", even when licenses allow it. If someone used pocl, for example, I would prefer them to use pocl itself rather than take the passes out of the llvmopencl directory and use them in their own project. If you take part of a project's code base you do not support the project.
2) It requires "mimicking" some of the configure / makefile structure of newlib, configure switches, etc. Why reinvent the wheel? They provide the source files together with the build framework, so let us use it.

The reasons would of course be seen as "weak" if there was a "strong" reason against them, but the linking approach is a cleaner alternative IMO.

> A major point IMHO for the "inlineable versions by default" is the exploitation of inter-WI parallelism with vectors or long instructions, which is ruined if you have a library call in the kernel. Avoiding such calls can lead to a more parallelizable default generic lib (ability to maybe execute some parts of sin/cos, for example, for multiple WIs using parallel instructions), which should be good in the "performance portability" sense.

Of course, that is pretty clear. The point is that if we do not include newlib in pocl (which I am against), then making the default library depend on it might be a drawback, given that the builtins are also "standard". However, I do not feel strongly about it; a newlib (or any other C-lib) build-time dependency is not a big deal.
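
To illustrate the inlining point above, here is a minimal C sketch, assuming one loop iteration stands in for one work-item (WI). `sketch_sinf` and `sketch_kernel_all_wis` are hypothetical names; the polynomial is a truncated Taylor series that is only reasonable for small |x|, not Newlib's or pocl's implementation. Because the body of the math function is visible to the compiler, the per-WI loop contains no opaque call and may be vectorized across work-items, whereas a call into an external libm would typically prevent that.

```c
#include <stddef.h>

/* Hypothetical inlineable sine: 7th-order Taylor polynomial.
   Assumption: |x| is small (roughly |x| <= pi/2); no range reduction. */
static inline float sketch_sinf(float x)
{
    float x2 = x * x;
    /* Horner form of x - x^3/6 + x^5/120 - x^7/5040 */
    return x * (1.0f + x2 * (-1.0f / 6.0f
               + x2 * (1.0f / 120.0f
               + x2 * (-1.0f / 5040.0f))));
}

/* One loop iteration stands in for one work-item. With the math
   function inlined, the loop body is straight-line arithmetic that a
   compiler is free to vectorize across work-items. */
void sketch_kernel_all_wis(const float *in, float *out, size_t n_wi)
{
    for (size_t wi = 0; wi < n_wi; ++wi)
        out[wi] = sketch_sinf(in[wi]);
}
```

Called with `n_wi = 4` and small inputs such as 0.0, 0.5, -0.5 and 1.0, each output approximates the sine of the corresponding input.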

> So. I propose:
> 1. Copy the required math implementations from Newlib
> 2. Use them in the generic implementation and assume the device-optimized libs use whatever is better for them

I would say:
1. Compile newlib to bytecode (I guess CC=clang CFLAGS=-ccc-target-triplet=xxx is probably enough)
2. Make either the default or the per-device libs link against that, and perform a library linking step so that the parts of the C library used in the kernel library get linked in, creating a self-contained kernel runtime library.

Carlos

On 11/09/2011 02:43 PM, Carlos Sánchez de La Lama wrote:
> The reasons would of course be seen as "weak" if there was a "strong"
> reason against them, but the linking approach is a cleaner alternative
> IMO.

Fine. One drawback of this is that the whole Newlib bitcode lib then
needs to be built for all the targets that need it, even though a
particular kernel lib might require, say, only two math functions from it.

I'll check whether the math library can be configured separately
in Newlib, which would reduce the harm from this. At least it seems
to have its own configure script.

> Of course, that is pretty clear. The point is that if we do not include
> newlib in pocl (which I am against), then making the default library
> depend on it might be a drawback, given that the builtins are also
> "standard". However, I do not feel strongly about it; a newlib (or any
> other C-lib) build-time dependency is not a big deal.

Let's think of the current targets in the pocl tree.

x86_64 (assuming a multicore) without WI replication is likely to
benefit from the host's CPU-optimized math lib because of its smaller
instruction cache footprint (and I think CALL and LOOP are quite
well optimized in this regard in x86_64 microarchs). Therefore, it
should generally use the current builtin approach and lower to
syslib calls rather than use the inlined Newlib funcs.

ARM could use NEON (or other instruction set extension/co-processor)
optimized native math libs, if available. However, the ABI of the
lib must then match whatever clang generates. That is, most likely
the very target-specific switches (those that use the same ISEs as
the device's libs) must be used. I'm unsure whether this is
a problem or not.

TCE is fully customizable. It could easily have hardware sin/cos,
for example. Thus, it should use the intrinsics/builtins which are
lowered to the best possible instructions using the ADF info. And
as a fallback it should use the Newlib libm included in the TCE
tree. On the other hand, the kernel libs should be built against
the ADF info so it can inline as much as possible to exploit the
static ILP.

> I would say:
> 1. Compile newlib to bytecode (I guess CC=clang
> CFLAGS=-ccc-target-triplet=xxx is probably enough)

We do compile Newlib to a bitcode lib in TCE, I can check from
its build files what is required.

> 2. Make either the default or the per-device libs link against that,
> and perform a library linking step so that the parts of the C library
> used in the kernel library get linked in, creating a self-contained
> kernel runtime library.

I think 2 boils down to the question of whether the inter-WI parallelism
will be the main source of DLP in pocl or not. As we have discussed, it
depends on e.g. icache configuration and possibly on the compiled kernel
whether the replication is beneficial or only intra-WI DLP should be used.

For now, as there is no inter-WI vectorization support in pocl yet, I
suppose the libcall-based implementation should be the default. This
suggests that it's also too early to trouble oneself with integrating
Newlib (or another math lib) into the build system.

-- 
Pekka

On Wed, Nov 9, 2011 at 7:43 AM, Carlos Sánchez de La Lama
<car...@ur...> wrote:
>> Anyways, it seems the math lib of Newlib is nicely separable so we could include it in pocl to avoid the requirement of Newlib installed? The license is a bit unclear but I think it's a BSD license for the libm part. We could just copy the 'math' dir from the Newlib to the source tree and then the various kernel lib implementations can cherry pick the codes they need from there (at source code level due to the different bitcode targets).

The math lib is not the only thing that could be useful. For example,
printf is a very useful OpenCL extension that should be supported. The
underlying I/O stream representation probably needs to be implemented
from scratch, but the formatting code should work fine.

-erik

-- 
Erik Schnetter <esc...@pe...>
http://www.cct.lsu.edu/~eschnett/
AIM: eschnett247, Skype: eschnett, Google Talk: sch...@gm...

On 11/09/2011 03:40 PM, Erik Schnetter wrote:
> The math lib is not the only thing that could be useful. For example,
> printf is a very useful OpenCL extension that should be supported. The
> underlying I/O stream representation probably needs to be implemented from
> scratch, but the formatting code should work fine.

I'm not sure about printf().

Maybe that should use the stdio.h and -lc of the device, because:

1) The actual stdout stream destination is fully platform (OS+device)
dependent.
2) The "inlining benefits" do not apply to it as it's probably
only used for debug printouts. Inter-WI DLP does not matter here.
3) It's an optional extension. In case the target does not support it,
the target just doesn't advertise it as a vendor extension.
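
As an illustration of point 3, a host program can decide whether printf is usable by looking for the extension name in the device's extension string, which OpenCL reports as a space-separated list. The sketch below only shows the token check on such a string. `sketch_has_extension` is a hypothetical helper, and `cl_amd_printf` in the usage note is just one example of a vendor extension name; how pocl would name or report such an extension is not stated in this thread.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` appears as a whole space-separated token in
   `ext_list`. Assumption: `name` itself contains no spaces. */
bool sketch_has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    if (len == 0)
        return false;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* Accept the match only if it is delimited on both sides, so
           that a longer name sharing the same prefix does not count. */
        bool starts = (p == ext_list) || (p[-1] == ' ');
        bool ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends)
            return true;
        p++;
    }
    return false;
}
```

In a real host program the string would come from clGetDeviceInfo() with CL_DEVICE_EXTENSIONS. For example, `sketch_has_extension("cl_khr_fp64 cl_amd_printf", "cl_amd_printf")` is true, while the same check against `"cl_khr_fp64"` alone is false, in which case the application should not use printf in its kernels.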

-- 
Pekka