pocl-devel Mailing List for pocl (Page 56)

pocl-devel — Portable OpenCL development discussion


Re: [Pocl-devel] Some run-time functions

From: Erik S. <esc...@pe...> - 2011-10-24 14:03:31

2011/10/24 Carlos Sánchez de La Lama <car...@ur...>:
> Hi Erik,
>
> On Mon, 2011-10-24 at 09:20 -0400, Erik Schnetter wrote:
>> 2011/10/24 Carlos Sánchez de La Lama <car...@ur...>:
>> > Hi Erik,
>> >
>> > thanks for your contributions. Some of the questions you ask are open for discussion, as the kernel library implementation is still in a really early stage of development.
>> >
>> >> - Is this implementation / coding style approximately acceptable?
>> >
>> > It is all right. We do not have an "official" coding guideline for pocl (yet) but basically I tried to follow GNU Coding Standards, except in the LLVM passes where the LLVM guidelines are used.
>> >
>> > I have seen that you call C library functions to implement the functionality. This is OK for "native" environments, but in a real device scenario there is not going to be an underlying C library providing, for example, a "cos" function. One option is to use some other embedded library (newlib?) underneath our kernel library, though I have the feeling this might be overkill and make the device binaries too big. Otherwise we would need to implement "cos", "sin" and the rest completely in the kernel library.
>> >
>> > Thoughts?
>>
>> To get an efficient implementation, we will need to have something
>> hardware-dependent; a cross-platform implementation can only be a
>> fallback. It was my impression that POCL currently concentrates on
>> running on the host, where a libc is always available.
>
> POCL concentrates on static parallel architectures, really. Only the
> host "driver" is included in the source code, but POCL itself started
> as the development of OpenCL support for the TCE project
> (http://tce.cs.tut.fi). We generalized the passes to make it portable
> and more widely useful.
>
>> For example, on Intel, one would want to use the fsin machine
>> instruction, and there are similar machine instructions (or sequences
>> thereof) for other architectures.
>>
>> For the fallback implementation, I would definitely use an existing
>> library. I don't know newlib, but it may be the way to go.
>
> Of course. Even newlib would not be target independent: it is compiled
> for a given architecture and has its own set of processor-dependent
> implementations of math functions (how efficient they are, I am not
> sure). Agreed on the need for different library implementations.
>
>> >> - Since some of the code is highly repetitive, should we use a
>> >> templating mechanism, probably built onto m4?
>> >
>> > If it is *really* needed, then we can use it. But I am not sure how much effort it saves compared with the C preprocessor. If we want to make the system as portable as possible, it is desirable to keep dependencies to a minimum.
>> >
>> > Can you provide a (simple) example of a case in the kernel library where m4 macros would help?
>>
>> It could provide a mechanism that is not per-file, but more generic.
>> For example, the implementations of sin, cos, tan, etc. are very
>> similar. In other words, OpenCL doesn't have #include (or does it when
>> the kernel library is built?), and using only #define leads to a lot
>> of duplication across source files.
>
> We have #include, #define and all the rest available while the library
> is being built. I know sin/cos and the other math functions are quite
> repetitive; my doubt is whether the work saved by m4 over the plain C
> preprocessor justifies the dependency on m4.
>
> For example, cos/sin/tan can share the same C file, written against a
> generic FUNC macro (let's say trig.inc), and:
>
> #define FUNC cos
> #include "trig.inc"
> #undef FUNC
>
> #define FUNC sin
> #include "trig.inc"
> #undef FUNC
>
> ...
>
> Again, if there is a clear gain from using m4, no problem, but I am not
> sure it really saves that much typing.

If there's #include, I prefer cpp as well.

-erik

-- 
Erik Schnetter <esc...@pe...>
http://www.cct.lsu.edu/~eschnett/
AIM: eschnett247, Skype: eschnett, Google Talk: sch...@gm...

[Pocl-devel] #pocl created

From: Pekka J. <pek...@tu...> - 2011-10-24 14:02:13

Hi,

I started a new IRC channel in the irc.oftc.net network called #pocl for
discussing pocl issues in real time.

See you there!
-- 
Pekka

Re: [Pocl-devel] Some run-time functions

From: Carlos S. de La L. <car...@ur...> - 2011-10-24 13:48:16

Hi Erik,

On Mon, 2011-10-24 at 09:20 -0400, Erik Schnetter wrote:
> 2011/10/24 Carlos Sánchez de La Lama <car...@ur...>:
> > Hi Erik,
> >
> > thanks for your contributions. Some of the questions you ask are open for discussion, as the kernel library implementation is still in a really early stage of development.
> >
> >> - Is this implementation / coding style approximately acceptable?
> >
> > It is all right. We do not have an "official" coding guideline for pocl (yet) but basically I tried to follow GNU Coding Standards, except in the LLVM passes where the LLVM guidelines are used.
> >
> > I have seen that you call C library functions to implement the functionality. This is OK for "native" environments, but in a real device scenario there is not going to be an underlying C library providing, for example, a "cos" function. One option is to use some other embedded library (newlib?) underneath our kernel library, though I have the feeling this might be overkill and make the device binaries too big. Otherwise we would need to implement "cos", "sin" and the rest completely in the kernel library.
> >
> > Thoughts?
> 
> To get an efficient implementation, we will need to have something
> hardware-dependent; a cross-platform implementation can only be a
> fallback. It was my impression that POCL currently concentrates on
> running on the host, where a libc is always available.

POCL concentrates on static parallel architectures, really. Only the
host "driver" is included in the source code, but POCL itself started
as the development of OpenCL support for the TCE project
(http://tce.cs.tut.fi). We generalized the passes to make it portable
and more widely useful.

> For example, on Intel, one would want to use the fsin machine
> instruction, and there are similar machine instructions (or sequences
> thereof) for other architectures.
> 
> For the fallback implementation, I would definitely use an existing
> library. I don't know newlib, but it may be the way to go.

Of course. Even newlib would not be target independent: it is compiled
for a given architecture and has its own set of processor-dependent
implementations of math functions (how efficient they are, I am not
sure). Agreed on the need for different library implementations.

> >> - Since some of the code is highly repetitive, should we use a
> >> templating mechanism, probably built onto m4?
> >
> > If it is *really* needed, then we can use it. But I am not sure how much effort it saves compared with the C preprocessor. If we want to make the system as portable as possible, it is desirable to keep dependencies to a minimum.
> >
> > Can you provide a (simple) example of a case in the kernel library where m4 macros would help?
> 
> It could provide a mechanism that is not per-file, but more generic.
> For example, the implementations of sin, cos, tan, etc. are very
> similar. In other words, OpenCL doesn't have #include (or does it when
> the kernel library is built?), and using only #define leads to a lot
> of duplication across source files.

We have #include, #define and all the rest available while the library
is being built. I know sin/cos and the other math functions are quite
repetitive; my doubt is whether the work saved by m4 over the plain C
preprocessor justifies the dependency on m4.

For example, cos/sin/tan can share the same C file, written against a
generic FUNC macro (let's say trig.inc), and:

#define FUNC cos
#include "trig.inc"
#undef FUNC

#define FUNC sin
#include "trig.inc"
#undef FUNC

...

Again, if there is a clear gain from using m4, no problem, but I am not
sure it really saves that much typing.

BR,

Carlos

Re: [Pocl-devel] Kernel library portability

From: Erik S. <esc...@pe...> - 2011-10-24 13:26:40

2011/10/24 Carlos Sánchez de La Lama <car...@ur...>:
> Hi all,
>
> I have been thinking about how to implement the kernel library for
> different devices, and some related issues.
>
> Right now, pocl flow goes like this:
>
>   Compilation: .cl to .bc (bytecode)
>                   |
>                   V
>      Linking with kernel library
>                   |
>                   V
>           Fully inlining
>                   |
>                   V
> Workgroup creation (replicate workitems)
>                   |
>                   V
>        Device-dependent driver
>
> Workgroup creation needs to detect barriers, which is why it has to be
> done after full inlining (there can be barriers in a function called by
> the kernel, not only in the kernel itself).
>
> It is desirable for the bytecode to stay device independent for as
> long as possible, ideally until the device driver, so that we do not
> have to store several binaries on the host (there might be some
> unavoidable dependencies, but I think that, given OpenCL's restricted
> C support, those will be minor). Then there are two possibilities:
>
> 1) Make the kernel library runtime compatible with all devices. This
> was the planned approach. It can be done by selecting the
> implementation for a device using runtime conditionals (C-level ifs)
> instead of preprocessor ones (#if/#ifdef). LLVM should then eliminate
> the dead code when generating the final binary.

This does not allow hardware-dependent optimisations. For example,
certain function calls / assembler instructions are only available on
certain hardware, and lead to syntax errors on others.

Now, it would be nice if these were not necessary. This would require
providing them via LLVM, i.e. implementing the OpenCL run time (e.g.
sin, cos, sqrt, their vectorised versions, etc.) in LLVM instead of in
POCL.

That may be a good idea overall, since this would make these functions
available to a larger audience, and would simplify POCL. I don't know
whether the LLVM project would be open to such extensions, though -- I
have not checked.

-erik

-- 
Erik Schnetter <esc...@pe...>
http://www.cct.lsu.edu/~eschnett/
AIM: eschnett247, Skype: eschnett, Google Talk: sch...@gm...

Re: [Pocl-devel] Some run-time functions

From: Erik S. <esc...@pe...> - 2011-10-24 13:20:38

2011/10/24 Carlos Sánchez de La Lama <car...@ur...>:
> Hi Erik,
>
> thanks for your contributions. Some of the questions you ask are open for discussion, as the kernel library implementation is still in a really early stage of development.
>
>> - Is this implementation / coding style approximately acceptable?
>
> It is all right. We do not have an "official" coding guideline for pocl (yet) but basically I tried to follow GNU Coding Standards, except in the LLVM passes where the LLVM guidelines are used.
>
> I have seen that you call C library functions to implement the functionality. This is OK for "native" environments, but in a real device scenario there is not going to be an underlying C library providing, for example, a "cos" function. One option is to use some other embedded library (newlib?) underneath our kernel library, though I have the feeling this might be overkill and make the device binaries too big. Otherwise we would need to implement "cos", "sin" and the rest completely in the kernel library.
>
> Thoughts?

To get an efficient implementation, we will need to have something
hardware-dependent; a cross-platform implementation can only be a
fallback. It was my impression that POCL currently concentrates on
running on the host, where a libc is always available.

For example, on Intel, one would want to use the fsin machine
instruction, and there are similar machine instructions (or sequences
thereof) for other architectures.

For the fallback implementation, I would definitely use an existing
library. I don't know newlib, but it may be the way to go.

>> - Since some of the code is highly repetitive, should we use a
>> templating mechanism, probably built onto m4?
>
> If it is *really* needed, then we can use it. But I am not sure how much effort it saves compared with the C preprocessor. If we want to make the system as portable as possible, it is desirable to keep dependencies to a minimum.
>
> Can you provide a (simple) example of a case in the kernel library where m4 macros would help?

It could provide a mechanism that is not per-file, but more generic.
For example, the implementations of sin, cos, tan, etc. are very
similar. In other words, OpenCL doesn't have #include (or does it when
the kernel library is built?), and using only #define leads to a lot
of duplication across source files.

>> - I added explicitly vectorized functions e.g. for fabs or sqrt for
>> SSE architectures; is this acceptable?
>
> It is perfectly acceptable as long as the code works also in non-SSE architectures.

Very good!

-erik

>> - Should there be test cases for the run-time functions?
>
> There is no strict rule, but of course the more test cases the better ;)
>
> BR
>
> Carlos



-- 
Erik Schnetter <esc...@pe...>
http://www.cct.lsu.edu/~eschnett/
AIM: eschnett247, Skype: eschnett, Google Talk: sch...@gm...

[Pocl-devel] Kernel library portability

From: Carlos S. de La L. <car...@ur...> - 2011-10-24 10:46:00

Hi all,

I have been thinking about how to implement the kernel library for
different devices, and some related issues.

Right now, pocl flow goes like this:

   Compilation: .cl to .bc (bytecode)
                   |
                   V
      Linking with kernel library
                   |
                   V
           Fully inlining
                   |
                   V
Workgroup creation (replicate workitems)
                   |
                   V
        Device-dependent driver

Workgroup creation needs to detect barriers, which is why it has to be
done after full inlining (there can be barriers in a function called by
the kernel, not only in the kernel itself).
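
To make the role of barriers concrete, here is a minimal sketch in plain C of what a serialised work group could look like. It is an illustration under assumptions, not the output of pocl's passes: the kernel, the LOCAL_SIZE constant and the loop-based form of "replication" are all made up for the example.

```c
#define LOCAL_SIZE 4   /* hypothetical work-group size */

/* Serialised form of a kernel equivalent to:
     tmp[lid] = in[lid];
     barrier (CLK_LOCAL_MEM_FENCE);
     out[lid] = tmp[LOCAL_SIZE - 1 - lid];
   The body is split at the barrier, and each region is executed for
   every work item before the next region starts. */
static void reverse_workgroup (const int *in, int *out)
{
  int tmp[LOCAL_SIZE];   /* stands in for __local memory */

  /* Region before the barrier, repeated for every work item. */
  for (int lid = 0; lid < LOCAL_SIZE; ++lid)
    tmp[lid] = in[lid];

  /* The barrier becomes the boundary between the two loops. */

  /* Region after the barrier, repeated for every work item. */
  for (int lid = 0; lid < LOCAL_SIZE; ++lid)
    out[lid] = tmp[LOCAL_SIZE - 1 - lid];
}
```

If the barrier were hidden inside a helper function that had not been inlined yet, the split point would not be visible in the kernel body, and running each work item from start to finish would read tmp entries that have not been written.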

It is desirable for the bytecode to stay device independent for as
long as possible, ideally until the device driver, so that we do not
have to store several binaries on the host (there might be some
unavoidable dependencies, but I think that, given OpenCL's restricted
C support, those will be minor). Then there are two possibilities:

1) Make the kernel library runtime compatible with all devices. This
was the planned approach. It can be done by selecting the
implementation for a device using runtime conditionals (C-level ifs)
instead of preprocessor ones (#if/#ifdef). LLVM should then eliminate
the dead code when generating the final binary.

2) Perform inlining and replication before linking. Only a minor part of
the kernel library (get_xxx_id() and friends) needs to be linked before
WG creation, and that part is going to be common to all devices because
it depends on the replication passes. But the big "functional" kernel
runtime library could be linked later, even in device-dependent binary
form instead of bytecode form, allowing the use of different kernel
libraries for different devices. This would have the additional
advantage of smaller bytecode and faster code generation.

Thoughts?

Carlos
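
Option 1 can be sketched as follows. The enum, the target_device constant and the two count_bits variants are hypothetical names chosen for the example, not pocl code. The point is only that the selection is an ordinary C "if" on a value the compiler can see is constant, so the branch that is not taken can be dropped as dead code once the device is fixed.

```c
/* Hypothetical device identifiers; not part of pocl. */
enum device_kind { DEVICE_HOST, DEVICE_TCE };

/* Assumed to be a known constant by the time final code is generated. */
static const enum device_kind target_device = DEVICE_HOST;

/* Variant 1: clears the lowest set bit on each iteration. */
static unsigned count_bits_clear_lowest (unsigned x)
{
  unsigned n = 0;
  while (x)
    {
      x &= x - 1;
      ++n;
    }
  return n;
}

/* Variant 2: inspects one bit per iteration. */
static unsigned count_bits_shift (unsigned x)
{
  unsigned n = 0;
  for (; x; x >>= 1)
    n += x & 1u;
  return n;
}

/* Selection with a C-level "if" instead of #if/#ifdef. */
unsigned count_bits (unsigned x)
{
  if (target_device == DEVICE_HOST)
    return count_bits_clear_lowest (x);
  else
    return count_bits_shift (x);
}
```

Both branches still have to compile for every device, so this form cannot express instructions or calls that exist on only one target. That is the limitation raised in Erik's reply in this thread.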

Re: [Pocl-devel] Some run-time functions

From: Carlos S. de La L. <car...@ur...> - 2011-10-24 09:05:42

Hi Erik,

thanks for your contributions. Some of the questions you ask are open for discussion, as the kernel library implementation is still in a really early stage of development.

> - Is this implementation / coding style approximately acceptable?

It is all right. We do not have an "official" coding guideline for pocl (yet) but basically I tried to follow GNU Coding Standards, except in the LLVM passes where the LLVM guidelines are used.

I have seen that you call C library functions to implement the functionality. This is OK for "native" environments, but in a real device scenario there is not going to be an underlying C library providing, for example, a "cos" function. One option is to use some other embedded library (newlib?) underneath our kernel library, though I have the feeling this might be overkill and make the device binaries too big. Otherwise we would need to implement "cos", "sin" and the rest completely in the kernel library.

Thoughts?

> - Since some of the code is highly repetitive, should we use a
> templating mechanism, probably built onto m4?

If it is *really* needed, then we can use it. But I am not sure how much effort it saves compared with the C preprocessor. If we want to make the system as portable as possible, it is desirable to keep dependencies to a minimum.

Can you provide a (simple) example of a case in the kernel library where m4 macros would help?

> - I added explicitly vectorized functions e.g. for fabs or sqrt for
> SSE architectures; is this acceptable?

It is perfectly acceptable as long as the code works also in non-SSE architectures.

> - Should there be test cases for the run-time functions?

There is no strict rule, but of course the more test cases the better ;)

BR

Carlos
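
The requirement that vectorized code also works on non-SSE architectures could be met along these lines. This is a sketch under assumptions, not the code on Erik's branch: the fabs4 helper and its array-based interface are hypothetical, and the SSE path uses the standard xmmintrin.h intrinsics.

```c
#include <stdint.h>
#include <string.h>
#ifdef __SSE__
#include <xmmintrin.h>
#endif

/* Hypothetical helper: absolute value of four packed floats. */
static void fabs4 (const float in[4], float out[4])
{
#ifdef __SSE__
  /* -0.0f has only the sign bit set; ANDNOT clears it in every lane. */
  const __m128 sign_mask = _mm_set1_ps (-0.0f);
  _mm_storeu_ps (out, _mm_andnot_ps (sign_mask, _mm_loadu_ps (in)));
#else
  /* Portable fallback: clear the sign bit lane by lane. */
  for (int i = 0; i < 4; ++i)
    {
      uint32_t bits;
      memcpy (&bits, &in[i], sizeof bits);
      bits &= 0x7fffffffu;
      memcpy (&out[i], &bits, sizeof bits);
    }
#endif
}
```

The same source compiles on either kind of target, and both paths give identical results because each one only clears the sign bit.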

[Pocl-devel] Some run-time functions

From: Erik S. <esc...@pe...> - 2011-10-23 22:27:52

I implemented some run-time functions and pushed them to a branch
<bzr+ssh://bazaar.launchpad.net/~schnetter/pocl/main>. Before I go
further with implementing run-time functions, I am looking for some
feedback.

- Is this implementation / coding style approximately acceptable?
- Since some of the code is highly repetitive, should we use a
templating mechanism, probably built onto m4?
- I added explicitly vectorized functions e.g. for fabs or sqrt for
SSE architectures; is this acceptable?
- Should there be test cases for the run-time functions?

-erik

-- 
Erik Schnetter <esc...@pe...>
http://www.cct.lsu.edu/~eschnett/
AIM: eschnett247, Skype: eschnett, Google Talk: sch...@gm...

[Pocl-devel] threadsafe branch

From: Carlos S. de La L. <car...@ur...> - 2011-10-21 14:00:05

Hi all,

I just pushed a new branch with my work on making work groups
thread-safe.

The main changes are for standalone targets (TCE). The new interface is
as follows:

struct _pocl_context {
        cl_uint work_dim;
        cl_uint num_groups[3];
        cl_uint group_id[3];
        cl_uint global_offset[3];
};

void _<kernel>_workgroup (void *args[], struct _pocl_context *pc);

"args" is an array of pointers to the kernel argument values (no
changes there), and the new struct carries all the values the kernel
runtime needs, at least so far. It should be easy to extend in the
future if needed, so hopefully no more interface changes will be
required.

You can switch to that branch to start using the new interface. It will
be merged into trunk once the thread-safety work is completed.

BR

Carlos
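
A caller of this interface might look like the sketch below. The struct layout and the workgroup signature come from the message; everything else is an assumption for illustration: cl_uint is typedef'd locally instead of coming from the OpenCL headers, and run_all_groups and demo_workgroup are hypothetical names, not pocl functions.

```c
#include <stdint.h>

typedef uint32_t cl_uint;   /* assumption: stands in for the OpenCL type */

struct _pocl_context {
  cl_uint work_dim;
  cl_uint num_groups[3];
  cl_uint group_id[3];
  cl_uint global_offset[3];
};

/* Signature from the message, with <kernel> abstracted as a pointer. */
typedef void (*workgroup_fn) (void *args[], struct _pocl_context *pc);

/* Hypothetical driver loop: runs every work group of a range in turn. */
static void run_all_groups (workgroup_fn wg, void *args[],
                            const cl_uint num_groups[3], cl_uint work_dim)
{
  struct _pocl_context pc = { .work_dim = work_dim };

  for (int i = 0; i < 3; ++i)
    {
      pc.num_groups[i] = num_groups[i];
      pc.global_offset[i] = 0;
    }

  for (cl_uint z = 0; z < num_groups[2]; ++z)
    for (cl_uint y = 0; y < num_groups[1]; ++y)
      for (cl_uint x = 0; x < num_groups[0]; ++x)
        {
          pc.group_id[0] = x;
          pc.group_id[1] = y;
          pc.group_id[2] = z;
          wg (args, &pc);
        }
}

/* Hypothetical stand-in for a generated workgroup function:
   counts how many times it has been launched. */
static void demo_workgroup (void *args[], struct _pocl_context *pc)
{
  unsigned *counter = args[0];
  (void) pc;
  ++*counter;
}
```

Since every value the workgroup function needs arrives through its arguments, a threaded driver could presumably give each thread its own struct _pocl_context and call the same function concurrently.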

5 messages have been excluded from this view by a project administrator.
