From: Erik S. <esc...@pe...> - 2011-10-24 14:03:31
2011/10/24 Carlos Sánchez de La Lama <car...@ur...>:
> Hi Erik,
>
> On Mon, 2011-10-24 at 09:20 -0400, Erik Schnetter wrote:
>> 2011/10/24 Carlos Sánchez de La Lama <car...@ur...>:
>> > Hi Erik,
>> >
>> > thanks for your contributions. Some of the questions you ask are open for discussion, as the kernel library implementation is still at a very early stage of development.
>> >
>> >> - Is this implementation / coding style approximately acceptable?
>> >
>> > It is all right. We do not have an "official" coding guideline for pocl (yet), but basically I tried to follow the GNU Coding Standards, except in the LLVM passes, where the LLVM guidelines are used.
>> >
>> > I have seen you call the C library functions to implement the functionality. This is OK for "native" environments, but in a real device scenario there is not going to be an underlying C library providing, for example, the "cos" function. One possible option is to use some other embedded library (newlib?) underneath our kernel library, though I have the feeling this might be overkill and make device binaries too big. Otherwise we would need to implement the "cos", "sin", etc. functions completely in the kernel library.
>> >
>> > Thoughts?
>>
>> To get an efficient implementation, we will need to have something
>> hardware-dependent; a cross-platform implementation can only be a
>> fallback. It was my impression that POCL currently concentrates on
>> running on the host, where a libc is always available.
>
> POCL concentrates on static parallel architectures, really. Only the
> host "driver" comes in the source code, but POCL itself started as the
> development of OpenCL support for the TCE project (http://tce.cs.tut.fi).
> We generalized the passes to make them portable and more widely useful.
>
>> For example, on Intel, one would want to use the fsin machine
>> instruction, and there are similar machine instructions (or sequences
>> thereof) for other architectures.
>>
>> For the fallback implementation, I would definitely use an existing
>> library. I don't know newlib, but it may be the way to go.
>
> Of course. Even newlib would not be target-independent: it is compiled
> for a given architecture and has its own set of processor-dependent
> implementations of math functions (I am not sure how efficient they
> are). Agreed on the need for different library implementations.
>
>> >> - Since some of the code is highly repetitive, should we use a
>> >> templating mechanism, probably built onto m4?
>> >
>> > If it is *really* needed, then we can use it. But I am not sure how much effort it saves compared with the C preprocessor. If we want to make the system as portable as possible, it is desirable to keep dependencies to a minimum.
>> >
>> > Can you provide a (simple) example of a case in the kernel library where m4 macros would help?
>>
>> It could provide a mechanism that is not per-file, but more generic.
>> For example, the implementations of sin, cos, tan, etc. are very
>> similar. In other words, OpenCL doesn't have #include (or does it when
>> the kernel library is built?), and using only #define leads to a lot
>> of duplication across source files.
>
> We have #include, #define and all the rest during the library build
> process. I know sin/cos and other math functions are quite repetitive;
> my doubt is whether the amount of "saved work" with m4 over the plain C
> preprocessor justifies the dependency on m4.
>
> For example, cos/sin/tan can share the same C file using a generic FUNC
> macro (let's say trig.inc):
>
> #define FUNC cos
> #include "trig.inc"
> #undef FUNC
>
> #define FUNC sin
> #include "trig.inc"
> #undef FUNC
>
> ...
>
> Again, if there is a clear gain in using m4, no problem, but I am unsure
> it really saves that much typing.

If there's #include, I prefer cpp as well.

-erik

--
Erik Schnetter <esc...@pe...>
http://www.cct.lsu.edu/~eschnett/
AIM: eschnett247, Skype: eschnett, Google Talk: sch...@gm...
From: Pekka J. <pek...@tu...> - 2011-10-24 14:02:13
Hi,

I started a new IRC channel called #pocl on the irc.oftc.net network for discussing pocl issues in real time. See you there!

--
Pekka
From: Carlos S. de La L. <car...@ur...> - 2011-10-24 13:48:16
Hi Erik,

On Mon, 2011-10-24 at 09:20 -0400, Erik Schnetter wrote:
> 2011/10/24 Carlos Sánchez de La Lama <car...@ur...>:
> > Hi Erik,
> >
> > thanks for your contributions. Some of the questions you ask are open for discussion, as the kernel library implementation is still at a very early stage of development.
> >
> >> - Is this implementation / coding style approximately acceptable?
> >
> > It is all right. We do not have an "official" coding guideline for pocl (yet), but basically I tried to follow the GNU Coding Standards, except in the LLVM passes, where the LLVM guidelines are used.
> >
> > I have seen you call the C library functions to implement the functionality. This is OK for "native" environments, but in a real device scenario there is not going to be an underlying C library providing, for example, the "cos" function. One possible option is to use some other embedded library (newlib?) underneath our kernel library, though I have the feeling this might be overkill and make device binaries too big. Otherwise we would need to implement the "cos", "sin", etc. functions completely in the kernel library.
> >
> > Thoughts?
>
> To get an efficient implementation, we will need to have something
> hardware-dependent; a cross-platform implementation can only be a
> fallback. It was my impression that POCL currently concentrates on
> running on the host, where a libc is always available.

POCL concentrates on static parallel architectures, really. Only the host "driver" comes in the source code, but POCL itself started as the development of OpenCL support for the TCE project (http://tce.cs.tut.fi). We generalized the passes to make them portable and more widely useful.

> For example, on Intel, one would want to use the fsin machine
> instruction, and there are similar machine instructions (or sequences
> thereof) for other architectures.
>
> For the fallback implementation, I would definitely use an existing
> library. I don't know newlib, but it may be the way to go.

Of course. Even newlib would not be target-independent: it is compiled for a given architecture and has its own set of processor-dependent implementations of math functions (I am not sure how efficient they are). Agreed on the need for different library implementations.

> >> - Since some of the code is highly repetitive, should we use a
> >> templating mechanism, probably built onto m4?
> >
> > If it is *really* needed, then we can use it. But I am not sure how much effort it saves compared with the C preprocessor. If we want to make the system as portable as possible, it is desirable to keep dependencies to a minimum.
> >
> > Can you provide a (simple) example of a case in the kernel library where m4 macros would help?
>
> It could provide a mechanism that is not per-file, but more generic.
> For example, the implementations of sin, cos, tan, etc. are very
> similar. In other words, OpenCL doesn't have #include (or does it when
> the kernel library is built?), and using only #define leads to a lot
> of duplication across source files.

We have #include, #define and all the rest during the library build process. I know sin/cos and other math functions are quite repetitive; my doubt is whether the amount of "saved work" with m4 over the plain C preprocessor justifies the dependency on m4.

For example, cos/sin/tan can share the same C file using a generic FUNC macro (let's say trig.inc):

#define FUNC cos
#include "trig.inc"
#undef FUNC

#define FUNC sin
#include "trig.inc"
#undef FUNC

...

Again, if there is a clear gain in using m4, no problem, but I am unsure it really saves that much typing.

BR,
Carlos
From: Erik S. <esc...@pe...> - 2011-10-24 13:26:40
2011/10/24 Carlos Sánchez de La Lama <car...@ur...>:
> Hi all,
>
> I have been thinking about how to implement the kernel library for
> different devices, and some related issues.
>
> Right now, the pocl flow goes like this:
>
> Compilation: .cl to .bc (bytecode)
>      |
>      V
> Linking with kernel library
>      |
>      V
> Full inlining
>      |
>      V
> Workgroup creation (replicate work-items)
>      |
>      V
> Device-dependent driver
>
> Workgroup creation needs to detect barriers; that's why it needs to be
> done after full inlining (there can be barriers in a function called by
> the kernel, not in the kernel itself).
>
> One desirable thing is for the bytecode to stay device-independent as
> long as possible, ideally until the device driver, so we do not have to
> store several binaries on the host (there might be some unavoidable
> dependencies, but I think given OpenCL's restricted C support those will
> be minor). Then there are two possibilities:
>
> 1) Make the kernel library runtime-compatible with all devices. This was
> the planned approach; it can be done by selecting the implementation for
> a device using runtime conditionals (C-level ifs) instead of preprocessor
> ones (#if/#ifdef). LLVM should then eliminate dead code when generating
> the final binary.

This does not allow hardware-dependent optimisations. For example, certain function calls / assembler instructions are only available on certain hardware, and lead to syntax errors on others.

Now, it would be nice if these were not necessary. This would require providing them via LLVM, i.e. implementing the OpenCL run-time functions (e.g. sin, cos, sqrt, their vectorised versions, etc.) in LLVM instead of in POCL. That may be a good idea overall, since it would make these functions available to a larger audience and would simplify POCL. I don't know whether the LLVM project would be open to such extensions, though; I have not checked.

-erik
From: Erik S. <esc...@pe...> - 2011-10-24 13:20:38
2011/10/24 Carlos Sánchez de La Lama <car...@ur...>:
> Hi Erik,
>
> thanks for your contributions. Some of the questions you ask are open for discussion, as the kernel library implementation is still at a very early stage of development.
>
>> - Is this implementation / coding style approximately acceptable?
>
> It is all right. We do not have an "official" coding guideline for pocl (yet), but basically I tried to follow the GNU Coding Standards, except in the LLVM passes, where the LLVM guidelines are used.
>
> I have seen you call the C library functions to implement the functionality. This is OK for "native" environments, but in a real device scenario there is not going to be an underlying C library providing, for example, the "cos" function. One possible option is to use some other embedded library (newlib?) underneath our kernel library, though I have the feeling this might be overkill and make device binaries too big. Otherwise we would need to implement the "cos", "sin", etc. functions completely in the kernel library.
>
> Thoughts?

To get an efficient implementation, we will need to have something hardware-dependent; a cross-platform implementation can only be a fallback. It was my impression that POCL currently concentrates on running on the host, where a libc is always available.

For example, on Intel, one would want to use the fsin machine instruction, and there are similar machine instructions (or sequences thereof) for other architectures.

For the fallback implementation, I would definitely use an existing library. I don't know newlib, but it may be the way to go.

>> - Since some of the code is highly repetitive, should we use a
>> templating mechanism, probably built onto m4?
>
> If it is *really* needed, then we can use it. But I am not sure how much effort it saves compared with the C preprocessor. If we want to make the system as portable as possible, it is desirable to keep dependencies to a minimum.
>
> Can you provide a (simple) example of a case in the kernel library where m4 macros would help?

It could provide a mechanism that is not per-file, but more generic. For example, the implementations of sin, cos, tan, etc. are very similar. In other words, OpenCL doesn't have #include (or does it when the kernel library is built?), and using only #define leads to a lot of duplication across source files.

>> - I added explicitly vectorized functions, e.g. for fabs or sqrt, for
>> SSE architectures; is this acceptable?
>
> It is perfectly acceptable as long as the code also works on non-SSE architectures.

Very good!

-erik

>> - Should there be test cases for the run-time functions?
>
> There is no strict rule, but of course the more test cases the better ;)
>
> BR
>
> Carlos
From: Carlos S. de La L. <car...@ur...> - 2011-10-24 10:46:00
Hi all,

I have been thinking about how to implement the kernel library for different devices, and some related issues.

Right now, the pocl flow goes like this:

Compilation: .cl to .bc (bytecode)
     |
     V
Linking with kernel library
     |
     V
Full inlining
     |
     V
Workgroup creation (replicate work-items)
     |
     V
Device-dependent driver

Workgroup creation needs to detect barriers; that's why it needs to be done after full inlining (there can be barriers in a function called by the kernel, not in the kernel itself).

One desirable thing is for the bytecode to stay device-independent as long as possible, ideally until the device driver, so we do not have to store several binaries on the host (there might be some unavoidable dependencies, but I think given OpenCL's restricted C support those will be minor). Then there are two possibilities:

1) Make the kernel library runtime-compatible with all devices. This was the planned approach; it can be done by selecting the implementation for a device using runtime conditionals (C-level ifs) instead of preprocessor ones (#if/#ifdef). LLVM should then eliminate dead code when generating the final binary.

2) Perform inlining and replication before linking. Only a minor part of the kernel library (get_xxx_id() and friends) needs to be linked before WG creation, and those functions are going to be common to all devices because they depend on the replication passes. But the big "functional" kernel runtime library could be linked later, even in device-dependent binary form instead of bytecode form, allowing the use of different kernel libraries for different devices. This would have the additional advantage of smaller bytecode and faster code generation.

Thoughts?

Carlos
From: Carlos S. de La L. <car...@ur...> - 2011-10-24 09:05:42
Hi Erik,

thanks for your contributions. Some of the questions you ask are open for discussion, as the kernel library implementation is still at a very early stage of development.

> - Is this implementation / coding style approximately acceptable?

It is all right. We do not have an "official" coding guideline for pocl (yet), but basically I tried to follow the GNU Coding Standards, except in the LLVM passes, where the LLVM guidelines are used.

I have seen you call the C library functions to implement the functionality. This is OK for "native" environments, but in a real device scenario there is not going to be an underlying C library providing, for example, the "cos" function. One possible option is to use some other embedded library (newlib?) underneath our kernel library, though I have the feeling this might be overkill and make device binaries too big. Otherwise we would need to implement the "cos", "sin", etc. functions completely in the kernel library.

Thoughts?

> - Since some of the code is highly repetitive, should we use a
> templating mechanism, probably built onto m4?

If it is *really* needed, then we can use it. But I am not sure how much effort it saves compared with the C preprocessor. If we want to make the system as portable as possible, it is desirable to keep dependencies to a minimum.

Can you provide a (simple) example of a case in the kernel library where m4 macros would help?

> - I added explicitly vectorized functions, e.g. for fabs or sqrt, for
> SSE architectures; is this acceptable?

It is perfectly acceptable as long as the code also works on non-SSE architectures.

> - Should there be test cases for the run-time functions?

There is no strict rule, but of course the more test cases the better ;)

BR
Carlos
From: Erik S. <esc...@pe...> - 2011-10-23 22:27:52
I implemented some run-time functions and pushed them to a branch <bzr+ssh://bazaar.launchpad.net/~schnetter/pocl/main>. Before I go further with implementing run-time functions, I am looking for some feedback.

- Is this implementation / coding style approximately acceptable?
- Since some of the code is highly repetitive, should we use a templating mechanism, probably built onto m4?
- I added explicitly vectorized functions, e.g. for fabs or sqrt, for SSE architectures; is this acceptable?
- Should there be test cases for the run-time functions?

-erik
From: Carlos S. de La L. <car...@ur...> - 2011-10-21 14:00:05
Hi all,

I just pushed a new branch with my work on making work groups thread-safe. The main changes are for standalone targets (TCE). The new interface is as follows:

struct _pocl_context {
  cl_uint work_dim;
  cl_uint num_groups[3];
  cl_uint group_id[3];
  cl_uint global_offset[3];
};

void _<kernel>_workgroup (void *args[], struct _pocl_context *pc);

"args" is an array of pointers to the correct kernel argument values (no changes there), and the new struct gives all the values the kernel runtime needs, at least so far. It should be easier to extend in the future if needed, so no more interface changes will be made (hopefully).

You can switch to that branch to start using the new interface. It will be merged into trunk once thread safety is complete.

BR
Carlos