From: Erik S. <esc...@pe...> - 2011-10-23 22:27:52
I implemented some run-time functions and pushed them to a branch <bzr+ssh://bazaar.launchpad.net/~schnetter/pocl/main>. Before I go further with implementing run-time functions, I am looking for some feedback.

- Is this implementation / coding style approximately acceptable?
- Since some of the code is highly repetitive, should we use a templating mechanism, probably built on m4?
- I added explicitly vectorized functions, e.g. for fabs or sqrt, for SSE architectures; is this acceptable?
- Should there be test cases for the run-time functions?

-erik

--
Erik Schnetter <esc...@pe...> http://www.cct.lsu.edu/~eschnett/
AIM: eschnett247, Skype: eschnett, Google Talk: sch...@gm...
From: Carlos S. de La L. <car...@ur...> - 2011-10-24 09:05:42
Hi Erik,

thanks for your contributions. Some of the questions you ask are open for discussion, as the kernel library implementation is still at a very early stage of development.

> - Is this implementation / coding style approximately acceptable?

It is all right. We do not have an "official" coding guideline for pocl (yet), but I have basically tried to follow the GNU Coding Standards, except in the LLVM passes, where the LLVM guidelines are used.

I have seen that you call the C library functions to implement the functionality. This is OK for "native" environments, but in a real device scenario there is not going to be an underlying C library providing, for example, a "cos" function. One option is to use some other embedded library (newlib?) underneath our kernel library, though I have the feeling this might be overkill and make the device binaries too big. Otherwise we would need to implement "cos", "sin", and so on completely in the kernel library.

Thoughts?

> - Since some of the code is highly repetitive, should we use a
> templating mechanism, probably built on m4?

If it is *really* needed, then we can use it. But I am not sure how much effort it saves compared with the C preprocessor. If we want to make the system as portable as possible, it is desirable to keep dependencies to a minimum.

Can you provide a (simple) example of a case in the kernel library where m4 macros would help?

> - I added explicitly vectorized functions, e.g. for fabs or sqrt, for
> SSE architectures; is this acceptable?

It is perfectly acceptable as long as the code also works on non-SSE architectures.

> - Should there be test cases for the run-time functions?

There is no strict rule, but of course the more test cases the better ;)

BR

Carlos
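The condition stated above (explicitly vectorized code is fine as long as it still works on non-SSE architectures) could be met with a compile-time guard and a portable fallback. The following is only a minimal sketch of that idea, not code from the pocl branch: the type `my_float4` and the function `my_fabs4` are hypothetical names chosen for illustration, and the guard relies on the `__SSE__` macro that GCC and Clang define when SSE is enabled.

```c
#include <math.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical 4-wide float type, for illustration only;
   this is not pocl's actual vector type. */
typedef struct { float s[4]; } my_float4;

/* Element-wise absolute value of four floats. */
my_float4 my_fabs4(my_float4 a)
{
  my_float4 r;
#if defined(__SSE__)
  /* SSE path: -0.0f has only the sign bit set, so
     (~sign_mask) & v clears the sign bit in all four lanes. */
  __m128 v = _mm_loadu_ps(a.s);
  __m128 sign_mask = _mm_set1_ps(-0.0f);
  _mm_storeu_ps(r.s, _mm_andnot_ps(sign_mask, v));
#else
  /* Portable fallback: one scalar fabsf per lane. */
  for (int i = 0; i < 4; ++i)
    r.s[i] = fabsf(a.s[i]);
#endif
  return r;
}
```

Both paths return the same values, so callers do not need to know which one was compiled in. The same guard structure would presumably apply to sqrt, with `_mm_sqrt_ps` in the SSE branch and `sqrtf` in the fallback.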
From: Erik S. <esc...@pe...> - 2011-10-24 13:20:38
2011/10/24 Carlos Sánchez de La Lama <car...@ur...>:
> Hi Erik,
>
> thanks for your contributions. Some of the questions you ask are open for discussion, as the kernel library implementation is still at a very early stage of development.
>
>> - Is this implementation / coding style approximately acceptable?
>
> It is all right. We do not have an "official" coding guideline for pocl (yet), but I have basically tried to follow the GNU Coding Standards, except in the LLVM passes, where the LLVM guidelines are used.
>
> I have seen that you call the C library functions to implement the functionality. This is OK for "native" environments, but in a real device scenario there is not going to be an underlying C library providing, for example, a "cos" function. One option is to use some other embedded library (newlib?) underneath our kernel library, though I have the feeling this might be overkill and make the device binaries too big. Otherwise we would need to implement "cos", "sin", and so on completely in the kernel library.
>
> Thoughts?

To get an efficient implementation, we will need to have something hardware-dependent; a cross-platform implementation can only be a fallback. It was my impression that POCL currently concentrates on running on the host, where a libc is always available.

For example, on Intel, one would want to use the fsin machine instruction, and there are similar machine instructions (or sequences thereof) for other architectures.

For the fallback implementation, I would definitely use an existing library. I don't know newlib, but it may be the way to go.

>> - Since some of the code is highly repetitive, should we use a
>> templating mechanism, probably built on m4?
>
> If it is *really* needed, then we can use it. But I am not sure how much effort it saves compared with the C preprocessor. If we want to make the system as portable as possible, it is desirable to keep dependencies to a minimum.
>
> Can you provide a (simple) example of a case in the kernel library where m4 macros would help?

It could provide a mechanism that is not per-file, but more generic. For example, the implementations of sin, cos, tan, etc. are very similar. In other words, OpenCL doesn't have #include (or does it when the kernel library is built?), and using only #define leads to a lot of duplication across source files.

>> - I added explicitly vectorized functions, e.g. for fabs or sqrt, for
>> SSE architectures; is this acceptable?
>
> It is perfectly acceptable as long as the code also works on non-SSE architectures.

Very good!

-erik

>> - Should there be test cases for the run-time functions?
>
> There is no strict rule, but of course the more test cases the better ;)
>
> BR
>
> Carlos

--
Erik Schnetter <esc...@pe...> http://www.cct.lsu.edu/~eschnett/
AIM: eschnett247, Skype: eschnett, Google Talk: sch...@gm...
From: Carlos S. de La L. <car...@ur...> - 2011-10-24 13:48:16
Hi Erik,

On Mon, 2011-10-24 at 09:20 -0400, Erik Schnetter wrote:
> 2011/10/24 Carlos Sánchez de La Lama <car...@ur...>:
> > Hi Erik,
> >
> > thanks for your contributions. Some of the questions you ask are open for discussion, as the kernel library implementation is still at a very early stage of development.
> >
> >> - Is this implementation / coding style approximately acceptable?
> >
> > It is all right. We do not have an "official" coding guideline for pocl (yet), but I have basically tried to follow the GNU Coding Standards, except in the LLVM passes, where the LLVM guidelines are used.
> >
> > I have seen that you call the C library functions to implement the functionality. This is OK for "native" environments, but in a real device scenario there is not going to be an underlying C library providing, for example, a "cos" function. One option is to use some other embedded library (newlib?) underneath our kernel library, though I have the feeling this might be overkill and make the device binaries too big. Otherwise we would need to implement "cos", "sin", and so on completely in the kernel library.
> >
> > Thoughts?
>
> To get an efficient implementation, we will need to have something
> hardware-dependent; a cross-platform implementation can only be a
> fallback. It was my impression that POCL currently concentrates on
> running on the host, where a libc is always available.

POCL concentrates on static parallel architectures, really. Only the host "driver" comes with the source code, but POCL itself started as the development of OpenCL support for the TCE project (http://tce.cs.tut.fi). We generalized the passes to make it portable and more widely useful.

> For example, on Intel, one would want to use the fsin machine
> instruction, and there are similar machine instructions (or sequences
> thereof) for other architectures.
>
> For the fallback implementation, I would definitely use an existing
> library. I don't know newlib, but it may be the way to go.

Of course. Even newlib would not be target independent: it is compiled for a given architecture and has its own set of processor-dependent implementations of math functions (I am not sure about their level of efficiency). Agreed on the need for different library implementations.

> >> - Since some of the code is highly repetitive, should we use a
> >> templating mechanism, probably built on m4?
> >
> > If it is *really* needed, then we can use it. But I am not sure how much effort it saves compared with the C preprocessor. If we want to make the system as portable as possible, it is desirable to keep dependencies to a minimum.
> >
> > Can you provide a (simple) example of a case in the kernel library where m4 macros would help?
>
> It could provide a mechanism that is not per-file, but more generic.
> For example, the implementations of sin, cos, tan, etc. are very
> similar. In other words, OpenCL doesn't have #include (or does it when
> the kernel library is built?), and using only #define leads to a lot
> of duplication across source files.

We have #include, #define and all of those during the library build process. I know sin/cos and other math functions are quite repetitive; my doubt is whether the amount of "saved work" with m4 over the plain C preprocessor justifies the dependency on m4.

For example, cos/sin/tan can share the same C file (let's say trig.inc), written against a generic FUNC macro, and then:

    #define FUNC cos
    #include "trig.inc"
    #undef FUNC

    #define FUNC sin
    #include "trig.inc"
    #undef FUNC

    ...

Again, if there is a clear gain from using m4, no problem, but I am unsure it really saves that much typing.

BR,

Carlos
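The preprocessor-only templating discussed above can be sketched in a single file. This is a hedged illustration rather than pocl code: instead of a separate trig.inc that is included repeatedly, it uses one function-generating macro with token pasting, which is a closely related variant of the same technique. All names (`my_float2`, `my_float4`, `DEFINE_UNARY`, `halve_scalar`, `square_scalar`) are hypothetical, and the trivial scalar helpers stand in for sin, cos and friends so that the sketch does not depend on a math library.

```c
/* Hypothetical vector types, for illustration only. */
typedef struct { float s[2]; } my_float2;
typedef struct { float s[4]; } my_float4;

/* Stand-ins for the scalar math routines (sin, cos, ...). */
static float halve_scalar(float x)  { return 0.5f * x; }
static float square_scalar(float x) { return x * x; }

/* Generate NAME_floatN, which applies NAME_scalar to each of the
   N lanes of a my_floatN. Token pasting (##) builds the type and
   function names from the macro arguments. */
#define DEFINE_UNARY(NAME, N)                    \
  my_float##N NAME##_float##N(my_float##N a)     \
  {                                              \
    my_float##N r;                               \
    for (int i = 0; i < N; ++i)                  \
      r.s[i] = NAME##_scalar(a.s[i]);            \
    return r;                                    \
  }

/* One line per function/width combination replaces a hand-written copy. */
DEFINE_UNARY(halve, 2)
DEFINE_UNARY(halve, 4)
DEFINE_UNARY(square, 2)
DEFINE_UNARY(square, 4)
```

With this in place, `halve_float2`, `halve_float4`, `square_float2` and `square_float4` all exist without any duplicated bodies. The #include-based form from the message keeps longer function bodies out of backslash-continued macros, which may be easier to read and debug; which of the two fits better is a design choice the thread leaves open.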
From: Erik S. <esc...@pe...> - 2011-10-24 14:03:31
2011/10/24 Carlos Sánchez de La Lama <car...@ur...>:
> Hi Erik,
>
> On Mon, 2011-10-24 at 09:20 -0400, Erik Schnetter wrote:
>> 2011/10/24 Carlos Sánchez de La Lama <car...@ur...>:
>> > Hi Erik,
>> >
>> > thanks for your contributions. Some of the questions you ask are open for discussion, as the kernel library implementation is still at a very early stage of development.
>> >
>> >> - Is this implementation / coding style approximately acceptable?
>> >
>> > It is all right. We do not have an "official" coding guideline for pocl (yet), but I have basically tried to follow the GNU Coding Standards, except in the LLVM passes, where the LLVM guidelines are used.
>> >
>> > I have seen that you call the C library functions to implement the functionality. This is OK for "native" environments, but in a real device scenario there is not going to be an underlying C library providing, for example, a "cos" function. One option is to use some other embedded library (newlib?) underneath our kernel library, though I have the feeling this might be overkill and make the device binaries too big. Otherwise we would need to implement "cos", "sin", and so on completely in the kernel library.
>> >
>> > Thoughts?
>>
>> To get an efficient implementation, we will need to have something
>> hardware-dependent; a cross-platform implementation can only be a
>> fallback. It was my impression that POCL currently concentrates on
>> running on the host, where a libc is always available.
>
> POCL concentrates on static parallel architectures, really. Only the
> host "driver" comes with the source code, but POCL itself started as the
> development of OpenCL support for the TCE project (http://tce.cs.tut.fi). We
> generalized the passes to make it portable and more widely useful.
>
>> For example, on Intel, one would want to use the fsin machine
>> instruction, and there are similar machine instructions (or sequences
>> thereof) for other architectures.
>>
>> For the fallback implementation, I would definitely use an existing
>> library. I don't know newlib, but it may be the way to go.
>
> Of course. Even newlib would not be target independent: it is compiled
> for a given architecture and has its own set of processor-dependent
> implementations of math functions (I am not sure about their level of
> efficiency). Agreed on the need for different library implementations.
>
>> >> - Since some of the code is highly repetitive, should we use a
>> >> templating mechanism, probably built on m4?
>> >
>> > If it is *really* needed, then we can use it. But I am not sure how much effort it saves compared with the C preprocessor. If we want to make the system as portable as possible, it is desirable to keep dependencies to a minimum.
>> >
>> > Can you provide a (simple) example of a case in the kernel library where m4 macros would help?
>>
>> It could provide a mechanism that is not per-file, but more generic.
>> For example, the implementations of sin, cos, tan, etc. are very
>> similar. In other words, OpenCL doesn't have #include (or does it when
>> the kernel library is built?), and using only #define leads to a lot
>> of duplication across source files.
>
> We have #include, #define and all of those during the library build
> process. I know sin/cos and other math functions are quite repetitive;
> my doubt is whether the amount of "saved work" with m4 over the plain C
> preprocessor justifies the dependency on m4.
>
> For example, cos/sin/tan can share the same C file (let's say trig.inc),
> written against a generic FUNC macro, and then:
>
>     #define FUNC cos
>     #include "trig.inc"
>     #undef FUNC
>
>     #define FUNC sin
>     #include "trig.inc"
>     #undef FUNC
>
>     ...
>
> Again, if there is a clear gain from using m4, no problem, but I am unsure
> it really saves that much typing.

If there's #include, I prefer cpp as well.

-erik

--
Erik Schnetter <esc...@pe...> http://www.cct.lsu.edu/~eschnett/
AIM: eschnett247, Skype: eschnett, Google Talk: sch...@gm...