From: Erik S. <esc...@pe...> - 2011-10-24 13:20:38
2011/10/24 Carlos Sánchez de La Lama <car...@ur...>:
> Hi Erik,
>
> thanks for your contributions. Some of the questions you ask are open for discussion, as the kernel library implementation is still in a really early stage of development.
>
>> - Is this implementation / coding style approximately acceptable?
>
> It is all right. We do not have an "official" coding guideline for pocl (yet) but basically I tried to follow GNU Coding Standards, except in the LLVM passes where the LLVM guidelines are used.
>
> I have seen you call the C library functions to implement the functionality. This is ok for "native" environments, but in a real device scenario, there is not going to be an underlying C library providing for example the "cos" function. One possible option is to use some other embedded library (newlib?) underneath our kernel library, though I have the feeling this might be overkill and make the device binaries too big. Otherwise we would need to implement the "cos", "sin", whatever... functions completely in the kernel library.
>
> Thoughts?

To get an efficient implementation, we will need to have something hardware-dependent; a cross-platform implementation can only be a fallback. It was my impression that POCL currently concentrates on running on the host, where a libc is always available. For example, on Intel, one would want to use the fsin machine instruction, and there are similar machine instructions (or sequences thereof) for other architectures.

For the fallback implementation, I would definitely use an existing library. I don't know newlib, but it may be the way to go.

>> - Since some of the code is highly repetitive, should we use a
>> templating mechanism, probably built onto m4?
>
> If it is *really* needed, then we can use it. But I am not sure how much effort it saves compared with the C preprocessor. If we want to make the system as portable as possible, it is desirable to keep dependencies to a minimum.
>
> Can you provide a (simple) example of a case in the kernel library where M4 macros would help?

It could provide a mechanism that is not per-file, but more generic. For example, the implementations of sin, cos, tan, etc. are very similar. In other words, OpenCL doesn't have #include (or does it when the kernel library is built?), and using only #define leads to a lot of duplication across source files.

>> - I added explicitly vectorized functions e.g. for fabs or sqrt for
>> SSE architectures; is this acceptable?
>
> It is perfectly acceptable as long as the code works also in non-SSE architectures.

Very good!

-erik

>> - Should there be test cases for the run-time functions?
>
> There is no strict rule, but of course the more test cases the better ;)
>
> BR
>
> Carlos

-- 
Erik Schnetter <esc...@pe...> http://www.cct.lsu.edu/~eschnett/
AIM: eschnett247, Skype: eschnett, Google Talk: sch...@gm...
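
To make the two technical points above concrete, here is a minimal sketch in plain C of (a) using the C preprocessor as the templating mechanism for repetitive element-wise functions, and (b) an SSE fast path for fabs that falls back to portable code on non-SSE targets. Everything prefixed `sk_` (the struct-based vector types, the macro, the function names) is a hypothetical stand-in invented for illustration and is not taken from pocl. The scalar fallback assumes `float` is IEEE-754 binary32.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical stand-ins for OpenCL vector types (not pocl's definitions). */
typedef struct { float s[2]; } sk_float2;
typedef struct { float s[4]; } sk_float4;
typedef struct { float s[8]; } sk_float8;

/* Portable scalar fabs that needs no C library: clear the sign bit.
   Assumption: float is IEEE-754 binary32. */
static inline float sk_fabs_scalar(float x)
{
    union { float f; uint32_t u; } bits;
    bits.f = x;
    bits.u &= 0x7fffffffu;
    return bits.f;
}

/* Preprocessor "template": defines <name>_float<n> by applying scalar_fn
   to each of the n elements of the vector. */
#define SK_DEFINE_ELEMENTWISE(name, scalar_fn, n)                 \
    static inline sk_float##n name##_float##n(sk_float##n v)      \
    {                                                             \
        for (int i = 0; i < n; ++i)                               \
            v.s[i] = scalar_fn(v.s[i]);                           \
        return v;                                                 \
    }

#if defined(__SSE__)
/* SSE path for the 4-wide case: and-not with a mask holding only sign bits. */
static inline sk_float4 sk_fabs_float4(sk_float4 v)
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f);
    __m128 x = _mm_loadu_ps(v.s);
    _mm_storeu_ps(v.s, _mm_andnot_ps(sign_mask, x));
    return v;
}
#else
/* Non-SSE targets get the generated fallback under the same name. */
SK_DEFINE_ELEMENTWISE(sk_fabs, sk_fabs_scalar, 4)
#endif

/* The remaining widths are always generated from the scalar fallback. */
SK_DEFINE_ELEMENTWISE(sk_fabs, sk_fabs_scalar, 2)
SK_DEFINE_ELEMENTWISE(sk_fabs, sk_fabs_scalar, 8)

/* A small test case of the kind asked about for the run-time functions. */
static inline void sk_fabs_selftest(void)
{
    sk_float8 in = {{ -1.0f, 2.0f, -3.0f, 4.0f, -5.0f, 6.0f, -7.0f, 8.0f }};
    sk_float8 out = sk_fabs_float8(in);
    for (int i = 0; i < 8; ++i)
        assert(out.s[i] == (float)(i + 1));
}
```

Given scalar implementations of sin, cos and tan (from a library such as newlib or written by hand), the same macro could in principle generate their vector variants with one line each. Whether that is enough to avoid m4 would depend on how much of the repetition is per-function rather than per-width.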