Re: [Pocl-devel] Multithreading support committed

On 10/30/2011 05:38 PM, Erik Schnetter wrote:
> Another option could be to build a small C program that uses OpenMP;
> the OpenMP run time contains logic that determines a good number of
> threads to use. You would look at omp_get_max_threads().

I wouldn't like to introduce a library dependency just for this.
I'm sure there are OS-specific ways to figure out the core count and the
number of hardware threads per core on the different operating systems. Or
just resort to a CPU info instruction in the device's instruction set (such
as CPUID on x86), if available.
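
As a rough sketch of one such OS-specific query, assuming a Unix-like host:
sysconf() with _SC_NPROCESSORS_ONLN reports the number of logical processors
currently online, which already is the core count times the hardware threads
per core. The function name online_hw_threads is made up for this example
and is not part of pocl.

```c
#include <unistd.h>

/* Hypothetical helper, not pocl code.
   Returns the number of logical processors currently online, or 1 if the
   query fails. _SC_NPROCESSORS_ONLN is a widespread extension (glibc, the
   BSDs, macOS) rather than a guaranteed POSIX name, so other systems would
   need their own branch here. */
long online_hw_threads(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```

This avoids any extra library, but it does not tell cores and hardware
threads per core apart; it only gives their product.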

After all, the current need of pocl is quite simple: if we want to exploit
the task-level parallelism provided by the device to the max while minimizing
the threading overheads, it boils down to the number of hardware threads
per core times the core count (or the number of WGs, whichever is smaller),
doesn't it?
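
Spelled out as code, the rule could look roughly like the sketch below. The
function and parameter names are invented for illustration and do not come
from the pocl sources.

```c
#include <stddef.h>

/* Hypothetical helper, not pocl code.
   Number of WG threads to launch: hardware threads per core times the core
   count, or the number of work-groups if that is smaller. */
size_t wg_thread_count(size_t cores, size_t hw_threads_per_core,
                       size_t num_work_groups)
{
    size_t hw = cores * hw_threads_per_core;
    if (hw == 0)
        hw = 1; /* topology unknown: fall back to a single thread */
    return num_work_groups < hw ? num_work_groups : hw;
}
```

For example, with 4 cores, 2 hardware threads per core and 100 WGs this
gives 8 threads; with only 3 WGs it gives 3.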

If disk or network I/O were a concern, there should be additional threads to
hide the I/O latencies (at the OS level), but now we are mainly concerned with
hiding the memory latencies because the kernels do not access files or the
network like, for example, OpenMP loops in general can do. For memory latency
hiding, only hardware threads can be of help, AFAIK.

An additional consideration is the size of the local memory, as each
parallel WG needs a separate local memory space. Currently pocl just
assumes the local memory malloc overhead (and the size) per thread is
tolerable. In reality, for example on memory-tight embedded targets, this
should also restrict the max number of parallel WG threads. If you can afford
only one local memory "alive" at the same time, you can launch only one
WG thread.
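
A sketch of how such a restriction might be applied, assuming the target can
state a byte budget for local memory. The budget and the names below are
assumptions made for the example; pocl does not currently do this.

```c
#include <stddef.h>

/* Hypothetical helper, not pocl code.
   Caps the wanted number of WG threads by how many per-WG local memory
   areas fit into the given budget at the same time. A result of 0 means
   not even one WG's local memory fits. */
size_t cap_by_local_memory(size_t wanted_threads, size_t local_mem_per_wg,
                           size_t local_mem_budget)
{
    if (local_mem_per_wg == 0)
        return wanted_threads; /* kernel uses no local memory: no cap */
    size_t fit = local_mem_budget / local_mem_per_wg;
    return fit < wanted_threads ? fit : wanted_threads;
}
```

With a 64 KiB budget and 16 KiB of local memory per WG, 8 wanted threads
get capped to 4; with a 16 KiB budget only one WG thread is launched.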

BR,
-- 
--Pekka