Re: [Algorithms] General purpose task parallel threading approach

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On Tue, Apr 7, 2009 at 7:35 PM, Jarkko Lempiainen <al...@gm...> wrote:

>  Yes, your prioritization would still function properly if you would
> assign a thread per priority of course, but it wouldn’t result in optimal
> throughput, what’s the point after all. You need all your worker threads to
> work on the top priority tasks in order to have optimal throughput, which
> requires that all the threads work on the single prioritized queue of jobs.
> Job stealing doesn’t achieve this because each worker thread has its own
> queue. I also never said that tasks should be interrupted prematurely, but
> you should let them finish and start the next high priority task after.
>

I didn't say an individual task would be interrupted, but if you have 100
tasks working on something and you kick off 20 tasks to do your animation at
a high priority when only 5 of those 100 tasks have completed, you'll evict
a bunch of data from the cache and then you still have 95 tasks to go, then
you do another 5 and something else pops up to the top evicting the data
again, and so on. It's better for the entire batch of 100 tasks to run to
completion. With work stealing large bathches of tasks that would benefit
from sharing data across cores would naturally start to fill up the worker
threads as the previous tasks finish - and if you really want to make sure
that's the case, just call sync() before.

> Having fine granularity tasks doesn’t help regardless how many threads you
> have. Only thing you really care is that your jobs take let say <~10% of
> your frame time (1-3ms), which isn’t very high granularity. Some might take
> longer, but they are also prioritized higher to avoid stalling other threads
> in the end of the frame. So if you can animate all your on-screen characters
> in 1-3ms you shouldn’t bother multi-threading the code EXCEPT if you want to
> gain shared cache coherence. However, with work stealing you wouldn’t even
> gain this because each thread works on its own set of jobs.
>

I'm not sure I follow you? If I can take something that takes 3ms to finish
and get it done in 0.5ms because I can use six threads instead of one then
that's clearly a win. Even if I get an unreasonable amount of overhead and
it takes 1ms (50% overhead!) it's still a 3x boost as far as the overall
frame time.
What you care about is that your set of tasks that you need to complete in a
frame is done as quickly as possible, which means that it should waste as
few cycles as possible, which means you should avoid contention, bubbles due
to coarse granularity, and cache thrashing.
Fine task granularity is clearly extremely important in order to actually be
able to loads of threads. Of course you need to make sure the overhead of
starting the thread isn't higher than the cost of just doing the work
directly, but as the overhead is even slightly lower than the cost of
running the task, it's a win to get some other thread that would otherwise
be idle to do it "for free" while you only pay the fixed overhead cost.

>
>

-- 
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862