Re: [Algorithms] General purpose task parallel threading approach
From: Jarkko L. <al...@gm...> - 2009-04-12 08:39:24
Well, I see good task prioritization as the dominating factor in task scheduling: if you save even 1ms shared across /all/ the threads due to better scheduling, you have already gained back the overhead of executing ~5000 tasks in a frame. But you seem to disagree that task prioritization gains you anything, am I right? And then there are of course the issues of cache coherency and programmer productivity, so individual task overhead seems to pale in comparison when we are talking about such small differences (in cycles), and it isn't really worth optimizing further. But yes, a real-life benchmark would probably shed some light on this. Then again, you could always continue arguing that the benchmark favors one approach or the other, so I doubt a single benchmark proves anything either way. I guess in the end it boils down to whichever approach is simplest for the majority to use and understand while still scaling reasonably well; that is what gets adopted as the best practice, even though it doesn't necessarily give the absolute best throughput.

Cheers, Jarkko

_____
From: Sebastian Sylvan [mailto:seb...@gm...]
Sent: Sunday, April 12, 2009 5:07 AM
To: Game Development Algorithms
Subject: Re: [Algorithms] General purpose task parallel threading approach

On Sun, Apr 12, 2009 at 12:15 AM, Jarkko Lempiainen <al...@gm...> wrote:

Yes, only tasks that have been previously executed are prioritized (otherwise a task gets some default priority). The job queue also processes tasks in groups (e.g. a group per function), so all tasks in a task group share the same priority and are scheduled sequentially for execution (e.g. multiple animate_character(character*) function tasks with different character* pointers form a task group). I profiled the overhead of adding and executing a task from the queue at around 500 cycles total on my test PC, so it's not bad, but I haven't used the queue in a real game engine yet, so I can't tell how well (or badly) it works in real-life scenarios.
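The grouped, prioritized queue Jarkko describes could be sketched roughly like this. This is a minimal single-threaded sketch under stated assumptions, not his actual implementation; the names (TaskQueue, add_task, pop_and_run) and the integer group/priority scheme are illustrative. The point it shows is that tasks in the highest-priority group are drained back-to-back, so e.g. all animate_character tasks run consecutively (the cache-coherency benefit he mentions):

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Minimal sketch of a grouped, prioritized task queue: every task is added
// under a group id, all tasks in a group share the group's priority, and
// pop_and_run() always services the highest-priority non-empty group, so a
// group's tasks execute consecutively. Hypothetical API, not the poster's.
class TaskQueue {
public:
    void add_task(int group, int priority, std::function<void()> fn) {
        Group& g = groups_[group];
        g.priority = priority;                 // whole group shares one priority
        g.tasks.push_back(std::move(fn));
    }

    // Runs one task from the highest-priority non-empty group.
    // Returns false when the queue is empty.
    bool pop_and_run() {
        Group* best = nullptr;
        for (auto& entry : groups_) {
            Group& g = entry.second;
            if (!g.tasks.empty() && (!best || g.priority > best->priority))
                best = &g;
        }
        if (!best) return false;
        std::function<void()> fn = std::move(best->tasks.back());
        best->tasks.pop_back();
        fn();
        return true;
    }

private:
    struct Group {
        int priority = 0;
        std::vector<std::function<void()>> tasks;
    };
    std::map<int, Group> groups_;              // group id -> its tasks
};
```

A real scheduler would of course need thread-safe (ideally lock-free) add/pop paths, which is where the ~500-cycle overhead figure would come from.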
Anyway, it should be quite easy to change the scheduling strategy of the queue once it's integrated into an engine, to see how it compares to work stealing, for example.

It would definitely be interesting to see some benchmark comparisons... The problem I have with it a priori is that plain old work stealing works so surprisingly well (performance is very close to the theoretical maximum for a given number of processors and a given length of critical path) that there doesn't seem to be much room for improvement as far as throughput is concerned. You'd need to test both on the same code, I guess, because the cost of these things varies. But if we guesstimate a bit, then depending on the exact architecture and calling convention (and the number of parameters that need to be pushed onto the stack, etc.), Cilk's "4x the cost of a function call" equates to about 50 cycles give or take, so you're adding an order of magnitude of extra overhead, which seems quite steep to me.

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
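For context, the work-stealing discipline Sebastian refers to (as popularized by Cilk) can be sketched as follows. Each worker owns a deque: it pushes and pops its own tasks at the back (LIFO, which keeps recently spawned, cache-warm work local), while an idle worker steals from the front of a victim's deque (FIFO, which tends to grab the oldest and typically largest pieces of work). This is a single-threaded sketch of the policy only; a real scheduler uses a lock-free deque (e.g. the Chase-Lev algorithm), and the Worker type and its method names here are hypothetical:

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Sketch of per-worker task deques as used in work stealing: the owner
// operates on the back of its own deque, thieves take from the front.
// Policy illustration only -- no synchronization, hypothetical names.
struct Worker {
    std::deque<std::function<void()>> dq;

    // Owner enqueues newly spawned work at the back.
    void push(std::function<void()> t) { dq.push_back(std::move(t)); }

    // Owner dequeues its own work LIFO from the back (cache-warm).
    bool pop_local(std::function<void()>& out) {
        if (dq.empty()) return false;
        out = std::move(dq.back());
        dq.pop_back();
        return true;
    }

    // An idle worker steals FIFO from the front (oldest work first).
    bool steal(std::function<void()>& out) {
        if (dq.empty()) return false;
        out = std::move(dq.front());
        dq.pop_front();
        return true;
    }
};
```

Because the owner touches only one end of the deque and thieves the other, contention between them is rare in practice, which is why the per-task overhead can stay within a small multiple of a plain function call.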