Re: [Algorithms] General purpose task parallel threading approach
From: Jarkko L. <al...@gm...> - 2009-04-08 19:51:19
Pushing work to multiple queues doesn't guarantee that it also gets executed simultaneously, unless all your threads are idling, which isn't a very happy scenario. To me it sounds like you are in the process of converting a single-threaded engine to a multi-threaded one, where you start from 100%/number_of_threads efficiency, in which case what you say makes sense. I.e. you want the multi-threaded bits to be very local, where you dispatch work to multiple threads and sync right after, and any work you can parallelize is a win over the old design. However, if you have the advantage of designing an engine from scratch for multi-threading, I don't see this as a good approach.

Cheers,
Jarkko

_____
From: Sebastian Sylvan [mailto:seb...@gm...]
Sent: Wednesday, April 08, 2009 1:13 AM
To: Game Development Algorithms
Subject: Re: [Algorithms] General purpose task parallel threading approach

On Tue, Apr 7, 2009 at 10:30 PM, Jarkko Lempiainen <al...@gm...> wrote:

Okay, yes, you would evict the data from the cache if a higher-priority task kicks in, assuming the data doesn't fit into the cache simultaneously, before returning to work on the original task. Now, if you think of an 8-core system with job stealing, you must fit 8 times the amount of data into the cache, because you have 8 threads working on a total of 8 unrelated tasks simultaneously. This is much worse from a data-eviction point of view than switching to a higher-priority task and returning to work on the original task afterwards. Now think about how this scales to 16, 32, 64 cores.

You're assuming that work-stealing scheduling will always work on unrelated tasks, when this isn't the case. If you kick off a large batch of work it will spread out over all the workers; each worker just lets whatever's already there finish first, exploiting locality.

> If I can take something that takes 3ms to finish and get it done in 0.5ms because I can use six threads instead of one, then that's clearly a win.

No, it's not a win.
See, all you care about is that ALL the work for a frame is completed as soon as possible, not minimizing the time to execute a single task. If you had 50% overhead like you say, then you would have 50% less time to do any useful work. Now, you say it doesn't matter because you have threads sitting idle anyway, but don't you think you have some serious work-starvation issues in your engine instead, which you are trying to patch over with this approach by concluding that the idle time isn't a lost resource?

If I have 50% overhead, that means the other 50% is useful work that would otherwise have had to run sequentially! I'm doing 50% MORE useful work on my spare threads. The purpose of splitting stuff up into parallel tasks is to use idle cores to help speed up work. If you assume that you can keep those cores busy with useful work using Some Other Technique, then you're basically assuming that there is no problem and that we therefore need no solution. That would be great if it were the case, but it seems unrealistic.

The whole discussion is about "how do we use N threads to speed up a game engine". Saying that we shouldn't spawn lots of fine-grained tasks, because the overhead in doing so would cost cycles that we could use for "useful work", isn't really helping me, unless you can tell me what other technique you plan to use to actually do this "useful work" that doesn't have the overhead of just ramping up task parallelism so it scales to the number of threads.

Of course you wouldn't trade 100% useful work for 80% useful work and 20% overhead; that would be stupid. Nobody would run a task system with worker threads on a single-threaded machine. What I'm saying is that in the not-so-far-off future, when you start off with, say, 90% idle cycles and only 10% useful work, that 80/20 split looks a lot more appealing.

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862