Re: [Algorithms] General purpose task parallel threading approach
From: Jarkko L. <al...@gm...> - 2009-04-11 09:20:25
I was talking about a generic task system before. Dependencies affect task priorities, because you want the longest chain of work (the critical path) to start as early as possible in the frame. The length of each task in the chain also affects the chain, which is why I profile each task in the task manager to know which chain is the longest.

Now, you say that you don't know about dependencies, etc. beforehand, but in games we are in the fortunate position that pretty much the same call graph runs from one frame to the next, so it's possible to exploit frame-to-frame coherency to prioritize tasks and have a pretty good idea ahead of time of what's going to happen. It's not perfect, of course (e.g. when a task runs for the first time there is no priority for it), but it's close, and definitely better than not prioritizing at all. So the task prioritization is very dynamic and adapts to different tasks being pushed to the queue and to their lengths changing from frame to frame.

In my implementation I don't define dependencies beforehand, as you seem to assume you would have to; I define them at the call site, so there is no external maintenance to do and the usage is pretty much like calling a function (1-2 lines of code), which makes it quite convenient. One motivation for the job queue I implemented was that it should be simple to use, so that you could more easily take advantage of opportunities for parallelism.

Cheers,
Jarkko

_____

From: Sebastian Sylvan [mailto:seb...@gm...]
Sent: Friday, April 10, 2009 7:54 PM
To: Game Development Algorithms
Subject: Re: [Algorithms] General purpose task parallel threading approach

On Fri, Apr 10, 2009 at 5:08 PM, Jarkko Lempiainen <al...@gm...> wrote:

You can't statically schedule the work because the work changes dynamically from one frame to the next.
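[The profiled critical-path prioritization described above could be sketched roughly as follows. This is a hypothetical illustration, not Jarkko's actual implementation; `Task`, `critical_path`, and the demo helper are made-up names.]

```cpp
#include <algorithm>
#include <vector>

// Hypothetical task record: duration profiled on the previous frame,
// plus the indices of tasks that must run after this one.
struct Task {
    double profiled_ms = 0.0;     // measured length from the last frame
    std::vector<int> dependents;  // tasks that depend on this one
    double priority = 0.0;        // computed: critical-path length from here
};

// Priority = this task's profiled length plus the longest chain of
// dependents after it, so the head of the critical path sorts first.
double critical_path(std::vector<Task>& tasks, int i) {
    Task& t = tasks[i];
    if (t.priority > 0.0) return t.priority;  // memoized for this frame
    double longest_tail = 0.0;
    for (int d : t.dependents)
        longest_tail = std::max(longest_tail, critical_path(tasks, d));
    t.priority = t.profiled_ms + longest_tail;
    return t.priority;
}

// Demo: chain a->b->c->d at 1 ms each vs. an independent 10 ms task z.
// z's priority (10) beats a's (4), so z would be started first.
double demo_priority(bool of_z) {
    std::vector<Task> tasks(5);
    for (int i = 0; i < 4; ++i) tasks[i].profiled_ms = 1.0;
    tasks[0].dependents = {1};  // a -> b
    tasks[1].dependents = {2};  // b -> c
    tasks[2].dependents = {3};  // c -> d
    tasks[4].profiled_ms = 10.0;  // z, no dependents
    return critical_path(tasks, of_z ? 4 : 0);
}
```

[With priorities computed this way, pushing tasks into a priority queue naturally starts the longest chain first; on a task's first appearance there is no profile yet, matching the "not perfect" caveat above.]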
In one frame it may be best to start running a->b->c->d as soon as possible, while sometimes you should start running z early because it takes 10 * y, etc. It's the scheduler's job to figure out the best order for optimal throughput, and it would be silly to try to craft the order directly into the code.

Not really, we do things like that all the time today - e.g. adding a speculative prefetch because the profiler says that most of the time the code will go down a certain branch and read a certain bit of data. It may not be strictly possible to predict which order will be better in every single circumstance, but you can look at the profiler output and optimize for the most common case.

I see what you're saying now, though. You weren't actually talking about a general task system before? So when you said "priorities" you really meant the ordering of a dependency graph that's known ahead of time (i.e. before you start executing the first task), not a dynamic priority system for a general task system. The use of the word "priority" threw me, as it generally refers to a user giving a certain task a priority so that it executes first.

However, using a static dependency graph (not static as in "compile-time", but static as in "known before execution starts") is a bit like solving a problem with a slow heap allocator by saying that you're not allowed to do dynamic allocations. It doesn't actually solve the problem. I think we really do want a task system that works like normal code except that it allows you to run certain things in parallel, not something which places all sorts of restrictions on the kind of code you're allowed to write.

Also, which scenario do you think is more likely:

A) The intricate one given before (which needs extremely specific circumstances to be a problem).
B) Merely a couple of if statements or virtual function calls, giving you dynamic dependencies.

As an example of why B is a problem for a static system, consider that a task may have to be scheduled after some model has finished animating, even though it only needs to look at the animation data in very rare circumstances (e.g. only a specific subclass looks at it, or maybe it does hit detection and only needs animation data if the conservative bounding sphere test succeeds). In fact, this not only hurts throughput (a task that is actually ready to run is blocked while threads may be idle), but it may cause unnecessary computations: what if nothing ever needs that animation data? With a static system you still need to compute it up front in case something does. There are all sorts of cases where dependencies can only be known after looking at dynamic conditions, meaning you have to start executing a task before you know which other tasks it depends on. Not supporting that doesn't seem like it will scale in general, IMO.

Honestly, dynamic call graphs are pretty common (if statements, virtual function calls, loops, scripting, etc.), so I'm going to say that B is probably more common.

So given that static task systems have these issues, that they are highly restrictive in the kinds of computations you can parallelize, and that they don't seem to actually perform better (the paper I linked earlier showed work stealing matching hand-scheduled tasks), I think I know what I'll go for.

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
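[The conditional-dependency case above - a hit test that only needs the animation result when a cheap bounding-sphere test passes - can be sketched with plain C++ futures standing in for a real work-stealing task system. All names here are hypothetical.]

```cpp
#include <future>

struct Pose { float bones[4]; };

// Stand-in for the expensive animation task.
Pose animate() {
    return Pose{{0.f, 1.f, 2.f, 3.f}};
}

// The dependency on the animation task is only discovered at run time:
// if the cheap sphere test fails, hit_test never waits on the result.
bool hit_test(bool sphere_test_passed, std::future<Pose>& pose) {
    if (!sphere_test_passed)
        return false;        // dependency never materializes
    Pose p = pose.get();     // block only when the data is actually needed
    return p.bones[3] > 0.f;
}

bool run_frame(bool sphere_hit) {
    // Launch animation in parallel; whether hit_test depends on it is
    // decided mid-task, not in a dependency graph fixed up front.
    std::future<Pose> pose = std::async(std::launch::async, animate);
    return hit_test(sphere_hit, pose);
}
```

[A static graph would have to order the hit test after animation every frame and compute the pose even when nothing reads it; letting a running task block on a future only when it actually needs the result avoids the throughput cost. `std::launch::deferred` would additionally skip the computation entirely when the result is never requested, at the cost of parallelism.]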