Re: [Algorithms] General purpose task parallel threading approach
From: Sebastian S. <seb...@gm...> - 2009-04-10 23:35:34
On Sat, Apr 11, 2009 at 12:01 AM, Jon Watte <jw...@gm...> wrote:
> Sebastian Sylvan wrote:
> >
> >
> > On Fri, Apr 10, 2009 at 7:11 PM, Jon Watte <jw...@gm...
> > <mailto:jw...@gm...>> wrote:
> >
> >
> > Of course not. A task doesn't get run until its dependencies are
> > resolved.
> >
> >
> > Which is the whole problem, since it may need to run to even know what
> > its dependencies are.
>
> It's clear that we have totally different ideas about data and software
> architecture.
> Clearly you can know what the dependencies are, or you wouldn't be able
> to have those dependencies in the first place.
The issue is about knowing *ahead of time*.
Again, this goes back to the halting problem. It is mathematically
impossible to know the control flow of a general function ahead of time; you
have to execute it to find out. If you think your system can do this then
you've either broken the universe, or you're wrong :-)
> If you have wildly different dependencies based on a switch(), then that
> is better expressed as a behavior component (that you can swap out)
> rather than a code switch(). (The "pure OO" movement claims that every
> switch can be turned into a virtual dispatch, but I wouldn't go that far)
But it's still impossible to know which behaviour to swap in until you've
started executing the tasks!
Look, all I'm saying is that your system isn't general - it only works for a
restricted form of computation where all dependencies can be known before
starting to execute the tasks. Is this incorrect?
I'm sure you understand that real code in general doesn't work like that;
even something this simple wouldn't be possible:
x = spawn(foo);
y = spawn(bar);
sync;
if ( x && y )
{
    z = spawn(baz);
}
You have to start executing tasks for the results of foo and bar to be
known, which means you can't know that baz is a dependency until after
you've already started executing. Under a static schedule, the task running
the above code would have to block until all three sub-tasks had finished
before it could run, even though one of them may only rarely be needed in
practice... But it gets even worse:
x = spawn(foo);
y = spawn(baz); // returns a task!
sync;
if ( x )
{
    z = spawn(y);
}
Now you can't even pessimistically schedule it, since you have no idea what
y might be until after you've started executing!
All I'm saying is: real code has dynamic control flow that's impossible to
predict ahead of time. Using your system you have to run that kind of stuff
serially, and I personally think that to scale to the kinds of thread counts
we will need to in the future, we can't afford to say "nah, we'll never need
to parallelize that" about anything.
Yeah, we use a very restricted task system right now for rendering, and in
that case the data flow is pretty simple and the number of threads we use
for rendering is low, so it works out well. But once those big juicy
low-hanging fruits are parallelized we still won't be done, and then we'll
be in trouble if the task system we have isn't general enough to handle the
remaining "nasty" code.
>
> > You specify dependencies, yes, but they may be dynamic
>
> Which is why the scheduler must be dynamic!
>
Precisely!
> Subsystems can have dependencies, too. It's perfectly fine to specify
> that AI reasoning depends on collision detection, and collision
> detection depends on integration output (position) from the last frame.
> That way, you can re-use the collision output data for AI, rather than
> to have to re-run queries, too.
Precisely. As you say, the AI reasoning must be declared to depend on
collision detection, even though it doesn't *actually* depend on it on most
frames (and you can't know in advance whether it will). That means it can't
run even when it's ready; it must wait for the worst-case set of
dependencies to finish first.
--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862