Re: [Algorithms] General purpose task parallel threading approach
From: Sebastian S. <seb...@gm...> - 2009-04-11 16:49:53
On Sat, Apr 11, 2009 at 5:11 PM, Jarkko Lempiainen <al...@gm...> wrote:

> The word “priority” is commonly used in task scheduling (e.g. in the GDC
> talk I pointed out earlier) to define preferred order of task execution.
> I’m sure I don’t have to explain basic scheduling terminology to you,
> right? Anyway, the system I described is generic and in fact less
> restrictive than what you have been talking about: You are not forced to
> statically define dependencies in the code flow (you can do it though),
> thus you can gain better throughput.

The confusion came from the fact that this discussion was about general
task parallelism, not the restricted form you're talking about, and in that
context "priority" in the sense you meant doesn't really make sense (since
dependencies would be enforced outside the priority system). I'm just
trying to explain why I misunderstood you the first time around. I know
what you meant now, and we can stop arguing semantics.

I tried to be very clear about what I meant by static/dynamic. By "static"
I meant that you need to know the dependencies before you start executing
tasks. Is this incorrect? I understand that you can dynamically change the
graph of tasks on a frame-by-frame basis, but you still can't have
arbitrary dynamic control flow in the tasks themselves (because then it
would be impossible to know the dependencies ahead of time, which your
scheduler needs).

> But seems like you have firmly set your mind on specific way of doing
> multi-threading and adamantly believe it results in optimal throughput
> regardless what people tell you, so good luck with that.

We're talking about code here, not personalities. I'm trying to point out
that your system is, as a matter of mathematical fact (not "adamant
beliefs"), not a general task system, because it imposes restrictions on
the kinds of code you can run through it.
I'm not disregarding what people tell me; I'm happy to be shown, and have
repeatedly asked for, facts, evidence or even just well-reasoned
speculation. So if you think I'm wrong, please explain where I'm going
wrong rather than bailing out with an ad hominem attack. Here, I'll even
summarize my position for you so you can do it more easily:

1. Your system is an ahead-of-time scheduler, in that the dependencies of
   your task system must be known before you start executing anything. The
   graph can still change from frame to frame, but dependencies must be
   known before execution.
2. By definition, certain kinds of control flow cannot be scheduled ahead
   of time (i.e. it's mathematically impossible).
3. Therefore your system is not a general system, because it does not
   allow dynamic control flow.
4. Dynamic control flow is very common in real code.
5. Your system must assume "worst case" dependencies for each task, i.e.
   if a task has an if-else clause, that task must depend on the data used
   in both branches. Since nested branches form a tree, this gives you
   exponentially more dependencies in your graph.
6. I haven't done the maths, but I *suspect* that (5) means the asymptotic
   time complexity w.r.t. the critical path is not constant, but at least
   linear and probably exponential (it's a bit tricky to figure out
   exactly; maybe I'm just thick).
7. In order to schedule dynamic control flow you need an on-line
   scheduler, by definition.
8. Work stealing is an on-line scheduler with provable efficiency (it's
   asymptotically optimal).
9. (6) and (8) mean you can't claim your ahead-of-time scheduler is better
   than work stealing in any absolute sense. I'd be happy to see
   benchmarks of specific instances where it is, but so far every
   benchmark I've seen has had work stealing come out on top.
10. All these points taken together mean that an on-line scheduler based
    on work stealing gives me:

Where am I going wrong here?

-- 
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862