Thread: Re: [Algorithms] General purpose task parallel threading approach (Page 4)
|
From: Jarkko L. <al...@gm...> - 2009-04-09 22:51:51
|
> It's actually not vital for throughput, and I couldn't see anything in
> the presentation you linked saying it is.

That's what "Minimizing the Critical Path" is about. Ignoring the issue
doesn't mean it doesn't exist.

Cheers, Jarkko

_____
From: Sebastian Sylvan [mailto:seb...@gm...]
Sent: Friday, April 10, 2009 1:13 AM
To: Game Development Algorithms
Subject: Re: [Algorithms] General purpose task parallel threading approach

On Thu, Apr 9, 2009 at 10:27 PM, Jarkko Lempiainen <al...@gm...> wrote:

> Yes, appealing to the "big boys" is a pretty weak argument since they
> have provided sub-optimal solutions before as well.

In this case almost all of them have, at one point or another, tried the
alternative and decided it was worth the effort to implement the much
trickier solution. In fact the presentation you linked was given by the
very same organization (XNA) who now recommend work stealing - so clearly
they've given the global work queue an honest shot, don't you think?

> Anyway, I was hoping to get some strong arguments either way, but I
> guess it's not very clear yet what solution is the best for
> multi-threading your engine. I read through the Gamefest slides and it
> was a bit surprising they didn't deal with prioritization at all, which
> is vital for throughput.

It's actually not vital for throughput, and I couldn't see anything in
the presentation you linked saying it is. Tasks are naturally ordered
according to their dependencies due to the way they are spawned (you
spawn tasks, then you sync - only then do you spawn any dependent tasks).
With the Cilk-based model any task that is enqueued is by definition
ready to run. There's no need for priorities to optimize throughput.

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
|
|
From: Sebastian S. <seb...@gm...> - 2009-04-09 23:04:47
|
On Thu, Apr 9, 2009 at 11:52 PM, Jon Watte <jw...@gm...> wrote:

> Sebastian Sylvan wrote:
> > > Anyway, I was hoping to get some strong arguments either way, but
> > > I guess it's not very clear yet what solution is the best for
> > > multi-threading your engine. I read through the Gamefest slides
> > > and it was a bit surprising they didn't deal with prioritization
> > > at all, which is vital for throughput.
> >
> > It's actually not vital for throughput, and I couldn't see anything in
> > the presentation you linked saying it is.
> > Tasks are naturally ordered according to their dependencies due to the
> > way they are spawned (you spawn tasks, then you sync - only then do
> > you spawn any dependent tasks). With the Cilk-based model any task
> > that is enqueued is by definition ready to run. There's no need for
> > priorities to optimize throughput.
>
> Actually, it is. Consider a case where you have some independent tasks,
> and some serially dependent tasks. Suppose you have 4 independent tasks
> xyzw, 4 linearly dependent tasks abcd, and two cores to schedule across.
> Here are two possible work orderings:
>
> xy
> zw
> a
> b
> c
> d
>
> ax
> by
> cz
> dw
>
> Clearly, the second case is much preferred. However, without
> prioritization, you won't get that outcome.

Actually you could, since you can just spawn the tasks in the order you
want to run them. Like I believe I said very early on, the only reason
you would want to reorder tasks is because you may want to start a task
that spawns many other tasks early - but you can still do that without
having a priority system, since the order in which tasks are spawned is
deterministic.

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
|
|
From: Sebastian S. <seb...@gm...> - 2009-04-09 23:38:56
|
On Fri, Apr 10, 2009 at 12:04 AM, Sebastian Sylvan <seb...@gm...> wrote:

> On Thu, Apr 9, 2009 at 11:52 PM, Jon Watte <jw...@gm...> wrote:
>
>> Sebastian Sylvan wrote:
>> > > Anyway, I was hoping to get some strong arguments either way, but
>> > > I guess it's not very clear yet what solution is the best for
>> > > multi-threading your engine. I read through the Gamefest slides
>> > > and it was a bit surprising they didn't deal with prioritization
>> > > at all, which is vital for throughput.
>> >
>> > It's actually not vital for throughput, and I couldn't see anything
>> > in the presentation you linked saying it is.
>> > Tasks are naturally ordered according to their dependencies due to
>> > the way they are spawned (you spawn tasks, then you sync - only then
>> > do you spawn any dependent tasks). With the Cilk-based model any task
>> > that is enqueued is by definition ready to run. There's no need for
>> > priorities to optimize throughput.
>>
>> Actually, it is. Consider a case where you have some independent tasks,
>> and some serially dependent tasks. Suppose you have 4 independent tasks
>> xyzw, 4 linearly dependent tasks abcd, and two cores to schedule
>> across. Here are two possible work orderings:
>>
>> xy
>> zw
>> a
>> b
>> c
>> d
>>
>> ax
>> by
>> cz
>> dw
>>
>> Clearly, the second case is much preferred. However, without
>> prioritization, you won't get that outcome.
>
> Actually you could, since you can just spawn the tasks in the order you
> want to run them. Like I believe I said very early on, the only reason
> you would want to reorder tasks is because you may want to start a task
> that spawns many other threads early - but you can still do that without
> having a priority system since the order in which tasks are spawned is
> deterministic.

In fact, in your particular case this isn't only possible without
prioritization, but it's what happens if you pay no attention whatsoever
and just write things naively, expressing parallelism where it is.

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
|
|
From: Jon W. <jw...@gm...> - 2009-04-10 00:46:41
|
Sebastian Sylvan wrote:
> Actually you could, since you can just spawn the tasks in the order
> you want to run them.

So the totally independent tasks xyzw, and the totally dependent tasks
abcd, need to be spawned in interspersed order? Across all executing
threads? I shudder to think of the horribly convoluted code that would
require -- it's pretty much impossible.

Note that I view tasks as work items -- not as something that gets a
regular tick each frame, but actually something which has an allocation,
lifetime, and death (although various underlying data is likely pooled).
Tasks come from things like detecting collision, or exploding grenades,
or passing triggers, or what have you -- they are not static.

> you can still do that without having a priority system since the order
> in which tasks are spawned is deterministic.

Only in a world where the user is deterministic.

Sincerely,

jw
|
|
From: Nicholas "Indy" R. <ar...@gm...> - 2009-04-10 01:03:43
|
On Thu, Apr 9, 2009 at 5:46 PM, Jon Watte <jw...@gm...> wrote:
> Only in a world where the user is deterministic.

In my experience the order of events within a single thread is set and
doesn't change, even though depending on the user certain tasks won't
need to be spawned. It's trivial to set up the order of tasks *within*
a single frame to have a static order; whether that is optimal or not
is another matter altogether.

Nicholas "Indy" Ray
|
|
From: Sebastian S. <seb...@gm...> - 2009-04-09 23:25:31
|
On Thu, Apr 9, 2009 at 11:51 PM, Jarkko Lempiainen <al...@gm...> wrote:

>> It's actually not vital for throughput, and I couldn't see anything in
>> the presentation you linked saying it is.
>
> That's what "Minimizing the Critical Path" is about. Ignoring the issue
> doesn't mean it doesn't exist.

That section only seems to talk about optimizing scheduling based on
static (in the sense that they're known ahead of time - not necessarily
at compile time) dependencies, not about the necessity to support
priorities.

And of course, general code doesn't have a static call graph, so it's
even impossible to do this analysis unless your code happens to be simple
enough (see: the halting problem).

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
|
|
From: Sebastian S. <seb...@gm...> - 2009-04-10 00:04:33
|
On Fri, Apr 10, 2009 at 12:25 AM, Sebastian Sylvan <seb...@gm...> wrote:

> On Thu, Apr 9, 2009 at 11:51 PM, Jarkko Lempiainen <al...@gm...> wrote:
>
>>> It's actually not vital for throughput, and I couldn't see anything in
>>> the presentation you linked saying it is.
>>
>> That's what "Minimizing the Critical Path" is about. Ignoring the issue
>> doesn't mean it doesn't exist.
>
> That section only seems to talk about optimizing scheduling based on
> static (in the sense that they're known ahead of time - not necessarily
> at compile time) dependencies, not about the necessity to support
> priorities.
>
> And of course, general code doesn't have a static call graph, so it's
> even impossible to do this analysis unless your code happens to be
> simple enough (see: the halting problem).

Also, it's worth noting that work stealing is *provably* efficient. In
other words, the way the scheduling works gives asymptotically optimal
throughput. So claiming that either ahead-of-time prioritization, or a
runtime priority system, is required for efficient throughput is
mathematically incorrect. And given that the absence of them gives a far
more flexible programming model, and real-world speedups in practice, I'd
say the jury's in on this one...

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
|
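The "provably efficient" claim being invoked here is the classic work-stealing bound of Blumofe and Leiserson. A sketch of the result in their notation (this is background, not something stated in the thread itself):

```latex
% T_1      : total work (running time on one processor)
% T_\infty : span, i.e. the critical-path length
% T_P      : expected running time on P workers under randomized
%            work stealing
T_P \;\le\; \frac{T_1}{P} + O(T_\infty)
% This is within a constant factor of the trivial lower bound
% \max(T_1/P,\; T_\infty), which is the sense in which the schedule
% is "asymptotically optimal" without any priority system.
```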
|
From: Nicholas "Indy" R. <ar...@gm...> - 2009-04-09 23:49:05
|
On Thu, Apr 9, 2009 at 4:25 PM, Sebastian Sylvan <seb...@gm...> wrote:
> And of course, general code doesn't have a static call graph, so it's
> even impossible to do this analysis unless your code happens to be
> simple enough (see: the halting problem).

I've found the barrier for "simple enough" to actually be rather high,
especially when writing special-case analysis (Dehydra has been used in
Mozilla to build simple call graphs:
https://developer.mozilla.org/en/Dehydra). But often it turns out that
building a call graph for optimization during a profile-like stage is
very effective: it's easy to get real call data, and you don't have to
worry about the halting problem.

Also, a description of the halting problem with some humor value:
http://www.lel.ed.ac.uk/~gpullum/loopsnoop.html

Nicholas "Indy" Ray
|
|
From: Sebastian S. <seb...@gm...> - 2009-04-09 23:59:49
|
On Fri, Apr 10, 2009 at 12:49 AM, Nicholas "Indy" Ray <ar...@gm...> wrote:

> I've found the barrier for "simple enough" to actually be rather high,
> especially when writing special-case analysis (Dehydra has been used in
> Mozilla to build simple call graphs:
> https://developer.mozilla.org/en/Dehydra). But often it turns out that
> building a call graph for optimization during a profile-like stage is
> very effective: it's easy to get real call data, and you don't have to
> worry about the halting problem.

Yes, profile-guided optimization is definitely promising. It would seem
to me that the most gainful use would be not to reorder task execution,
but to determine which tasks would be more efficient to execute inline.
In other words, the programmer would call "spawn" *everywhere* with no
regard to the overhead, and the profiler would figure out which of those
to ignore. I seem to remember some research out of MSR doing this to
Haskell programs and getting good speedups (but of course Haskell
programs don't need manual markup of potential parallelism since they're
purely functional - the profiler would insert the parallelism "sparks"
for you in the correct places).

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
|
|
From: Jon W. <jw...@gm...> - 2009-04-10 00:48:33
|
Sebastian Sylvan wrote:
> > Here are two possible work orderings:
> >
> > xy
> > zw
> > a
> > b
> > c
> > d
> >
> > ax
> > by
> > cz
> > dw
>
> In fact, in your particular case this isn't only possible without
> prioritization, but it's what happens if you pay no attention what so
> ever and just write things naively, expressing parallelism where it is.

There is no guarantee of that. The scheduler, expressing "parallelism
where it is", could easily choose the xyzw-first order, rather than
trying to run a in parallel with x. There is nothing in the knowledge of
the scheduler you propose to guarantee the second (desirable) outcome,
except perhaps wishful thinking on the part of the implementer.

Sincerely,

jw
|
|
From: Sebastian S. <seb...@gm...> - 2009-04-10 01:00:15
|
On Fri, Apr 10, 2009 at 1:48 AM, Jon Watte <jw...@gm...> wrote:

> There is no guarantee of that. The scheduler, expressing "parallelism
> where it is", could easily choose the xyzw-first order, rather than
> trying to run a in parallel with x. There is nothing in the knowledge of
> the scheduler you propose to guarantee the second (desirable) outcome,
> except perhaps wishful thinking on the part of the implementer.

In this case, with the scheduling policy I propose (standard
work-stealing), I don't see how it could possibly do anything different.
You put two tasks (the xyzw one and the abcd one) on whichever queue you
start on; the other thread will sit idle just waiting for something to
pop up somewhere, and grab the first one to go on the queue. Then the
second one goes on the queue, and then there's the final sync which
causes the originating thread to start executing. Regardless of the
order you spawn the tasks, the other thread will take the other one.

There is no opportunity for the xyzw tasks to "skip" past the abcd ones,
since you only steal when you've run out (so even if the other thread
takes the xyzw ones, it will put the child tasks on *its* queue, and the
original thread will go ahead and do abcd before trying to steal). So
yeah, looks pretty guaranteed.

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
|
|
From: Jon W. <jw...@gm...> - 2009-04-10 02:30:21
|
Sebastian Sylvan wrote:
> You put two tasks (the xyzw one and abcd one) on whichever queue you
> start on, the other queue will sit idle just waiting for something to
> pop up somewhere and grab the first one to go on the queue. Then the
> second one goes on the queue and then there's the final sync which
> causes the originating thread to start executing. Regardless of the
> order you spawn the task the other thread will take the other one.
> There is no opportunity for the xyzw tasks to "skip" past the abcd
> ones since you only steal when you've run out (so even if the other
> thread takes the xyzw ones it will put the child-tasks on *its* queue,
> and the original thread will go ahead and do abcd before trying to
> steal).

There are eight tasks. x, y, z and w are part of the "independent" work
group. a, b, c and d are part of the "sequentially dependent" group. That
was part of the original problem statement.

As you should clearly see in the problem statement, if the tasks xyzw
are enqueued before abcd (all on the same thread, then), you will not get
optimal performance, because the stealer will steal y after the original
thread starts working on x.

Sincerely,

jw
|
|
From: Sebastian S. <seb...@gm...> - 2009-04-10 01:26:37
|
On Fri, Apr 10, 2009 at 1:48 AM, Jon Watte <jw...@gm...> wrote:

> There is no guarantee of that. The scheduler, expressing "parallelism
> where it is", could easily choose the xyzw-first order, rather than
> trying to run a in parallel with x. There is nothing in the knowledge of
> the scheduler you propose to guarantee the second (desirable) outcome,
> except perhaps wishful thinking on the part of the implementer.

I think the key notion you may be misunderstanding about work stealing
is that the stealing happens from the opposite end of the queue versus
where the owning thread executes from. If stealing happened from the
"execution end" then yes, you could get this weird interleaving, but
that's not the case.

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
|
|
From: Jon W. <jw...@gm...> - 2009-04-10 02:32:38
|
Nicholas "Indy" Ray wrote:
> In my experience the order of events within a single thread is set and
> doesn't change, even though depending on the user certain tasks won't
> need to be spawned. It's trivial to set up the order of tasks *within*
> a single frame to have a static order; whether that is optimal or not
> is another matter altogether.

Only if you legislate that reactions to events can only happen on the
next frame. Else, you may have an event of type A that triggers an event
of type B, and another event of type B that triggers an event of type A,
and you can't get a static order that solves both.

For example: grenades exploding may put things on fire, but things
burning may also explode grenades. Only in a system where you tie-break
by introducing latency would you be able to use a static ordering.
Meanwhile, in a system where work items are actual individual items
(rather than static "pulsed" state machines or whatever), either case
will be resolved within the same frame.

Sincerely,

jw
|
|
From: Nicholas "Indy" R. <ar...@gm...> - 2009-04-10 02:49:55
|
On Thu, Apr 9, 2009 at 7:32 PM, Jon Watte <jw...@gm...> wrote:
> Only if you legislate that reactions to events can only happen on the
> next frame. Else, you may have an event of type A that triggers an event
> of type B, and another event of type B that triggers an event of type A,
> and you can't get a static order that solves both.
> For example: grenades exploding may put things on fire, but things
> burning may also explode grenades. Only in a system where you tie-break
> by introducing latency would you be able to use a static ordering.
> Meanwhile, in a system where work items are actual individual items
> (rather than static "pulsed" state machines or whatever), either case
> will be resolved within the same frame.

I understand it's true that situations arise where it is easy to lose
said order of events. But in my experience most such events are put off
until the next frame anyway, even in single-threaded engines.

Indy
|
|
From: Jon W. <jw...@gm...> - 2009-04-10 02:35:42
|
Sebastian Sylvan wrote:
> I think the key notion you may be misunderstanding about work
> stealing is that the stealing happens from the opposite end of the
> queue versus where the owning thread executes from. If stealing
> happened from the "execution end" then yes, you could get this weird
> interleaving, but that's not the case.

Steal order doesn't matter, because in that case you may easily have two
threads adding tasks.

Adding order:

1 2
x y
z w
a b
c d

Now, stealing, no matter from which end you steal, will not lead to
optimal scheduling. You do need to sort by dependencies, and to avoid
having to do optimization on the full spanning graph, simply choosing
the task with the deepest*widest dependency chain first will usually
work out well enough.

Sincerely,

jw
|
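Jon's "deepest dependency chain first" heuristic can be sketched as a memoized depth computation over the task graph, used as a priority key. The graph representation (`dependents[i]` listing the tasks that wait on task i) and the function names are hypothetical; this shows the heuristic, not Jon's actual system:

```cpp
#include <algorithm>
#include <vector>

// Depth of the longest chain of tasks that transitively depend on `task`,
// memoized so the whole DAG is scored in O(nodes + edges).
int chain_depth(int task, const std::vector<std::vector<int>>& dependents,
                std::vector<int>& memo) {
    if (memo[task] >= 0) return memo[task];
    int deepest = 0;
    for (int d : dependents[task])
        deepest = std::max(deepest, chain_depth(d, dependents, memo));
    return memo[task] = 1 + deepest;
}

// Order ready tasks so those heading the longest dependency chains run
// first - the critical path starts draining immediately instead of
// waiting behind independent work.
std::vector<int> priority_order(const std::vector<std::vector<int>>& dependents) {
    int n = static_cast<int>(dependents.size());
    std::vector<int> memo(n, -1), order(n);
    for (int i = 0; i < n; ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](int a, int b) {
        return chain_depth(a, dependents, memo) > chain_depth(b, dependents, memo);
    });
    return order;
}
```

On the thread's running example (tasks 0..3 as the chain a->b->c->d, tasks 4..7 as the independent x, y, z, w), task 0 scores depth 4 and sorts first, so the chain starts alongside the independent work, which is exactly the ax/by/cz/dw schedule Jon wants.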
|
From: Jon W. <jw...@gm...> - 2009-04-10 23:01:51
|
Sebastian Sylvan wrote:
> On Fri, Apr 10, 2009 at 7:11 PM, Jon Watte <jw...@gm...> wrote:
>
>> Of course not. A task doesn't get run until its dependencies are
>> resolved.
>
> Which is the whole problem, since it may need to run to even know what
> its dependencies are.

It's clear that we have totally different ideas about data and software
architecture. Clearly you can know what the dependencies are, or you
wouldn't be able to have those dependencies in the first place.

> Same thing if you call a virtual function - who knows what code will
> run as a result of that, and what data dependencies that will cause?

To answer your rhetorical question: our data path system knows. If you
have wildly different dependencies based on a switch(), then that is
better expressed as a behavior component (that you can swap out) rather
than a code switch(). (The "pure OO" movement claims that every switch
can be turned into a virtual dispatch, but I wouldn't go that far.)

> Yours doesn't seem to be. It may work great for your particular
> specialized conditions (heck, we use something similar for our
> rendering tasks!)

Our data dependency system is very similar to that of dependency
properties in WPF, or properties in the Spring web framework. It's very
general, and used pervasively through the entire system.

> IMO we really will need a general system in the future to be able to
> kick off enough tasks to keep 64 threads busy for an entire frame.

I believe you need to know, and be able to express, what your data is,
where it is coming from, and how it is processed, to be able to reason
about it properly. Accidental dependencies that aren't injected
explicitly are pretty much where all non-trivial coupling and performance
bugs come from, so engineering those out is a big win, even before you
start looking at threading a component system.

> You specify dependencies, yes, but they may be dynamic

Which is why the scheduler must be dynamic!

> For example, in some AI code you may want to do a parallel loop
> through all enemies within a certain radius to score them based on the
> danger they pose to you - you can't really know before you start
> executing what enemies you're going to get back from that range query,
> and therefore which objects you're going to have a dependency on - so
> you need to start executing the task before you know which
> dependencies it has.

Subsystems can have dependencies, too. It's perfectly fine to specify
that AI reasoning depends on collision detection, and collision detection
depends on integration output (position) from the last frame. That way,
you can re-use the collision output data for AI, rather than having to
re-run queries, too.

Sincerely,

jw
|
|
From: Sebastian S. <seb...@gm...> - 2009-04-10 23:35:34
|
On Sat, Apr 11, 2009 at 12:01 AM, Jon Watte <jw...@gm...> wrote:
> Sebastian Sylvan wrote:
> >
> >
> > On Fri, Apr 10, 2009 at 7:11 PM, Jon Watte <jw...@gm...
> > <mailto:jw...@gm...>> wrote:
> >
> >
> > Of course not. A task doesn't get run until its dependencies are
> > resolved.
> >
> >
> > Which is the whole problem, since it may need to run to even know what
> > its dependencies are.
>
> It's clear that we have totally different ideas about data and software
> architecture.
> Clearly you can know what the dependencies are, or you wouldn't be able
> to have those dependencies in the first place.
The issue is about knowing *ahead of time*.
Again, this goes back to the halting problem. It is mathematically
impossible to know the flow of a general function ahead of time - you have
to execute it to find out. If you think your system can do this then you've
either broken the universe, or you're wrong :-)
> If you have wildly different dependencies based on a switch(), then that
> is better expressed as a behavior component (that you can swap out)
> rather than a code switch(). (The "pure OO" movement claims that every
> switch can be turned into a virtual dispatch, but I wouldn't go that far)
But it's still impossible to know which behaviour to swap in until you've
started executing the tasks!
Look, all I'm saying is that your system isn't general - it only works for a
restricted form of computation where all dependencies can be known before
starting to execute the tasks. Is this incorrect?
I'm sure you understand that code in general doesn't work like that, even
this wouldn't be possible:
x = spawn(foo);
y = spawn(bar);
sync;
if ( x && y )
{
    z = spawn(baz);
}
You have to start executing tasks for the result of foo and bar to be known,
which means you can't know that baz is a dependency until after you've
already started executing the tasks, which means that the task running the
above code would need to block until all three sub-tasks have finished,
before it can run, even though one of them may only rarely be needed in
reality... But it gets even worse:
x = spawn(foo);
y = spawn(baz); // returns a task!
sync;
if ( x )
{
    z = spawn(y);
}
Now you can't even pessimistically schedule it, since you have no idea what
y might be until after you've started executing!
All I'm saying is: real code has dynamic control flow that's impossible to
predict ahead of time. Using your system you have to run that kind of stuff
serially, and I personally think that to scale to the kinds of thread counts
we will need to in the future, we can't afford to say "nah, we'll never need
to parallelize that" about anything.
Yeah, we use a very restricted task system right now for rendering, and in
this case the data flow is pretty simple, and the number of threads we use
for rendering is low, so it works out well. But once those big juicy low
hanging fruits are parallelized we'll still not be done, and then we would
be in trouble if the task system we have isn't general enough to handle the
remaining "nasty" code.
>
> > You specify dependencies, yes, but they may be dynamic
>
> Which is why the scheduler must be dynamic!
>
Precisely!
> Subsystems can have dependencies, too. It's perfectly fine to specify
> that AI reasoning depends on collision detection, and collision
> detection depends on integration output (position) from the last frame.
> That way, you can re-use the collision output data for AI, rather than
> to have to re-run queries, too.
Precisely. As you say the AI reasoning must depend on collision detection,
even though it doesn't *actually* depend on it for most frames (and you
can't know in advance if it will). Meaning that it can't run even though
it's ready, it must wait for the worst-case set of dependencies to finish
first.
--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
|
|
From: Jon W. <jw...@gm...> - 2009-04-11 01:50:53
|
Sebastian Sylvan wrote:
> Again, this goes back to the halting problem. It is mathematically
> impossible to know the flow of a general function ahead of time - you
> have to execute it to find out. If you think your system can do this
> then you've either broken the universe, or you're wrong :-)
Luckily, I don't ship general functions; I ship specific functions.
>
>
> If you have wildly different dependencies based on a switch(),
> then that
> is better expressed as a behavior component (that you can swap out)
> rather than a code switch(). (The "pure OO" movement claims that every
> switch can be turned into a virtual dispatch, but I wouldn't go
> that far)
>
>
> But it's still impossible to know which behaviour to swap in until
> you've started executing the tasks!
If you swap in a new behavior, that will take effect the next frame.
> Look, all I'm saying is that your system isn't general - it only works
> for a restricted form of computation where all dependencies can be
> known before starting to execute the tasks. Is this incorrect?
>
Luckily, all games, as well as all business software, fall into that
"restricted" form.
> I'm sure you understand that code in general doesn't work like that,
> even this wouldn't be possible:
>
> x = spawn(foo);
> y = spawn(bar);
> sync;
> if ( x && y )
> {
> z = spawn(baz);
> }
>
So I'm saying you, as the component writer, don't need to worry about
spawning at all. Tasks are created by the system, to resolve the
semantics as described by the system. Having the programmer create
deadlocks and races and global dependencies willy-nilly sounds to me a
lot like writing your code in assembly. Very last-century, and not where
the industry is going. I believe the computer can actually make more
sense, and schedule your dependencies better, than you can.
> > You specify dependencies, yes, but they may be dynamic
>
> Which is why the scheduler must be dynamic!
>
>
> Precisely!
>
At least we agree on something :-)
> Subsystems can have dependencies, too. It's perfectly fine to specify
> that AI reasoning depends on collision detection, and collision
> detection depends on integration output (position) from the last
> frame.
> That way, you can re-use the collision output data for AI, rather than
> to have to re-run queries, too.
>
>
> Precisely. As you say the AI reasoning must depend on collision
> detection, even though it doesn't *actually* depend on it for most
> frames (and you can't know in advance if it will). Meaning that it
> can't run even though it's ready, it must wait for the worst-case set
> of dependencies to finish first.
If you're not doing at least some AI reasoning each frame, then you're
not using your cores right :-)
Seriously, though, I'm OK with saying that a task that depends on the
output of collision testing is actually forced to run after collision
testing. Tasks that depend only on, say, the game being in "paused"
mode, or depend on the player input controller being pushed forward, or
whatever, can run independent of that.
I think the discussion has come to an end, though. You want to enable
old-school low-level code to run as well as it can under the explicit
control of the programmer, and you're willing to pay the cost of
allocating and switching stacks at runtime to do it. I want to move
threading decisions away from the programmer, and let the software do
that, based on intentions and dependencies expressed by the programmer.
Two different problems lead to two different solutions. I don't think
I'll convince you that your problem is "too old-school to solve," and I
don't think you'll convince me that my problem is "too high-level to be
useful."
Sincerely,
jw
|
|
From: Sebastian S. <seb...@gm...> - 2009-04-11 02:39:42
|
On Sat, Apr 11, 2009 at 2:44 AM, Jon Watte <jw...@gm...> wrote: > So I'm saying you, as the component writer, don't need to worry about > spawning at all. Tasks are created by the system, to resolve the > semantics as described by the system. I do need to worry about specifying what's parallel and what isn't (unless I'm in a language where this can be automatically deduced - which C/C++ isn't), and if since I can't do that in a dynamic way your system isn't flexible enough to run generic code. > Having the programmer create > deadlocks and races and global dependencies willy-nilly sounds to me a > lot like writing your code in assembly. Very last-century, and not where > the industry is going. I believe the computer can actually make more > sense, and schedule your dependencies better, than you can. I don't believe the computer can violate the laws of mathematics to do it, which means that in order for your ahead-of-time scheduler to work it *must* severely restrict the kinds of control flow I can have. > > Which is why the scheduler must be dynamic! > > > > > > Precisely! > > > > At least we agree on something :-) Yes, unfortunately I can't seem to explain adequately to you that a system that requires all dependencies to be known up front is not actually dynamic in any meaningful sense of the word. > If you're not doing at least some AI reasoning each frame, then you're > not using your cores right :-) I'm doing AI reasoning every frame, but every frame each task will only go through exactly one trace through my AI processing routine (not both branches of an if-else for example), which means that there might be many *potential* dependencies, that are not *actual* dependencies. 
I may cast a ray to an off-screen object to see if my laser hits, but if I miss the bounding sphere I don't actually need to spawn a task for computing the animation of the object - and even if that animation is needed for something else and will be computed anyway, at least I won't be blocked for it to finish, improving throughput. > . I don't think > I'll convince you that your problem is "too old-school to solve," and I > don't think you'll convince me that my problem is "too high-level to be > useful." My criticism of your approach is hardly that it's too high level. Quite the opposite! I'm saying that it is, as a matter of mathematical fact, not flexible enough to run code with dynamic control flow, because by definition you can not know dependencies ahead of time for such code. Dynamic control flow *requires* an on-line scheduler - yours isn't. Restricting the use of dynamic control flow is the opposite of "high level" as it restricts the use of fairly ubiquitous things like virtual functions (when you call a virtual function you can't know what dependencies that call will have ahead of time as virtual functions are dynamically dispatched - you just have to make the call and see where you end up). That might be fine for some parts of your program, however, saying "I'm fine with restricting expressiveness/abstraction for code that needs to run in parallel" might be okay today, but in the future that will be synonymous with "I'm fine with restricting expressiveness/abstraction everywhere", which IMO would be a big mistake. -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862 |
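Sebastian's ray-cast example can be made concrete. The sketch below is a hypothetical illustration (invented names and data layout, not code from either poster's engine) of his point: a cheap runtime test decides whether the expensive dependency even exists, so no ahead-of-time scheduler could have predicted it.

```python
import math

def hit_bounding_sphere(ray_origin, center, radius):
    # Conservative test: is the ray origin within `radius` of the center?
    # (Deliberately simplified -- a real test would intersect the ray.)
    return math.dist(ray_origin, center) <= radius

def laser_task(ray_origin, target, spawn):
    # Only if the cheap conservative test passes do we spawn the expensive
    # animation task. Which tasks exist -- and therefore the dependency
    # graph -- is decided by data seen at runtime.
    if hit_bounding_sphere(ray_origin, target["center"], target["radius"]):
        spawn("animate:" + target["name"])

def run_frame(targets):
    spawned = []
    for t in targets:
        laser_task((0.0, 0.0, 0.0), t, spawned.append)
    return spawned
```

In the cilk-style model this is unremarkable: the task simply spawns (or doesn't) as it runs, and anything enqueued is by definition ready.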
|
From: Jon W. <jw...@gm...> - 2009-04-12 02:15:38
|
Sebastian Sylvan wrote: > > So I'm saying you, as the component writer, don't need to worry about > spawning at all. Tasks are created by the system, to resolve the > semantics as described by the system. > > > I do need to worry about specifying what's parallel and what isn't > (unless I'm in a language where this can be automatically deduced - > which C/C++ isn't), and since I can't do that in a dynamic way your > system isn't flexible enough to run generic code. Luckily, I'm not running generic code; I'm running code that interfaces with a specific operating system, a specific compiler, and a specific runtime framework. > > Yes, unfortunately I can't seem to explain adequately to you that a > system that requires all dependencies to be known up front is not > actually dynamic in any meaningful sense of the word. And I can't explain to you how that's not actually true, and how the limitations that do exist actually are much less evil than the problems of trying to solve the "generic" problem. Although I gave it a pretty good shot, I think. > My criticism of your approach is hardly that it's too high level. > Quite the opposite! I'm saying that it is, as a matter of mathematical > fact, not flexible enough to run code with dynamic control flow, > because by definition you can not know Yet it is doing it, daily. Note that most flow control comes from expression evaluation and polymorphism, rather than old-school if-then-else and switch statements which you seem to be worried about. It's simply a different style of building software, using a different kind of paradigm than you are using. Personally, I think the old-school paradigm is unfit for heavy multi-core execution, and using explicit, data-centric components is the way of the future. You're of course free to disagree, because the future isn't actually here yet. > dependencies ahead of time for such code. Dynamic control flow > *requires* an on-line scheduler - yours isn't. 
My specific proposal was that each time a core is free, it picks the highest priority task available in the queue. Priorities are ideally updated each time a new task is added to the queue (but that can be loosened up for efficiency if it turns out to be a bottleneck). I think that's an on-line scheduler. If you recall, my argument was simply that a naive work-stealing scheduler is not capable of making decisions based on global knowledge (such as dependency depth), and thus will often (statistically) make poor choices. > Restricting the use of dynamic control flow is the opposite of "high > level" as it restricts the use of fairly ubiquitous things like > virtual functions (when you call a virtual function you can't know > what dependencies that call will have ahead of time as virtual > functions are dynamically dispatched - you just have to make the call > and see where you end up). > Go write a real application using WPF dependency properties, and then tell me that that's a "low level" "restricted" programming model. (I pick WPF because that's the closest available, well-documented API that has many of the same affordances as our framework) I think you simply are mired in the old, low-level ways, and don't understand the system I'm using. (The arguments you make are very similar to the arguments that assembly programmers made when FORTRAN and C showed up, IMO). Heavy multi-core threading is a new kind of execution environment, and it's unlikely that the old style of doing things will be the best match for that environment. Just like the introduction of deep caches made instruction counting a thing of the past. Sincerely, jw |
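The policy Jon describes — an idle core pops the highest-priority ready task, with priority reflecting global knowledge such as dependency depth — can be sketched with an ordinary priority queue. This is a minimal single-threaded illustration of the policy only, not the framework he describes:

```python
import heapq

class PriorityScheduler:
    """Idle cores pop the ready task with the highest priority.
    Here priority is dependency depth: the length of the longest
    chain of tasks hanging off this one, so long serial chains
    get started as early as possible."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker: preserve insertion order among equals

    def add(self, name, depth):
        # heapq is a min-heap, so negate depth to pop the deepest chain first
        heapq.heappush(self._heap, (-depth, self._seq, name))
        self._seq += 1

    def next_task(self):
        # Called whenever a core goes idle.
        return heapq.heappop(self._heap)[2] if self._heap else None
```

With four independent depth-1 tasks and the head of a four-task serial chain (depth 4) in the queue, the chain head is dispatched first — the interleaving Jon argued for earlier in the thread with his xyzw/abcd example.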
|
From: Sebastian S. <seb...@gm...> - 2009-04-12 15:01:04
|
On Sun, Apr 12, 2009 at 3:15 AM, Jon Watte <jw...@gm...> wrote: > Luckily, I'm not running generic code; I'm running code that interfaces > with a specific operating system, a specific compiler, and a specific > runtime framework. Exactly! And if that specific runtime framework enforces specific restrictions, then that means your specific code has to be written under those specific restrictions, which means your runtime framework is not general. > > Yes, unfortunately I can't seem to explain adequately to you that a > > system that requires all dependencies to be known up front is not > > actually dynamic in any meaningful sense of the word. > > And I can't explain to you how that's not actually true It's a mathematical truth that any system that requires dependencies to be known up front can not also be dynamic. You have said that your system requires this, I quote: "A task doesn't get run until its dependencies are resolved. [...] In our system, we have explicit data dependency metadata [...], so this knowledge is something we have up front." and therefore your system can not be dynamic. > > My criticism of your approach is hardly that it's too high level. > > Quite the opposite! I'm saying that it is, as a matter of mathematical > > fact, not flexible enough to run code with dynamic control flow, > > because by definition you can not know > > Yet it is doing it, daily. Note that most flow control comes from > expression evaluation and polymorphism, rather than old-school > if-then-else and switch statements which you seem to be worried about. If you read more carefully I've given you two examples of how tasks using polymorphism can not be scheduled ahead of time either. Any kind of dynamic control flow needs an on-line scheduler. > > dependencies ahead of time for such code. Dynamic control flow > > *requires* an on-line scheduler - yours isn't. 
> > My specific proposal was that each time a core is free, it picks the > highest priority task available in the queue. Priorities are ideally > updated each time a new task is added to the queue (but that can be > loosened up for efficiency if it turns out to be a bottleneck). I think > that's an on-line scheduler. Well depending on how you propose to know the priorities, this may not be what you said before. My whole argument has been (and I've stated this very clearly in every message) that an ahead-of-time scheduler can not schedule dynamic code. If you weren't talking about an ahead of time scheduler you could've said this on any number of occasions. You said (see your quote above) that you *were* doing ahead-of-time scheduling before; if you've now changed your mind then we clearly aren't in disagreement any more (on this particular issue anyway). > Go write a real application using WPF dependency properties, and then > tell me that that's a "low level" "restricted" programming model. If WPF requires all dependencies to be known up front then it is restricted. That's probably fine because it has a very specific usage area. Like I've said before, restricted doesn't mean bad, it just means restricted. I re-wrote our task scheduler for our rendering system not four months ago, and guess what? It's an ahead-of-time scheduler. Saying that ahead-of-time scheduling can only handle restricted scenarios is not the same as calling anyone who uses it stupid (because then I'd be calling myself stupid!), it's just an honest discussion about pros and cons without getting defensive just because I happen to use this very system myself. All I'm doing is pointing out the mathematical impossibility of both knowing all dependencies up front, *and* allowing dynamic control flow (including polymorphism). -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862 |
|
From: Jon W. <jw...@gm...> - 2009-04-13 03:17:48
|
Sebastian Sylvan wrote: > Exactly! And if that specific runtime framework enforces specific > restrictions, then that means your specific code has to be written > under those specific restrictions, which means your runtime framework > is not general. > I used to be in the OS business for many years. However, Microsoft ate us on the high end, and Linux ate us on the low end -- there's not room for much else these days, IMO. > > It's a mathematical truth that any system that requires dependencies > to be known up front can not also be dynamic. > You have said that your system requires this, I quote: > "A task doesn't get run until its dependencies are resolved. > [...] > In our system, we have explicit data dependency metadata > [...], so this knowledge is something we > have up front." > Up front at the start of a frame, not up front at the start of the program. A dependency is simply expressed as a DOM-style path, much like XPath, JavaScript and WPF. All components and functions of the path are "live," and any change is distributed to dependents. You can re-assign these paths if you want; such a re-assignment will take effect the next frame. > > If you read more carefully I've given you two examples of how tasks > using polymorphism can not be scheduled ahead of time either. Any kind > of dynamic control flow needs an on-line scheduler. Sure it can! You just have to say that polymorphism isn't allowed to cause new tasks to get spawned synchronously. I think that's a very simple rule to live by, and it makes smearing the workload across cores much easier. Not to mention it causes fewer bugs. > > Well depending on how you propose to know the priorities, this may not > be what you said before. Well, it seems that you're talking to several people at once, so the "you" you've been using has started to look a little diffuse to me. 
> My whole argument has been (and I've stated this very clearly in every > message) that an ahead-of-time scheduler can not schedule dynamic code. > If you weren't talking about an ahead of time scheduler you could've > said this on any number of The scheduler I propose is not ahead of time, but it does know (and update) the dependencies of tasks, including tasks that get added to the queue during a frame. If you want to buckle down to ahead-of-time, there are actually two kinds of ahead-of-time: one that is truly static (such as those proposed by people who set up "tasks" at program start, and then run them statically), and one that is static per frame, but is dynamic across frames. I think the latter may also be useful for games, but that's not what I'm proposing. > > If WPF requires all dependencies to be known up front then it is > restricted. You know what, my car is restricted. It can't go faster than 85 miles per hour. It can only run on paved roads. Heck, I can't even drive to another state from where I live without the infrastructure of a gas station. Still, it turns out that those restrictions allow me to buy a car that is quite affordable, comfortable, and safe, and actually solves the problem I need it to solve (get from home to work and back, every day, without breaking). I'll let you figure out how to map this analogy to writing schedulers for games :-) > I re-wrote our task scheduler for our rendering system not four months > ago, and guess what? It's an ahead-of-time scheduler. > All I'm doing is pointing out the mathematical impossibility of both > knowing all dependencies up front, *and* allowing dynamic control flow > (including polymorphism). And I'm saying that the system I'm describing knows all the dependencies when a task is inserted, but because tasks can be inserted during processing, that means it's not "ahead of time" in the sense of start-of-frame or start-of-program. 
That being said, almost all new tasks are created because new objects are created in the system, and object creation (or, rather, injection) generally happens at a defined time. Seems dynamic enough for anything we do (and we do a whole-Earth virtual world with user generated content, which is pretty much the most dynamic case you can think of while still calling it a game). Again: Spawning threads/tasks willy-nilly, especially based on hidden switch/if statements in code, and then (this is the worst part) synchronously waiting for the result, is a Bad Idea if you want to spread across many, many cores. If you consider caches local to cores, then cores are actually similar to NUMA machine architectures, with all that implies. If you want to be totally general (in the sense of a current general-purpose UNIX-like operating system, say), then you will not scale well across large numbers of cores. A different programming paradigm is needed. What I'm describing is one that I think will work out great. Sincerely, jw |
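Jon's clarification — each task's dependencies are fixed at the moment it is inserted, but insertion can happen at any point during the frame — suggests a scheduler along the following lines. This is a speculative reconstruction with invented names, not his framework:

```python
class FrameScheduler:
    """Tasks declare their data dependencies when inserted; insertion
    may happen mid-frame, so the ready set evolves as the frame runs."""

    def __init__(self):
        self.resolved = set()   # names of completed tasks
        self.waiting = []       # (task, unmet-dependency-set) pairs
        self.ready = []         # runnable now, in insertion order

    def insert(self, name, deps=()):
        # Dependencies are known at this point -- but not before it.
        if set(deps) <= self.resolved:
            self.ready.append(name)
        else:
            self.waiting.append((name, set(deps)))

    def complete(self, name):
        # Completing a task may release tasks inserted earlier.
        self.resolved.add(name)
        still_waiting = []
        for task, deps in self.waiting:
            if deps <= self.resolved:
                self.ready.append(task)
            else:
                still_waiting.append((task, deps))
        self.waiting = still_waiting
```

Under this reading the system is on-line across the frame even though no individual task may discover new dependencies after it has been inserted — which is precisely the restriction Sebastian objects to.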
|
From: Sebastian S. <seb...@gm...> - 2009-04-13 10:16:32
|
On Mon, Apr 13, 2009 at 4:17 AM, Jon Watte <jw...@gm...> wrote: > > > Up front at the start of a frame, not up front at the start of the program. Of course. Never thought you meant anything else. > > > > If you read more carefully I've given you two examples of how tasks > > using polymorphism can not be scheduled ahead of time either. Any kind > > of dynamic control flow needs an on-line scheduler. > > Sure it can! You just have to say that polymorphism isn't allowed to > cause new tasks to get spawned synchronously. Which is what I've been trying to say all this time. If the scheduler can't handle dynamic code, then it's not really a general system. I've given you several examples of very mundane and common uses of dynamic control flow that wouldn't be possible for this reason, so I believe I've made my point pretty clear on why this is unlikely to be acceptable for me at least (especially since the benefits of requiring these restrictions are highly dubious, IMO). -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862 |
|
From: Jarkko L. <al...@gm...> - 2009-04-11 09:20:25
|
I was talking about a generic task system before. Dependencies impact the task priorities, because you want the longest chain of work (critical path) to start as early as possible in the frame. Another thing which impacts the chain of work is the length of each task in the chain, which is why I profile each task in the task manager to know which chain is the longest. Now, you are saying that you don't know about dependencies, etc. beforehand, but in games we are in the fortunate position that pretty much the same call graph is run from one frame to the next, so it's possible to exploit frame-to-frame coherency to prioritize tasks and have a pretty good idea what's going to happen ahead of time. It's not perfect of course (i.e. when a task is run for the first time there is no priority for it), but it's close and definitely better than not prioritizing at all. So the task prioritization is very dynamic and can adapt to the situation where different tasks are pushed to the queue and where their lengths change from frame to frame. In my implementation I don't define dependencies beforehand like you seem to assume you would have to do, but I define them at the call site, so there isn't external maintenance to do, and the usage is pretty much like calling a function (1-2 lines of code), so it's quite convenient to use. One motivation for the job queue I implemented was that it should be simple to use so that you could more easily take advantage of opportunities for parallelism. Cheers, Jarkko _____ From: Sebastian Sylvan [mailto:seb...@gm...] Sent: Friday, April 10, 2009 7:54 PM To: Game Development Algorithms Subject: Re: [Algorithms] General purpose task parallel threading approach On Fri, Apr 10, 2009 at 5:08 PM, Jarkko Lempiainen <al...@gm...> wrote: You can't statically schedule the work because the work changes dynamically from one frame to the next. 
In one frame it may be best to start running a->b->c->d as soon as possible, while sometimes you should start running z early because it takes 10 * y, etc. It's the scheduler's job to figure out the best order for optimal throughput and it would be silly to try to craft the order directly into the code. Not really, we do things like that all the time today - e.g. adding a speculative prefetch because the profiler says that most of the time the code will go down a certain branch and read a certain bit of data. It may not be strictly possible to predict which order will be better in every single circumstance, but you can look at the profiler output and optimize for the most common case. I see what you're saying now, though. You weren't actually talking about a general task system before? So when you said "priorities" you really meant ordering of a dependency graph that's known ahead of time (i.e. before you start executing the first task), not a dynamic priority system for a general task system. The use of the word "priority" threw me (as it's generally referred to in the context of a user giving a certain task a priority and having it execute first). However, using a static dependency graph (not static as in "compile-time", static as in "known before execution starts") is a bit like solving a problem with a slow heap allocator by saying that you're not allowed to do dynamic allocations. It doesn't actually solve the problem. I think we really do want a task system that would "work like normal" except allow you to run certain things in parallel, not something which has all sorts of restrictions on the kind of code you're allowed to write. Also, which scenario do you think is more likely: A) The intricate one given before (which needs extremely specific circumstances to be a problem). B) Merely a couple of if statements or virtual function calls, giving you dynamic dependencies. As an example of why B is a problem for a static system, consider that e.g. 
a task may have to be scheduled after some model has finished animating, even though it only needs to look at the animation data in very rare circumstances (e.g. only a specific subclass looks at it, or maybe it does hit detection and only needs animation data if the conservative bounding sphere test succeeds, etc.). In fact, this not only impacts throughput (a task that is actually ready to run is blocked, while threads may be idle), but it may cause unnecessary computations (what if nothing ever needs that animation data - with a static system you still need to compute it up front in case something does). There are all sorts of cases where dependencies can only be known after looking at dynamic conditions, meaning you have to start executing the task before you know what other tasks you have dependencies on. Not supporting that doesn't seem like it will scale in general, IMO. Honestly, dynamic call graphs are pretty common (if statements, virtual function calls, loops, hell scripting etc. too) so I'm probably going to go with saying that B is probably more common. So given that static task systems have these issues, and that they are highly restrictive in the kinds of computations you can parallelize, and that they don't seem to actually perform better (the paper I linked earlier showed work stealing matching hand-scheduled tasks), I think I know what I'll go for. -- Sebastian Sylvan +44(0)7857-300802 UIN: 44640862 |
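Jarkko's frame-to-frame coherency idea — profile each task's duration, then use last frame's timings to estimate each task's critical-path length and prioritize by it — can be sketched as follows. This is a speculative reconstruction of the idea, not his implementation; the function names and graph representation are invented:

```python
def critical_path(task, durations, children, memo=None):
    """Longest total duration of any chain starting at `task`, using
    per-task timings profiled on the previous frame."""
    if memo is None:
        memo = {}
    if task not in memo:
        below = max((critical_path(c, durations, children, memo)
                     for c in children.get(task, ())), default=0.0)
        memo[task] = durations.get(task, 0.0) + below
    return memo[task]

def prioritize(roots, durations, children):
    # Longer critical path => higher priority => pushed to the queue first,
    # so the longest chain of work starts as early as possible in the frame.
    return sorted(roots,
                  key=lambda t: critical_path(t, durations, children),
                  reverse=True)
```

A task run for the first time has no profiled duration and falls back to zero here, matching Jarkko's caveat that first-run tasks have no priority yet; frame-to-frame coherency makes the estimate good from the second frame on.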