Re: [Algorithms] General purpose task parallel threading approach
From: Jarkko L. <al...@gm...> - 2009-04-11 09:20:25
I was talking about a generic task system before. Dependencies affect task priorities, because you want the longest chain of work (the critical path) to start as early as possible in the frame. The length of each task in the chain also affects the chain, which is why I profile each task in the task manager to know which chain is the longest.

Now, you say that you don't know about dependencies, etc. beforehand, but in games we are in the fortunate position that pretty much the same call graph runs from one frame to the next, so it's possible to exploit frame-to-frame coherency to prioritize tasks and have a pretty good idea ahead of time of what's going to happen. It's not perfect, of course (e.g. when a task runs for the first time there is no priority for it), but it's close, and definitely better than not prioritizing at all. So the task prioritization is very dynamic and adapts to different tasks being pushed to the queue and to their lengths changing from frame to frame.

In my implementation I don't define dependencies beforehand, as you seem to assume you would have to; I define them at the call site, so there is no external maintenance to do and the usage is pretty much like calling a function (1-2 lines of code), which makes it quite convenient. One motivation for the job queue I implemented was that it should be simple to use, so that you could more easily take advantage of opportunities for parallelism.

Cheers,
Jarkko

_____

From: Sebastian Sylvan [mailto:seb...@gm...]
Sent: Friday, April 10, 2009 7:54 PM
To: Game Development Algorithms
Subject: Re: [Algorithms] General purpose task parallel threading approach

On Fri, Apr 10, 2009 at 5:08 PM, Jarkko Lempiainen <al...@gm...> wrote:

You can't statically schedule the work because the work changes dynamically from one frame to the next.
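[The profiled critical-path prioritization described above could be sketched roughly as follows. This is a hypothetical illustration, not Jarkko's actual implementation; `Task`, `critical_path`, and the demo helper are made-up names.]

```cpp
#include <algorithm>
#include <vector>

// Hypothetical task record: duration profiled on the previous frame,
// plus the indices of tasks that must run after this one.
struct Task {
    double profiled_ms = 0.0;     // measured length from the last frame
    std::vector<int> dependents;  // tasks that depend on this one
    double priority = 0.0;        // computed: critical-path length from here
};

// Priority = this task's profiled length plus the longest chain of
// dependents after it, so the head of the critical path sorts first.
double critical_path(std::vector<Task>& tasks, int i) {
    Task& t = tasks[i];
    if (t.priority > 0.0) return t.priority;  // memoized for this frame
    double longest_tail = 0.0;
    for (int d : t.dependents)
        longest_tail = std::max(longest_tail, critical_path(tasks, d));
    t.priority = t.profiled_ms + longest_tail;
    return t.priority;
}

// Demo: chain a->b->c->d at 1 ms each vs. an independent 10 ms task z.
// z's priority (10) beats a's (4), so z would be started first.
double demo_priority(bool of_z) {
    std::vector<Task> tasks(5);
    for (int i = 0; i < 4; ++i) tasks[i].profiled_ms = 1.0;
    tasks[0].dependents = {1};  // a -> b
    tasks[1].dependents = {2};  // b -> c
    tasks[2].dependents = {3};  // c -> d
    tasks[4].profiled_ms = 10.0;  // z, no dependents
    return critical_path(tasks, of_z ? 4 : 0);
}
```

[With priorities computed this way, pushing tasks into a priority queue naturally starts the longest chain first; on a task's first appearance there is no profile yet, matching the "not perfect" caveat above.]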
In one frame it may be best to start running a->b->c->d as soon as possible, while sometimes you should start running z early because it takes 10 * y, etc. It's the scheduler's job to figure out the best order for optimal throughput, and it would be silly to try to craft the order directly into the code.

Not really, we do things like that all the time today - e.g. adding a speculative prefetch because the profiler says that most of the time the code will go down a certain branch and read a certain bit of data. It may not be strictly possible to predict which order will be better in every single circumstance, but you can look at the profiler output and optimize for the most common case.

I see what you're saying now, though. You weren't actually talking about a general task system before? So when you said "priorities" you really meant the ordering of a dependency graph that's known ahead of time (i.e. before you start executing the first task), not a dynamic priority system for a general task system. The use of the word "priority" threw me, as it generally refers to a user giving a certain task a priority so that it executes first.

However, using a static dependency graph (not static as in "compile-time", but static as in "known before execution starts") is a bit like solving a problem with a slow heap allocator by saying that you're not allowed to do dynamic allocations. It doesn't actually solve the problem. I think we really do want a task system that works like normal code except that it allows you to run certain things in parallel, not something which places all sorts of restrictions on the kind of code you're allowed to write.

Also, which scenario do you think is more likely:

A) The intricate one given before (which needs extremely specific circumstances to be a problem).
B) Merely a couple of if statements or virtual function calls, giving you dynamic dependencies.

As an example of why B is a problem for a static system, consider that a task may have to be scheduled after some model has finished animating, even though it only needs to look at the animation data in very rare circumstances (e.g. only a specific subclass looks at it, or maybe it does hit detection and only needs animation data if the conservative bounding sphere test succeeds). In fact, this not only hurts throughput (a task that is actually ready to run is blocked while threads may be idle), but it may cause unnecessary computations: what if nothing ever needs that animation data? With a static system you still need to compute it up front in case something does. There are all sorts of cases where dependencies can only be known after looking at dynamic conditions, meaning you have to start executing a task before you know which other tasks it depends on. Not supporting that doesn't seem like it will scale in general, IMO.

Honestly, dynamic call graphs are pretty common (if statements, virtual function calls, loops, scripting, etc.), so I'm going to say that B is probably more common.

So given that static task systems have these issues, that they are highly restrictive in the kinds of computations you can parallelize, and that they don't seem to actually perform better (the paper I linked earlier showed work stealing matching hand-scheduled tasks), I think I know what I'll go for.

--
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862
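[The conditional-dependency case above - a hit test that only needs the animation result when a cheap bounding-sphere test passes - can be sketched with plain C++ futures standing in for a real work-stealing task system. All names here are hypothetical.]

```cpp
#include <future>

struct Pose { float bones[4]; };

// Stand-in for the expensive animation task.
Pose animate() {
    return Pose{{0.f, 1.f, 2.f, 3.f}};
}

// The dependency on the animation task is only discovered at run time:
// if the cheap sphere test fails, hit_test never waits on the result.
bool hit_test(bool sphere_test_passed, std::future<Pose>& pose) {
    if (!sphere_test_passed)
        return false;        // dependency never materializes
    Pose p = pose.get();     // block only when the data is actually needed
    return p.bones[3] > 0.f;
}

bool run_frame(bool sphere_hit) {
    // Launch animation in parallel; whether hit_test depends on it is
    // decided mid-task, not in a dependency graph fixed up front.
    std::future<Pose> pose = std::async(std::launch::async, animate);
    return hit_test(sphere_hit, pose);
}
```

[A static graph would have to order the hit test after animation every frame and compute the pose even when nothing reads it; letting a running task block on a future only when it actually needs the result avoids the throughput cost. `std::launch::deferred` would additionally skip the computation entirely when the result is never requested, at the cost of parallelism.]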