Re: [Algorithms] General purpose task parallel threading approach
From: Jon W. <jw...@gm...> - 2009-04-05 05:32:54
Jarkko Lempiainen wrote:
> I think you really need to target both task and data level parallelism
> to make your game engine truly scalable for many-core architectures. I don't
> see why you would need to consider the cost of context switching in relation
> to the cost of a task execution though, since if you have a job queue, each
> worker thread just pulls tasks from the queue without switching context.

If you are fully FIFO in your data flow, then that is true. However, in
real code, you end up with data flow dependencies. For example, you
typically want to do simulation something like:

  extract state from previous iteration of physics frame
  run collision/intersection tests
  push state over to renderer
  read user input
  extract entity behavior based on collision and input state
  kick off new physics frame
  present the frame (vsync)

There are, unfortunately, a number of intra-frame dependencies here, such
as the entity behavior needing the output of the collision tests. (Can I
jump? That depends on whether I have my feet on the ground.) You may even
have intra-frame dependencies within the entity behavior job, where one
entity can cause more behavior for other entities (explosions, triggers,
etc).

You can set this up to be totally FIFO, and thus be able to start all the
jobs in parallel, but you will be introducing latency -- sometimes as much
as two additional frames, which for certain game types is noticeable as
sluggish controls.

Now, most of these dependencies are of the form "this task can't start
until that task has ended," which can be implemented in a multi-threaded
pool without blocking: if a single thread runs low on work, it simply
busy-waits until some prerequisite finishes (there is a sketch of this
scheme at the end of this message). Or you have a queue of background
work, such as pathfinding queries -- but sometimes that queue will be
empty anyway. Or you can use a blocking primitive of some sort (fibers or
threads) to deal with the case where the queue is temporarily stalled,
waiting on the output of something else that's executing out of the
queue. Depending on how much of this happens each frame, the overhead may
or may not matter.

This brings up another interesting point, which you also remarked on: the
workload for a game is generally quite different from the workload of a
server. Typically, you pre-load everything you want and just run in-core.
The cases where that breaks down are various large-world games (streaming
worlds, MMOs, etc.) where there really are "unplanned" loads of data, and
being able to have a task block until the load completes would sometimes
be convenient. However, this typically happens seldom enough (compared to
the other stuff) that a state machine approach, or a simple threading
approach, really isn't that punitive.

Sincerely,

jw
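[Editor's note: to make the "this task can't start until that task has
ended" scheme above concrete, here is a minimal C++ sketch. It is not code
from the original post; Task, JobQueue, and all other names are invented
for illustration. Each task carries an atomic count of unfinished
prerequisites, and a worker that finds nothing runnable simply yields and
retries until some prerequisite completes and pushes its dependents.]

    // Minimal sketch of a non-blocking dependency scheme (assumed names).
    #include <atomic>
    #include <deque>
    #include <functional>
    #include <mutex>
    #include <thread>
    #include <vector>

    struct Task {
        std::function<void()> work;
        std::atomic<int>      pending{0};   // prerequisites not yet finished
        std::vector<Task*>    dependents;   // tasks waiting on this one
    };

    class JobQueue {
    public:
        // A task with no unfinished prerequisites goes straight onto the queue;
        // otherwise it is pushed later, when its last prerequisite completes.
        void submit(Task* t) {
            if (t->pending.load() == 0) push(t);
        }

        // Declare "after" may not start until "before" has ended.
        // (Call during frame setup, before any of these tasks run.)
        static void add_dependency(Task* before, Task* after) {
            after->pending.fetch_add(1);
            before->dependents.push_back(after);
        }

        // Worker loop: pull ready tasks; if none are ready yet, yield and spin --
        // the "busy-wait until some prerequisite finishes" case from the post.
        void worker(std::atomic<bool>& running) {
            while (running.load()) {
                Task* t = pop();
                if (!t) { std::this_thread::yield(); continue; }
                t->work();
                // Completing a task may make its dependents runnable.
                for (Task* d : t->dependents)
                    if (d->pending.fetch_sub(1) == 1) push(d);
            }
        }

    private:
        void push(Task* t) {
            std::lock_guard<std::mutex> lk(m_);
            q_.push_back(t);
        }
        Task* pop() {
            std::lock_guard<std::mutex> lk(m_);
            if (q_.empty()) return nullptr;
            Task* t = q_.front();
            q_.pop_front();
            return t;
        }
        std::mutex        m_;
        std::deque<Task*> q_;
    };

Usage would look something like: build one Task per frame step from the
list above, call JobQueue::add_dependency(&collision, &entity_behavior)
for each intra-frame edge, submit the roots, and let each worker thread
run worker(). The yield loop is deliberately the busy-wait the post
describes; a production job system would likely add work stealing or a
condition variable so idle workers don't burn cycles.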