RE: [Algorithms] multi threaded rendering/game processing
From: George W. <ge...@ap...> - 2006-02-25 18:37:37
On Thu, 23 Feb 2006 20:59:12 -0800, "Kelly Brock" <kb....@sb...> wrote:

> [...]
>
> 20 years later we're running into a similar problem where you don't
> want to compute the AI for one object until the movement of another
> object has been finished. Their solution won't work on multi-core of
> course; well, it could, but it would likely be incredibly slow. I
> presented a possible optimization specifically for SMP machines to
> some folks working on such a renderer, based on an oldish game
> technique: manual threading, aka coroutines, aka fibers, whatever you
> want to call them. So, where Renderman would execute each single
> instruction on every micro-polygon in parallel, you would run the
> code straight up to the point of the data dependency and then yield.

This is commonly referred to as "cooperative threading".

> Applying this thinking to games requires the "coder" to know where
> the dependencies exist so they know when to yield, but it does not
> require complex locking/unlocking of data. It maintains "linear"
> programming such that even a junior programmer can be effective once
> they get the fairly "simple" concepts (compared to optimal usage of
> sync objects, lockless structures, etc.) and fairly well distributed
> PU load balancing.
>
> So, to answer the "last sentence" in your comment: the idea of
> blocking until you move to the next task in this case is best
> explained with a little table:
>
>   Yield Point | Task1  Task2  Task3  ...  TaskN
>        1      |   X
>        2      |          X
>        3      |
>        4      |                             X
>
> This assumes you started running "N" tasks and everything has been
> processed once. So, Task1 hit a yield point very early on; it can not
> start executing again until it is known that all other tasks have
> "passed" that point. So, if there are any tasks still executing when
> Task1 comes up for a restart, you either have to block or skip it
> this pass of the scheduler. Tasks can run to completion in one pass,
> or they can stop at any given point on the way there.
This can work on multi-core also. It just requires a mutex to maintain
each "lock step". The "yield points" could either be individual
semaphores or they could all share the same semaphore. Either way the
end result is the same in that it would enforce sequential behavior
where it matters.

> This is not the method I'm using at work, but perhaps the thinking
> process may be of use given that it is completely outside of anything
> I've seen folks mentioning so far. I can foresee some problems with
> such a model on the 360; the stack changes may use way too much
> memory bandwidth. On the PS3, though, it actually makes considerable
> sense from what I can see. But, as a "think out of the box"
> possibility, perhaps folks have considered it and rejected it, or
> not; worth posting either way as I'd like to hear any results folks
> have had. (Honestly, I really want to experiment with it, I just
> don't have the time. :()

This was actually the only threading model available on the Macintosh
until the MultiProcessing APIs were introduced in 8.6. Although we
consider it somewhat "old school", there are still a lot of
applications shipping today that use them.

> No doubt, manual threading didn't originate in C/C++, it was
> adopted. =)

Which to me is part of the problem. It was "bolted on" after the fact.
Perhaps I'm just a bit bigoted because I have so many other issues
(performance-wise) with C++.

> But, the language of choice in which we can hire folks with
> experience is generally C/C++, so that was the solution I was looking
> for in terms of "easy" learning curve. The model I'm working with is
> not of my design, but I already had a similar system working, so
> jumping in was not a big deal. Given that the juniors are able to
> utilize it without causing massive bottlenecks (beyond the typical
> too much memory, bad data layout, etc.), I'd say it is a pretty
> strong case for "it's better than other possibilities" right now.

Understood.
>> As to your last sentence I'll surmise that by "you" you mean the
>> average game company?

You're probably correct. I wouldn't limit it to just game companies,
but yes.

> Actually, I'd say that the ones on the market have already been
> snatched up... We "can't" hire them because everyone went into
> instant hiring mode and they just don't exist anymore. Well, unless
> you throw stupid money at them to steal them. =)
>
>> It's also true for most non-game developers. Only the largest
>> (budget) companies can afford to hire these "folks". All the more
>> reason for this level of smarts to be in the language and
>> compilers. ;-)
>
> I won't disagree, but I don't believe in silver bullets, and
> languages can only be made to help, not solve, the issues. As an
> example, I'm curious how many people have managed to get even 30%
> N-core CPU efficiency using OpenMP? It is somewhat similar in
> concept, but in my tests it just doesn't work all that well; there is
> just too much overhead in how it distributes.

I'm of the notion that a language could be devised to handle
time-domain data dependencies transparently to the programmer (beyond
identifying the dependencies). But then again I'm an optimist. ;-)

That's why I specifically mentioned Smalltalk and ProGraph. They are
both data-flow languages that by their nature define the time-domain
data dependencies necessary to optimally utilize multi-threaded (and
multi-core) hardware (without locks!). But neither of these is the
end-all, be-all solution, because of other (non-threading-related)
limitations.

OpenMP is a "good enough" solution for a specific subset of
distributed problems. But its strengths for that subset can be
deterrents to good performance outside that domain.

> Actually, I believe that, on the contrary, I've presented something
> which is simply trying to walk before entering the marathon.
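For readers unfamiliar with the distribution overhead being discussed: OpenMP's default static schedule hands each thread one contiguous chunk of the index range. The sketch below imitates that with raw `std::thread` (purely illustrative; `parallel_sum` is an invented name, not an OpenMP API). When chunk costs are unequal, finished cores sit idle at the implicit join, which is one way measured efficiency falls well below N-core ideal:

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Imitation of OpenMP-style static chunking: split [0, v.size()) into
// one contiguous chunk per thread, sum each chunk into a private slot
// (no locking), then reduce the partial sums after the join.
double parallel_sum(const std::vector<double>& v, int nthreads) {
    std::vector<double> partial(nthreads, 0.0);
    std::vector<std::thread> pool;
    const std::size_t chunk = (v.size() + nthreads - 1) / nthreads;
    for (int t = 0; t < nthreads; ++t) {
        pool.emplace_back([&, t] {
            std::size_t lo = t * chunk;
            std::size_t hi = std::min(v.size(), lo + chunk);
            for (std::size_t i = lo; i < hi; ++i)
                partial[t] += v[i];   // private slot: no contention
        });
    }
    for (auto& th : pool) th.join();  // implicit barrier, as in OpenMP
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

The thread-spawn and join costs here are paid on every parallel region, which is roughly the "overhead in how it distributes" complaint: for small or irregular work items the fixed cost swamps the parallel gain.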
> :) From a scheduling standpoint it is complex; from the coding
> standpoint it is relatively simple to work with; and from an
> efficiency standpoint it may or may not be good, but it does
> self-balance and utilizes n cores without huge amounts of invested
> knowledge. I was honestly trying to see if anyone else had considered
> such manual dependency "coding" instead of what I keep seeing as
> "work as normal" solutions fighting the trend.

Lockstep programming is a "good enough" solution for what you're
doing, and probably more realistic than my dream of a major language
shift away from the current paradigm. ;-)

--
Enjoy,
George Warner,
Schizophrenic Optimization Scientist
Apple Developer Technical Support (DTS)