Re: [GD-Consoles] what to do with little cash
From: Jason G D. <jas...@py...> - 2003-02-25 08:42:29
> I've got a related question:
>
> (Assuming a PS2 style console)

I'm going to assume an actual PS2...

> Given that you have game objects that are composed of various components
> (car = physics + collision + AI + script + sound, e.g.). Is it better
> to process this "per-object" (where you iterate over the objects, and
> update all its components), or "per-component" (where for each type of
> component, you iterate over each instance of the component type, and
> update it)?
>
> Per-component might give you better usage of the I-Cache, but per-object
> might give you better D-Cache usage.

True. In fact, if you try to rely on either (or both) caches without explicitly optimising anything, you'll probably get pretty poor performance. I'd normally expect the D-Cache misses to be roughly double those of the I-Cache, however.

Optimising for I-Cache usage is not easy, at least beyond the level of simply reducing the amount of code executed and trying to keep it sequential. In fact, just to go back to the original question, using C++ features like virtual functions (or indeed any coding practice which encourages the developer to write routines which are liable to be scattered randomly around in memory) will tend to thrash the I-Cache even more. We can often spot the difference between C and C++ purely by the amount of I-Cache thrashing on a performance analyser graph. It's not a problem of the language itself so much as the way people typically use it, and the compiler/linker's inability to optimise anything for cache usage.

Optimising for D-Cache usage is an easier option. Even just inserting some prefetch operations in the right place can help out. Beyond that, you can start pulling things into scratchpad RAM via DMA, or use uncached memory pointers to load/store things that are better off not polluting the cache. Basically, there are a lot of options for improving data access.

So what I'd suggest is to spend time optimising data usage first.
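As a rough sketch of the "prefetch in the right place" idea, here's a per-component update loop over a packed array of hypothetical PhysicsState structs (none of this is from the original post). It uses the GCC `__builtin_prefetch` builtin as a stand-in; on the PS2's EE core you'd emit the MIPS `pref` instruction instead:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-component layout: all physics states packed
// contiguously, so a linear pass streams through the D-Cache.
struct PhysicsState {
    float pos[3];
    float vel[3];
};

void updatePhysics(std::vector<PhysicsState>& states, float dt) {
    const std::size_t n = states.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Hint the next element into cache while working on this one.
        // (__builtin_prefetch is GCC-specific; on the EE you would use
        // the MIPS 'pref' instruction.)
        if (i + 1 < n)
            __builtin_prefetch(&states[i + 1]);
        for (int a = 0; a < 3; ++a)
            states[i].pos[a] += states[i].vel[a] * dt;
    }
}
```

The point isn't the integrator (which is trivial here), it's that a contiguous, linearly-walked array is the layout that makes the prefetch hint actually pay off.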
Once you know you have really efficient routines in that respect, you can start trying to rearrange the code to be friendly to the I-Cache as well. You *might* get faster code by splitting into several, tighter passes, but only if you have particularly good data usage in the first place.

Of course, if you're talking about passes over objects that use completely different data sets (the rendering, collision and AI may not share anything other than a transform, for example) then I'd certainly recommend doing different passes. Don't run one bit of code, then run different code over different data, before trying to reuse something you just discarded... that's just asking for trouble.

I'd certainly try to limit what I try doing with the language in a confined, performance-critical environment such as a console. I don't just mean for your graphics engine either. Typically the performance of the engine is one of the better parts of the code, and the real problems lie in the other, less well optimised sections. Profile often, use the performance counters, keep looking at the code being output from the compiler, and experiment with different code structures until you find one that gives you the most comfortable trade-off between performance and ease of coding.

Oh, and offload stuff into VU0 microcode - apart from running in parallel, it reduces I-Cache misses (as opposed to macromode coding, which makes them worse). Getting data in and out can be fiddly, but it does force you to think pretty hard about organising data and reducing the amount of it you process, so that can be a good thing anyway...

Cheers,
Jase

--
Jason G Doig
Principal Engineer
Technology Group (R&D)
Sony Computer Entertainment Europe
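To make the per-object vs per-component trade-off from the quoted question concrete, here's a minimal sketch. The `Transform`/`AIState` components and the `World` container are hypothetical, not from the original post; the two functions differ only in loop order:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical components. Stored per-component ("structure of
// arrays"): each system's data is one tightly packed array.
struct Transform { float x, y, z; };
struct AIState   { int target; };

struct World {
    std::vector<Transform> transforms;  // object i's transform
    std::vector<AIState>   ai;          // object i's AI state
};

// Per-component: one tight pass per system. Each pass streams a
// single contiguous array (good D-Cache) and runs a single small
// routine (good I-Cache) - the "several, tighter passes" option.
void updatePerComponent(World& w, float dt) {
    for (auto& t : w.transforms) t.x += dt;   // "physics" pass
    for (auto& a : w.ai)         ++a.target;  // "AI" pass
}

// Per-object: touch every component of one object before moving on.
// Only pays off if the systems genuinely share per-object data.
void updatePerObject(World& w, float dt) {
    for (std::size_t i = 0; i < w.transforms.size(); ++i) {
        w.transforms[i].x += dt;
        ++w.ai[i].target;
    }
}
```

Both produce identical results; the difference is purely which cache each one leans on, which is exactly the question being asked above.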