Re: [GD-Consoles] what to do with little cash
From: Jason G D. <jas...@py...> - 2003-02-25 08:42:29
> I've got a related question:
>
> (Assuming a PS2 style console)

I'm going to assume an actual PS2...

> Given that you have game objects that are composed of various components
> (car = physics + collision + AI + script + sound, e.g.). Is it better
> to process this "per-object" (where you iterate over the objects, and
> update all its components), or "per-component" (where for each type of
> component, you iterate over each instance of the component type, and
> update it)?
>
> Per-component might give you better usage of the I-Cache, but per-object
> might give you better D-Cache usage.

True. In fact, if you try to rely on either (or both) caches without explicitly optimising anything, you'll probably get pretty poor performance. I'd normally expect the D-Cache misses to be roughly double those of the I-Cache, however.

Optimising for I-Cache usage is not easy, at least beyond the level of simply reducing the amount of code executed and trying to keep it sequential. In fact, just to go back to the original question, using C++ features like virtual functions (or indeed any coding practice which encourages the developer to write routines which are liable to be scattered randomly around in memory) will tend to thrash the I-Cache even more. We can often spot the difference between C and C++ purely by the amount of I-Cache thrashing on a performance analyser graph. It's not a problem of the language itself so much as the way people typically use it, and the compiler/linker's inability to optimise anything for cache usage.

Optimising for D-Cache usage is an easier option. Even just inserting some prefetch operations in the right place can help out. Beyond that, you can start pulling things into scratchpad RAM via DMA, or use uncached memory pointers to load/store things that are better off not polluting the cache. Basically, there are a lot of options for improving data access.

So what I'd suggest is to spend time optimising data usage first.
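As a rough sketch of the "prefetch in the right place" idea, here's a per-component update loop over a packed array of hypothetical PhysicsState structs (none of this is from the original post). It uses the GCC `__builtin_prefetch` builtin as a stand-in; on the PS2's EE core you'd emit the MIPS `pref` instruction instead:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-component layout: all physics states packed
// contiguously, so a linear pass streams through the D-Cache.
struct PhysicsState {
    float pos[3];
    float vel[3];
};

void updatePhysics(std::vector<PhysicsState>& states, float dt) {
    const std::size_t n = states.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Hint the next element into cache while working on this one.
        // (__builtin_prefetch is GCC-specific; on the EE you would use
        // the MIPS 'pref' instruction.)
        if (i + 1 < n)
            __builtin_prefetch(&states[i + 1]);
        for (int a = 0; a < 3; ++a)
            states[i].pos[a] += states[i].vel[a] * dt;
    }
}
```

The point isn't the integrator (which is trivial here), it's that a contiguous, linearly-walked array is the layout that makes the prefetch hint actually pay off.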
Once you know you have really efficient routines in that respect, you can start trying to rearrange the code to be friendly to the I-Cache as well. You *might* get faster code by splitting into several, tighter passes, but only if you have particularly good data usage in the first place.

Of course, if you're talking about passes over objects that use completely different data sets (the rendering, collision and AI may not share anything other than a transform, for example) then I'd certainly recommend doing different passes. Don't run one bit of code, then run different code over different data, before trying to reuse something you just discarded... that's just asking for trouble.

I'd certainly try to limit what I try doing with the language in a confined, performance-critical environment such as a console. I don't just mean for your graphics engine either. Typically the performance of the engine is one of the better parts of the code, and the real problems lie in the other, less well optimised sections. Profile often, use the performance counters, keep looking at the code being output from the compiler, and experiment with different code structures until you find one that gives you the most comfortable trade-off between performance and ease of coding.

Oh, and offload stuff into VU0 microcode - apart from running in parallel, it reduces I-Cache misses (as opposed to macromode coding, which makes them worse). Getting data in and out can be fiddly, but it does force you to think pretty hard about organising data and reducing the amount of it you process, so that can be a good thing anyway...

Cheers,
Jase

--
Jason G Doig
Principal Engineer
Technology Group (R&D)
Sony Computer Entertainment Europe
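To make the per-object vs per-component trade-off from the quoted question concrete, here's a minimal sketch. The `Transform`/`AIState` components and the `World` container are hypothetical, not from the original post; the two functions differ only in loop order:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical components. Stored per-component ("structure of
// arrays"): each system's data is one tightly packed array.
struct Transform { float x, y, z; };
struct AIState   { int target; };

struct World {
    std::vector<Transform> transforms;  // object i's transform
    std::vector<AIState>   ai;          // object i's AI state
};

// Per-component: one tight pass per system. Each pass streams a
// single contiguous array (good D-Cache) and runs a single small
// routine (good I-Cache) - the "several, tighter passes" option.
void updatePerComponent(World& w, float dt) {
    for (auto& t : w.transforms) t.x += dt;   // "physics" pass
    for (auto& a : w.ai)         ++a.target;  // "AI" pass
}

// Per-object: touch every component of one object before moving on.
// Only pays off if the systems genuinely share per-object data.
void updatePerObject(World& w, float dt) {
    for (std::size_t i = 0; i < w.transforms.size(); ++i) {
        w.transforms[i].x += dt;
        ++w.ai[i].target;
    }
}
```

Both produce identical results; the difference is purely which cache each one leans on, which is exactly the question being asked above.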