Thread: [GD-Consoles] what to do with little cash
From: Raymond L. M. <ra...@wa...> - 2003-02-24 22:16:04
Hello!

I have a question about programming a console that has a small cache. I'm new to this console, so I don't know all the details about it, except that I've read from other people on this group that the cache is very small and that you can stall the system if you don't watch out. I was wondering if you could give me any tips and pointers about what to watch out for in an engine framework or composition. For example, I heard you want to avoid virtual functions because the vtable can cause a cache miss. The problem is that I do mostly C++ with a good deal of virtual functions. Does this mean I can't use an object hierarchy with virtual or pure virtual functions? My main concern is that my engine layout is a pretty strict C++ implementation and I'm worried the performance will suffer.

I've looked through the documentation on the website provided for the console and I didn't see anything specific to my question, but if you know of anything, let me know. I appreciate your time and knowledge.

Thanks,
Raymond L. Maple
WayForward Technologies
From: Mick W. <mi...@ne...> - 2003-02-25 01:35:10
If you have little cash, you must invest wisely. The devil is in the details, however.

I'm sure many successful console developers have complex class hierarchies and lots of virtual functions. The virtual function overhead is really not that bad, unless you are talking about really low-level rendering components (you would not want, for example, to have a virtual "render" function for every polygon). The line has to be drawn somewhere, but it's open to experimentation.

I've got a related question:

(Assuming a PS2-style console)

Given that you have game objects that are composed of various components (car = physics + collision + AI + script + sound, e.g.), is it better to process this "per-object" (where you iterate over the objects and update all of each object's components), or "per-component" (where, for each type of component, you iterate over each instance of the component type and update it)?

Per-component might give you better usage of the I-Cache, but per-object might give you better D-Cache usage.

Again, I suspect that "it depends", but has anyone thought about this? Has anyone tried restructuring code one way or the other?

Mick.
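A minimal sketch of where the virtual-function line might be drawn, assuming a design in which the virtual call happens once per mesh and the per-polygon work is a plain loop. The names `Polygon`, `Renderable`, `Mesh` and `renderAll` are hypothetical and do not come from any console SDK; the "drawing" is a placeholder so the result can be checked.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for real per-polygon data.
struct Polygon {
    float area;
};

// Coarse-grained interface: one virtual call per batch of polygons.
class Renderable {
public:
    virtual ~Renderable() = default;
    // Returns the number of polygons "drawn" so the sketch is observable.
    virtual std::size_t render() const = 0;
};

class Mesh : public Renderable {
public:
    explicit Mesh(std::vector<Polygon> polys) : polys_(std::move(polys)) {}

    std::size_t render() const override {
        std::size_t drawn = 0;
        // Tight, non-virtual inner loop over contiguous data.
        for (const Polygon& p : polys_) {
            if (p.area > 0.0f) {
                ++drawn;  // placeholder for the real submission work
            }
        }
        return drawn;
    }

private:
    std::vector<Polygon> polys_;
};

// One virtual dispatch per object, not one per polygon.
std::size_t renderAll(const std::vector<const Renderable*>& objects) {
    std::size_t total = 0;
    for (const Renderable* r : objects) {
        total += r->render();
    }
    return total;
}
```

With this split, the cost of the vtable lookup is amortised over every polygon in the mesh; whether that is enough on a given console is something only profiling can confirm.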
From: Jason G D. <jas...@py...> - 2003-02-25 08:42:29
> I've got a related question:
>
> (Assuming a PS2 style console)

I'm going to assume an actual PS2...

> Given that you have game objects that are composed of various components
> (car = physics + collision + AI + script + sound, eg..). Is it better
> to process this "per-object" (where you iterate over the objects, and
> updated all its components), or "per-component" (where for each type of
> component, you iterate over each instance of the component type, and
> update it)
>
> Per-component might give you better usage of the I-Cache, but per-object
> might give you better D-Cache usage.

True. In fact, if you try to rely on either (or both) caches without explicitly optimising anything, you'll probably get pretty poor performance. I'd normally expect the D-Cache misses to be roughly double those of the I-Cache however.

Optimising for I-Cache usage is not easy, at least beyond the level of simply reducing the amount of code executed and trying to keep it sequential. In fact, just to go back to the original question, using C++ features like virtual functions or in fact any coding practice which encourages the developer to write routines which are liable to be scattered randomly around in memory will tend to thrash the I-Cache even more. We can often spot the difference between C and C++ purely by the amount of I-Cache thrashing on a performance analyser graph. It's not a problem of the language itself so much as the way people typically use it, and the compiler/linker's inability to optimise anything for cache usage.

Optimising for D-Cache usage is an easier option. Even just inserting some prefetch operations in the right place can help out. Beyond that, you can start pulling things into scratchpad ram via DMA, or use uncached memory pointers to load/store things that are better off not polluting the cache. Basically, there are a lot of options for improving data access.

So what I'd suggest, is to spend time optimising data usage first.
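A minimal sketch of the "insert some prefetch operations" idea, assuming a GCC-compatible compiler that provides the `__builtin_prefetch` builtin. The `Particle` type, the `integrate` function and the prefetch distance of 8 elements are hypothetical choices for illustration; a prefetch is only a hint, and the right distance has to be found by profiling on the target hardware.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical data type; contiguous storage keeps accesses sequential.
struct Particle {
    float x, y, z;
    float vx, vy, vz;
};

// Compiler-specific hint (assumption: GCC or Clang). Elsewhere it is a no-op.
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_FOR_WRITE(addr) __builtin_prefetch((addr), 1, 3)
#else
#define PREFETCH_FOR_WRITE(addr) ((void)0)
#endif

// Hypothetical tuning value: how many elements ahead to request.
constexpr std::size_t kPrefetchDistance = 8;

void integrate(std::vector<Particle>& particles, float dt) {
    const std::size_t n = particles.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Ask for data we will need a few iterations from now, so the
        // load overlaps with the arithmetic on the current element.
        if (i + kPrefetchDistance < n) {
            PREFETCH_FOR_WRITE(&particles[i + kPrefetchDistance]);
        }
        Particle& p = particles[i];
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}
```

The result of the loop is identical with or without the hint; only the timing can change, which is why the performance counters are the way to judge it.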
Once you know you have really efficient routines in that respect, you can start trying to rearrange the code to be friendly to the I-Cache as well. You *might* get faster code by splitting into several tighter passes, but only if you have particularly good data usage in the first place.

Of course, if you're talking about passes over objects that use completely different data sets (the rendering, collision and AI may not share anything other than a transform, for example) then I'd certainly recommend doing different passes. Don't run one bit of code, then run different code over different data, and then try to reuse something you just discarded... that's just asking for trouble.

I'd certainly try to limit what I do with the language in a confined, performance-critical environment such as a console. I don't just mean for your graphics engine either. Typically the performance of the engine is one of the better parts of the code, and the real problems lie in the other, less well optimised sections.

Profile often, use the performance counters, keep looking at the code being output from the compiler, and experiment with different code structures until you find one that gives you the most comfortable trade-off between performance and ease of coding.

Oh, and offload stuff into VU0 microcode - apart from running in parallel, it reduces I-Cache misses (as opposed to macromode coding, which makes it worse). Getting data in and out can be fiddly, but it does force you to think pretty hard about organising data and reducing the amount of it you process, so that can be a good thing anyway...

Cheers,
Jase

--
Jason G Doig
Principal Engineer
Technology Group (R&D)
Sony Computer Entertainment Europe
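A minimal sketch of the "different passes over different data sets" layout, assuming each component type lives in its own contiguous array and the passes share only a transform. All names here (`World`, `Transform`, `PhysicsData`, `AIData`, `updateAI`, `updatePhysics`) are hypothetical, and the update logic is a placeholder rather than real game code.

```cpp
#include <cstddef>
#include <vector>

// The only data shared between passes.
struct Transform {
    float x = 0.0f;
};

struct PhysicsData {
    float velocity = 0.0f;
};

struct AIData {
    float targetX = 0.0f;
    int steer = 0;  // +1 to move right, -1 to move left
};

// One array per component type; index i refers to the same game object.
struct World {
    std::vector<Transform> transforms;
    std::vector<PhysicsData> physics;
    std::vector<AIData> ai;
};

// AI pass: one routine runs over all AI data, reading only the transform.
void updateAI(World& w) {
    for (std::size_t i = 0; i < w.ai.size(); ++i) {
        w.ai[i].steer = (w.ai[i].targetX > w.transforms[i].x) ? 1 : -1;
    }
}

// Physics pass: a different routine over a different data set.
void updatePhysics(World& w, float dt) {
    for (std::size_t i = 0; i < w.physics.size(); ++i) {
        w.transforms[i].x += w.physics[i].velocity * dt;
    }
}

// Per-component update: finish one pass completely before starting the next,
// rather than interleaving AI and physics code object by object.
void update(World& w, float dt) {
    updateAI(w);
    updatePhysics(w, dt);
}
```

Each pass keeps a small piece of code resident while it streams through one array, which is the I-Cache benefit of per-component processing; the D-Cache cost is that the shared transforms are walked once per pass, so this only pays off when the data access itself is already efficient.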