Thread: Re: Circular references (was Re: [Algorithms] Reference counting hell... what is a good paradigm.)
From: Christer_Ericson@Playstation.sony.com - 2000-12-27 07:00:04
Jim Offerman wrote:
>The overhead of using smart pointers or id<->pointer mappings is becoming
>negligible these days ...

Careful, not everyone on here targets (Intel-based) PCs.

While your statement might be true for a PC sporting a CPU with a large L2
cache, it can (read: will) be quite detrimental on, say, a PS2, which
doesn't have one.

Christer Ericson
SCEA, Santa Monica
From: Christer_Ericson@Playstation.sony.com - 2000-12-27 23:37:03
Gil Gribb wrote:
>> Jim Offerman wrote:
>> >The overhead of using smart pointers or id<->pointer mappings is
>> >becoming negligible these days ...
>>
>> Careful, not everyone on here targets (Intel-based) PCs.
>>
>> While your statement might be true for a PC sporting a CPU with a large
>> L2 cache, it can (read: will) be quite detrimental on, say, a PS2, which
>> doesn't have one.
>
>Well, you offer no alternative. As I have pointed out, pointers simply
>don't work for game entity cross references. I have a constant time scheme
>that takes perhaps a dozen cycles, and it has exactly the same cache
>behaviour as pointers. Not much is going on today, so I will show a
>detailed example.

I merely meant to point out that on e.g. the PS2, hardly any memory
accesses are "negligible". As it only has an 8K 2-way set-associative data
cache, you really should go to great lengths to avoid ever reading outside
the cache (by making use of the 16K scratchpad RAM, using tight data
structures, etc.). This is drastically different from programming a PC with
a large L2 cache and has serious repercussions for how you design things.
I'm sure this is one reason why we initially heard lots of PC developers
complain about how hard the PS2 was to program for.

I didn't mean to comment on the smart pointers vs. ID-to-pointer issue at
all. Just to point out that on the PS2 any level of memory indirection can
be quite expensive and should be avoided (if possible, of course). Of
course, there will always be things such as time-memory trade-offs, but
thinking about how to stay in cache should be the #1 thing in a PS2
programmer's mind.

I'm not really at liberty to discuss the PS2 hardware further here (due to
NDAs and all that crap), but the above info is all available on the net, at
e.g.

http://www.anarchists.co.uk/html/isscc.html

for anyone who is interested in more detail.

Christer Ericson
SCEA, Santa Monica
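A minimal sketch of what "tight data structures" can mean in practice, in
C; this is one illustration (hot/cold splitting), not code from the thread,
and all the names are made up for the example:

    /* Array-of-structs: a position update drags every entity's cold
       fields through the 8K cache alongside the two vectors it needs. */
    struct FatEntity {
        float pos[3];
        float vel[3];
        char  name[32];     /* cold, but shares the cache lines */
        int   aiState[16];  /* cold */
    };

    /* Struct-of-arrays: the same update now reads densely packed hot
       data, so many more entities fit in cache per pass. */
    struct EntityArrays {
        float (*pos)[3];
        float (*vel)[3];
        /* cold data lives in separate, rarely touched arrays */
    };

    void IntegratePositions(struct EntityArrays *e, int count, float dt)
    {
        int i;
        for (i = 0; i < count; i++) {
            e->pos[i][0] += e->vel[i][0] * dt;
            e->pos[i][1] += e->vel[i][1] * dt;
            e->pos[i][2] += e->vel[i][2] * dt;
        }
    }

The layout change does not alter any algorithm; it only changes how many
useful bytes each cache line fill delivers.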
From: Gil G. <gg...@ra...> - 2000-12-29 03:05:27
> I merely meant to point out that on e.g. the PS2, hardly any memory
> accesses are "negligible". As it only has an 8K 2-way set-associative
> data cache, you really should go to great lengths to avoid ever reading
> outside the cache (by making use of the 16K scratchpad RAM, using tight
> data structures, etc.).

What sort of games have you written? You have a 32 MB machine and you are
telling me to avoid EVER reading outside of 8K? Crazy talk.

> This is drastically different from programming a PC with a large L2 cache
> and has serious repercussions for how you design things.

Well, again, you have offered no alternatives other than "write your
program to fit in 8K of L1 cache". The lack of an L2 cache was one of the
most surprising things I noticed when I first reviewed the psx2 docs. For
example, this is an email I sent to my team on March 15th:

<Looking at the psx2 docs, I noticed a lot of discussion of avoiding L1
<cache misses. It just occurred to me, the psx2 has no L2 cache! This is
<really going to change the notion of what is reasonably fast. Consider for
<example that a P3 could be produced for about 40% of the current cost if
<it didn't have an L2 cache. The original Celeron had no L2 cache and it
<was a performance disaster.
<I'm really starting to question if a huge C++ codebase like UT can ever
<run reasonably on the psx2.

So it isn't like I am unaware of the issues.

> I'm sure this is one reason why we initially heard lots of PC developers
> complain about how hard the PS2 was to program for.

Nope, not at all. We found it difficult because the Sony sample code for
the graphics end of things is total garbage. The fastest graphics code,
written by someone whose full-time job it is to write such code, was simply
wrong, in at least 3 ways. We figured it out though, mostly. I still don't
have good mip mapping, but we are close. I must say the hardware is very
very cool, but Sony has not provided the code needed to take advantage of
it. Our graphics code is essentially from scratch and is much much better
than anything Sony provided; writing it is what was hard.

> I didn't mean to comment on the smart pointers vs. ID-to-pointer issue
> at all. Just to point out that on the PS2 any level of memory indirection
> can be quite expensive and should be avoided (if possible, of course).

Well, indirection is absolutely critical to algorithm design. I'm not
writing Tetris here; I fully expect to have more than 5MB of source code.

> Of course, there will always be things such as time-memory trade-offs,
> but thinking about how to stay in cache should be the #1 thing in a
> PS2 programmer's mind.

So you are saying "don't use any algorithm that goes outside the 8K L1
cache"?... Nonsense.
You seem to be saying that the PS2 is crippled by the lack of an L2
cache... I don't see that. Just like they taught us in school, algorithms
are good or bad regardless of the hardware.

Sorry for the harshness.

-Gil
From: Neil S. <ne...@r0...> - 2000-12-29 04:46:21
> I merely meant to point out that on e.g. the PS2, hardly any memory
> accesses are "negligible". As it only has an 8K 2-way set-associative
> data cache, you really should go to great lengths to avoid ever reading
> outside the cache (by making use of the 16K scratchpad RAM, using tight
> data structures, etc.).
>
> This is drastically different from programming a PC with a large L2 cache
> and has serious repercussions for how you design things.

Memory accesses are negligible when they exist in the 80% of code that
takes 20% of the processing time. The important thing here is to determine
where most of the processing time lies, and for most game engines on most
machines this will be in the rendering and physics code (and maybe the AI
code). The use of smart pointers over IDs is essentially academic as far as
processing overhead goes, so long as they are not used much past resource
management (i.e. using them for *everything* is wasteful and overkill
IMHO).

On consoles like the PS2, efficiency is mainly determined by the use of
hardware concurrency (this is also true on PCs, but to a lesser degree),
and more gain can be made from clever implementations of the rendering and
physics code than can ever be made by changing an entire engine design
(unless, of course, the engine design limits how the rendering/physics code
can be implemented).

> I'm sure this is one reason why we initially heard lots of PC developers
> complain about how hard the PS2 was to program for.

Possibly, but I suspect that many of those PC developers were having
problems rendering quickly (until they adopted a PS2-friendly approach) and
hadn't even begun to investigate performance problems with
smart-pointers/IDs yet. I don't think anyone who had worked on a console
before was at all surprised by the PS2. As with most consoles, it is
reasonably easy to get things working, but far more difficult to use the
hardware to its full potential. Used properly, it is a very fast machine,
but many PC engine architectures are not suitable for simple porting, hence
the problem.

> I didn't mean to comment on the smart pointers vs. ID-to-pointer issue
> at all. Just to point out that on the PS2 any level of memory indirection
> can be quite expensive and should be avoided (if possible, of course).

The key here is to determine where most memory accesses are being done. In
a typical engine, this will be in the rendering and physics code. Although
memory access is expensive *everywhere*, it has less impact where it is
executed fewer times, and there is no point optimising *all* code for the
PS2 (or any console for that matter) when most of the execution time is in
a small subset of the code. Any game engine where this is not the case
(with exceptions such as advanced AI, etc.) will probably have problems on
all machines, not just consoles, and probably wants a redesign.

> Of course, there will always be things such as time-memory trade-offs,
> but thinking about how to stay in cache should be the #1 thing in a
> PS2 programmer's mind.

Well, the programmer that has to write the rendering and physics code, at
least... ;)

- Neil
From: Pierre T. <p.t...@wa...> - 2000-12-29 22:13:00
BTW... As I wrote before, one key advantage of IDs over pointers is that
you can move some resources in RAM without invalidating the relations/links
between two objects. The way you reorganize the memory layout depends on
your runtime access patterns - as Ville Miettinen once explained here,
IIRC, the Umbra guys use "accessors" for this (which sounds quite difficult
to avoid as soon as you use lazy evaluation all over the place).

Now, the whole point of the reorganization is to become cache-friendly, so
I guess this should be very efficient on the PS2 according to your claims.
And since you just can't do it with pointers (at least not in such an
automatic way), IDs may finally well be the cache-friendly way :)

The little piece of code responsible for the ID-to-pointer translation is
not the problem IMO. It is exactly the same indirection (in my scheme) as
the one involved in virtual methods vs. non-virtual ones, and the same
overhead as well: virtually free on all modern computers. If that's not the
case on the PS2, then I'm very happy not to program on the PS2...

Pierre
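A minimal sketch of that relocation property, in C, with made-up names and
the liveness/generation check omitted for brevity: because every cross
reference is an ID resolved through one table, the engine can slide an
object to a new address - to defragment, or to pack hot objects together -
by patching a single table entry.

    #include <string.h>   /* memcpy, size_t */

    #define MAX_OBJECTS 1024

    typedef unsigned int ObjID;

    /* The only place real addresses live; everyone else stores ObjIDs. */
    static void *Table[MAX_OBJECTS];

    void *Resolve(ObjID id)
    {
        return Table[id];  /* one indirection, same cost class as a
                              vtable lookup */
    }

    /* Move an object (e.g. during compaction); every stored ObjID, in
       every other object, remains valid afterwards. */
    void Relocate(ObjID id, void *newAddress, size_t size)
    {
        memcpy(newAddress, Table[id], size);
        Table[id] = newAddress;
    }

A pointer-based scheme would have to find and patch every stored pointer to
perform the same move, which is why automatic compaction is so much easier
with IDs.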
From: Corrinne Y. <cor...@ho...> - 2000-12-29 22:44:59
----- Original Message -----
From: "Pierre Terdiman" <p.t...@wa...>
To: <gda...@li...>
Sent: Friday, December 29, 2000 4:08 PM
Subject: Re: Circular references (was Re: [Algorithms] Reference counting
hell... what is a good paradigm.)

> As I wrote before, one key advantage of IDs over pointers is that you can
> move some resources in RAM without invalidating the relations/links
> between two objects. The way you reorganize the memory layout depends on
> your runtime access patterns

-- Now that this discussion has slowed down a tad :)

-- Accessor-type ID indirection (in my bit of experience here, there is
lump-sum data streaming in Spec Ops, but what I am programming here is much
more substantial streaming, where frighteningly everything is potentially
virtual, potentially streamable) is often very necessary for a streaming
type of architecture, where you can't count on, for example, the collision
data structure of your engine already being instantiated.

-- Lazy evaluation of ID data can and will cause horrible choke behaviors
and framerate inconsistency; memory reference time overhead is really the
least of the problem in that case. You still need a smart Thread or a smart
Regular Call to pre-evaluate your to-be-accessed data and spread out the
time, the data access seeks, etc. You can't wait until you need it and then
try to get it; very inconsistent slowness will result, whatever your cache
hardware. (There probably are exceptions :) )

-- A great reference for learning how to code good regular "pre-accessing
of data" is all the literature out there on advanced garbage collection
techniques, though what we are doing is the opposite of garbage collection.
:)

-- I am taking a lot of (technical) risk here basing my architecture on
full streaming flexibility and virtual reference to everything. It is
easier to count on performance when every access is direct and behavior is
preset, though virtual ID reference may not be as bad as everyone suggests.
The good thing about a virtual reference architecture is that you can
always attempt to gain performance by "trivializing" all the access
functions into direct access; going from direct access to an "interfaced"
access on a finished product is a lot harder. :)
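A minimal sketch of the "smart Regular Call" pre-evaluation idea, in C,
with made-up names (LoadFromDisc stands in for whatever blocking or async
loader the engine actually has): code that knows it will need a resource
soon requests it, a budgeted per-frame call services a few requests at a
time so the seek/load cost is spread out, and resolution never stalls the
frame.

    typedef unsigned int ResID;

    #define MAX_RES 256

    extern void *LoadFromDisc(ResID id);  /* hypothetical loader */

    static void *Resident[MAX_RES];       /* NULL until streamed in;
                                             assumes id < MAX_RES */
    static ResID Queue[MAX_RES];
    static int   QueueLen;

    void RequestResident(ResID id)        /* hint: needed soon */
    {
        if (!Resident[id] && QueueLen < MAX_RES)
            Queue[QueueLen++] = id;
    }

    void UpdateStreaming(int budget)      /* call once per frame */
    {
        while (budget-- > 0 && QueueLen > 0) {
            ResID id = Queue[--QueueLen];
            if (!Resident[id])
                Resident[id] = LoadFromDisc(id);
        }
    }

    void *TryResolve(ResID id)            /* never blocks */
    {
        return Resident[id];              /* NULL -> skip or use a
                                             fallback this frame */
    }

The budget is the whole point: the same total work happens either way, but
a few loads per frame, started ahead of need, avoids the burst-at-first-use
choke Corrinne describes.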
From: Pierre T. <p.t...@wa...> - 2000-12-30 02:48:15
> -- Accessor-type ID indirection (in my bit of experience here, there is
> lump-sum data streaming in Spec Ops, but what I am programming here is
> much more substantial streaming, where frighteningly everything is
> potentially virtual, potentially streamable) is often very necessary for
> a streaming type of architecture, where you can't count on, for example,
> the collision data structure of your engine already being instantiated.

Nowadays I think that kind of architecture is the only possible one :) We
now have quite large worlds to handle, a lot of data to deal with, and a
lot of things to do. More polygons, more data structures, more textures,
more sounds, etc. Even if you don't need it in a given project/game, it's
always nice to foresee your future needs, just so you don't have to re-code
everything each time. IMHO of course :) So let's stream!

> -- Lazy evaluation of ID data can and will cause horrible choke behaviors
> and framerate inconsistency; memory reference time overhead is really the
> least of the problem in that case. You still need a smart Thread or a
> smart Regular Call to pre-evaluate your to-be-accessed data and spread
> out the time, the data access seeks, etc. You can't wait until you need
> it and then try to get it; very inconsistent slowness will result,
> whatever your cache hardware. (There probably are exceptions :) )

Very, very right. That's what I recently "discovered" (it wasn't a big
surprise actually) in my current "engine" (whatever that word means...). Do
you have a magic-bullet solution to this? For the moment, I use a bounding
volume larger than the viewing frustum to pre-load / pre-initialize the
expected objects. [You can extend the larger BV to the whole world to get
back your non-lazy-evaluated engine, BTW, so it's always useful to use lazy
evaluation in the first place, IMO.]

> -- I am taking a lot of (technical) risk here basing my architecture on
> full streaming flexibility and virtual reference to everything. It is
> easier to count on performance when every access is direct and behavior
> is preset, though virtual ID reference may not be as bad as everyone
> suggests. The good thing about a virtual reference architecture is that
> you can always attempt to gain performance by "trivializing" all the
> access functions into direct access; going from direct access to an
> "interfaced" access on a finished product is a lot harder. :)

Agreed :)

Pierre
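A minimal sketch of that enlarged-bounding-volume trick, in C, assuming a
sphere/frustum test and reusing the RequestResident() hint from the earlier
streaming sketch (all names illustrative): objects whose bounds fall inside
the frustum inflated by a margin get requested ahead of time, so they are
usually resident by the time the camera reaches them.

    typedef unsigned int ResID;   /* as in the previous sketch */

    typedef struct { float x, y, z; } Vec3;

    typedef struct {
        Vec3  center;
        float radius;
        ResID resources;   /* what this object needs streamed in */
    } SceneObject;

    /* assumed engine helpers */
    extern int  SphereInFrustum(const Vec3 *center, float radius);
    extern void RequestResident(ResID id);

    void PreloadPass(SceneObject *objs, int count, float margin)
    {
        int i;
        for (i = 0; i < count; i++) {
            /* inflating each radius by the margin approximates testing
               against a frustum grown by that margin */
            if (SphereInFrustum(&objs[i].center,
                                objs[i].radius + margin))
                RequestResident(objs[i].resources);
        }
    }

With a margin large enough to cover the whole world this degenerates into
the fully pre-loaded, non-lazy engine Pierre mentions, which is why lazy
evaluation plus a tunable margin subsumes both schemes.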
From: Christer_Ericson@Playstation.sony.com - 2000-12-29 04:21:59
Gil Gribb wrote:
>Sorry for the harshness.

I'm taking this off the list for obvious reasons.

Christer Ericson
SCEA, Santa Monica
From: Gil G. <gg...@ra...> - 2000-12-27 16:17:52
> Jim Offerman wrote:
> >The overhead of using smart pointers or id<->pointer mappings is
> >becoming negligible these days ...
>
> Careful, not everyone on here targets (Intel-based) PCs.
>
> While your statement might be true for a PC sporting a CPU with a large
> L2 cache, it can (read: will) be quite detrimental on, say, a PS2, which
> doesn't have one.

Well, you offer no alternative. As I have pointed out, pointers simply
don't work for game entity cross references. I have a constant time scheme
that takes perhaps a dozen cycles, and it has exactly the same cache
behaviour as pointers. Not much is going on today, so I will show a
detailed example.

But first: the lack of an L2 cache certainly does affect the choice of data
structures, but in general you can't avoid searching, even though it is
cache unfriendly. For example, a scene graph needs to be searched, and this
will definitely thrash the L1 cache, but it isn't like a linear search will
yield better performance. Another example is the "think loop". Many games,
such as quakeX, look at every entity, every game frame, to see who needs to
think. It is far more efficient to use a priority queue to touch only those
entities that need to think (see the sketch after this message). Using a
priority queue is going to require some "random" memory accesses, but it
will be far faster than accessing every entity every frame.

----

In this example, an entity is a game actor, such as a player, projectile,
door, platform, pickup, effect, etc. I am writing the (simplified) example
in C, but I use C++ for a similar design in my ps2 engine.

Fast, cache-friendly entity IDs:

typedef unsigned int EntityID;

typedef struct Entity Entity;

struct Entity
{
    EntityID MyID;
    Entity *NextFreeEntity;  // only meaningful while on the free chain
    // .....
    EntityID Target;  // this is our sample entity cross reference:
                      // the target is the entity this entity is attacking
};

Entity EntityArray[1024];
Entity *FreeChainRoot;

void InitializeEntities()
{
    int i;
    for (i = 0; i < 1024; i++)
    {
        // low 10 bits = slot index, upper bits = use count (starts at 1)
        EntityArray[i].MyID = i + (1<<10);
        if (i < 1023)
            EntityArray[i].NextFreeEntity = EntityArray + i + 1;
        else
            EntityArray[i].NextFreeEntity = 0;
    }
    FreeChainRoot = EntityArray;
}

Entity *NewEntity()
{
    Entity *ret = FreeChainRoot;
    if (ret)
        FreeChainRoot = ret->NextFreeEntity;
    return ret;
}

void DeleteEntity(Entity *who)
{
    who->MyID += 1<<10;  // bump the use count so outstanding IDs go
                         // stale; this should be checked for overflow,
                         // see note below
    who->NextFreeEntity = FreeChainRoot;
    FreeChainRoot = who;
}

Entity *IDToPointer(EntityID id)
{
    Entity *ret = EntityArray + (id & 1023);
    if (id != ret->MyID)
        return 0;  // slot was freed or reused; the ID is stale
    return ret;
}

// sample usage:

void Attack(Entity *me)
{
    Entity *who = IDToPointer(me->Target);
    if (who)
    {
        // valid target, proceed with attack
    }
    else
    {
        // target has died, acquire new target, etc.
    }
}

Assuming the code calling IDToPointer is going to immediately dereference
the pointer (and why would it want the pointer without dereferencing it?),
the cache behaviour is identical to using pointers.

There are three ways to deal with IDs overflowing. You could consider
64-bit IDs. It might be OK to reset an ID back to its initial value, since
that ID hasn't been used in a very long time. But I think the best solution
is to add deleted entities onto the end of the free chain, thereby
guaranteeing a full 2^32 new entities before overflow occurs.

-Gil
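A minimal sketch of the priority-queue "think loop" Gil mentions above,
reusing the EntityID/IDToPointer scheme from his example; the binary heap
and the Think() call are illustrative, not from the post. Each frame pops
only the entities whose think time has arrived, instead of scanning all
1024.

    #define MAX_THINKERS 1024

    typedef struct
    {
        float    Time;  // when this entity next wants to think
        EntityID Who;
    } ThinkEvent;

    static ThinkEvent Heap[MAX_THINKERS];  // binary min-heap on Time
    static int HeapLen;

    void ScheduleThink(float time, EntityID who)
    {
        int i;
        if (HeapLen >= MAX_THINKERS)
            return;  // out of slots; a real engine would grow or assert
        i = HeapLen++;
        // sift up: parent of i is (i-1)/2; smallest Time ends up on top
        while (i > 0 && Heap[(i-1)/2].Time > time)
        {
            Heap[i] = Heap[(i-1)/2];
            i = (i-1)/2;
        }
        Heap[i].Time = time;
        Heap[i].Who = who;
    }

    static ThinkEvent PopThink()  // only call when HeapLen > 0
    {
        ThinkEvent top = Heap[0];
        ThinkEvent last = Heap[--HeapLen];
        int i = 0;
        // sift the former last element down from the root
        for (;;)
        {
            int child = 2*i + 1;
            if (child >= HeapLen)
                break;
            if (child+1 < HeapLen && Heap[child+1].Time < Heap[child].Time)
                child++;  // pick the smaller child
            if (Heap[child].Time >= last.Time)
                break;
            Heap[i] = Heap[child];
            i = child;
        }
        Heap[i] = last;
        return top;
    }

    extern void Think(Entity *who);  // hypothetical per-entity think

    void RunThinkers(float now)
    {
        // only entities whose time has come are ever touched
        while (HeapLen > 0 && Heap[0].Time <= now)
        {
            ThinkEvent ev = PopThink();
            Entity *who = IDToPointer(ev.Who);
            if (who)     // stale IDs of dead entities drop out here
                Think(who);
        }
    }

Note how the stale-ID check doubles as the removal policy: a deleted
entity's pending think events cost one failed IDToPointer each and nothing
more, so the heap never needs explicit deletion.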