Thread: Re: Circular references (was Re: [Algorithms] Reference counting hell... what is a good paradigm.)
From: Christer_Ericson@Playstation.sony.com - 2000-12-27 07:00:04
Jim Offerman wrote:
>The overhead of using smart pointers or id<->pointer mappings is becoming
>negligible these days ...

Careful, not everyone on here targets (Intel-based) PCs.

While your statement might be true for a PC sporting a CPU with a large L2
cache, it can (read: will) be quite detrimental on, say, a PS2, which
doesn't have one.

Christer Ericson
SCEA, Santa Monica
From: Christer_Ericson@Playstation.sony.com - 2000-12-27 23:37:03
Gil Gribb wrote:
>> Jim Offerman wrote:
>> >The overhead of using smart pointers or id<->pointer mappings is
>> >becoming negligible these days ...
>>
>> Careful, not everyone on here targets (Intel-based) PCs.
>>
>> While your statement might be true for a PC sporting a CPU with a large
>> L2 cache, it can (read: will) be quite detrimental on, say, a PS2, which
>> doesn't have one.
>
>Well, you offer no alternative. As I have pointed out, pointers simply
>don't work for game entity cross references. I have a constant time scheme
>that takes perhaps a dozen cycles, and it has exactly the same cache
>behaviour as pointers. Not much is going on today, so I will show a
>detailed example.

I merely meant to point out that on e.g. the PS2, hardly any memory
accesses are "negligible". As it only has an 8K 2-way set-associative data
cache, you really should go to great lengths to avoid ever reading outside
the cache (by making use of the 16K scratchpad RAM, using tight data
structures, etc.). This is drastically different from programming a PC with
a large L2 cache and has serious repercussions for how you design things.
I'm sure this is one reason why we initially heard lots of PC developers
complain about how hard the PS2 was to program for.

I didn't mean to comment on the smart pointers vs. ID-to-pointer issue at
all. Just to point out that on the PS2 any level of memory indirection can
be quite expensive and should be avoided (if possible, of course). Of
course, there will always be things such as time-memory trade-offs, but
thinking about how to stay in cache should be the #1 thing in a PS2
programmer's mind.

I'm not really at liberty to discuss the PS2 hardware further here (due to
NDAs and all that crap), but the above info is all available on the net, at
e.g.

http://www.anarchists.co.uk/html/isscc.html

for anyone who is interested in more detail.

Christer Ericson
SCEA, Santa Monica
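A minimal sketch of what "tight data structures" can mean in practice, in
C; this is one illustration (hot/cold splitting), not code from the thread,
and all the names are made up for the example:

    /* Array-of-structs: a position update drags every entity's cold
       fields through the 8K cache alongside the two vectors it needs. */
    struct FatEntity {
        float pos[3];
        float vel[3];
        char  name[32];     /* cold, but shares the cache lines */
        int   aiState[16];  /* cold */
    };

    /* Struct-of-arrays: the same update now reads densely packed hot
       data, so many more entities fit in cache per pass. */
    struct EntityArrays {
        float (*pos)[3];
        float (*vel)[3];
        /* cold data lives in separate, rarely touched arrays */
    };

    void IntegratePositions(struct EntityArrays *e, int count, float dt)
    {
        int i;
        for (i = 0; i < count; i++) {
            e->pos[i][0] += e->vel[i][0] * dt;
            e->pos[i][1] += e->vel[i][1] * dt;
            e->pos[i][2] += e->vel[i][2] * dt;
        }
    }

The layout change does not alter any algorithm; it only changes how many
useful bytes each cache line fill delivers.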
From: Gil G. <gg...@ra...> - 2000-12-29 03:05:27
> I merely meant to point out that on e.g. the PS2, hardly any memory
> accesses are "negligible". As it only has an 8K 2-way set-associative
> data cache, you really should go to great lengths to avoid ever reading
> outside the cache (by making use of the 16K scratchpad RAM, using tight
> data structures, etc.).

What sort of games have you written? You have a 32 MB machine and you are
telling me to avoid EVER reading outside of 8K? Crazy talk.

> This is drastically different from programming a PC with a large L2 cache
> and has serious repercussions for how you design things.

Well, again, you have offered no alternatives other than "write your
program to fit in 8K of L1 cache". The lack of an L2 cache was one of the
most surprising things I noticed when I first reviewed the psx2 docs. For
example, this is an email I sent to my team on March 15th:

<Looking at the psx2 docs, I noticed a lot of discussion of avoiding L1
<cache misses. It just occurred to me, the psx2 has no L2 cache! This is
<really going to change the notion of what is reasonably fast. Consider for
<example that a P3 could be produced for about 40% of the current cost if
<it didn't have an L2 cache. The original Celeron had no L2 cache and it
<was a performance disaster.
<I'm really starting to question if a huge C++ codebase like UT can ever
<run reasonably on the psx2.

So it isn't like I am unaware of the issues.

> I'm sure this is one reason why we initially heard lots of PC developers
> complain about how hard the PS2 was to program for.

Nope, not at all. We found it difficult because the Sony sample code for
the graphics end of things is total garbage. The fastest graphics code,
written by someone whose full-time job it is to write such code, was simply
wrong, in at least 3 ways. We figured it out though, mostly. I still don't
have good mip mapping, but we are close. I must say the hardware is very
very cool, but Sony has not provided the code needed to take advantage of
it. Our graphics code is essentially from scratch and is much much better
than anything Sony provided; writing it is what was hard.

> I didn't mean to comment on the smart pointers vs. ID-to-pointer issue
> at all. Just to point out that on the PS2 any level of memory indirection
> can be quite expensive and should be avoided (if possible, of course).

Well, indirection is absolutely critical to algorithm design. I'm not
writing Tetris here; I fully expect to have more than 5MB of source code.

> Of course, there will always be things such as time-memory trade-offs,
> but thinking about how to stay in cache should be the #1 thing in a
> PS2 programmer's mind.

So you are saying "don't use any algorithm that goes outside the 8K L1
cache"?... Nonsense.
You seem to be saying that the PS2 is crippled by the lack of an L2
cache... I don't see that. Just like they taught us in school, algorithms
are good or bad regardless of the hardware.

Sorry for the harshness.

-Gil
From: Neil S. <ne...@r0...> - 2000-12-29 04:46:21
> I merely meant to point out that on e.g. the PS2, hardly any memory
> accesses are "negligible". As it only has an 8K 2-way set-associative
> data cache, you really should go to great lengths to avoid ever reading
> outside the cache (by making use of the 16K scratchpad RAM, using tight
> data structures, etc.).
>
> This is drastically different from programming a PC with a large L2 cache
> and has serious repercussions for how you design things.

Memory accesses are negligible when they exist in the 80% of code that
takes 20% of the processing time. The important thing here is to determine
where most of the processing time lies, and for most game engines on most
machines this will be in the rendering and physics code (and maybe the AI
code). The use of smart pointers over IDs is essentially academic as far as
processing overhead goes, so long as they are not used much past resource
management (i.e. using them for *everything* is wasteful and overkill
IMHO).

On consoles like the PS2, efficiency is mainly determined by the use of
hardware concurrency (this is also true on PCs, but to a lesser degree),
and more gain can be made from clever implementations of the rendering and
physics code than can ever be made by changing an entire engine design
(unless, of course, the engine design limits how the rendering/physics code
can be implemented).

> I'm sure this is one reason why we initially heard lots of PC developers
> complain about how hard the PS2 was to program for.

Possibly, but I suspect that many of those PC developers were having
problems rendering quickly (until they adopted a PS2-friendly approach) and
hadn't even begun to investigate performance problems with
smart-pointers/IDs yet. I don't think anyone who had worked on a console
before was at all surprised by the PS2. As with most consoles, it is
reasonably easy to get things working, but far more difficult to use the
hardware to its full potential. Used properly, it is a very fast machine,
but many PC engine architectures are not suitable for simple porting, hence
the problem.

> I didn't mean to comment on the smart pointers vs. ID-to-pointer issue
> at all. Just to point out that on the PS2 any level of memory indirection
> can be quite expensive and should be avoided (if possible, of course).

The key here is to determine where most memory accesses are being done. In
a typical engine, this will be in the rendering and physics code. Although
memory access is expensive *everywhere*, it has less impact where it is
executed fewer times, and there is no point optimising *all* code for the
PS2 (or any console for that matter) when most of the execution time is in
a small subset of the code. Any game engine where this is not the case
(with exceptions such as advanced AI, etc.) will probably have problems on
all machines, not just consoles, and probably wants a redesign.

> Of course, there will always be things such as time-memory trade-offs,
> but thinking about how to stay in cache should be the #1 thing in a
> PS2 programmer's mind.

Well, the programmer that has to write the rendering and physics code, at
least... ;)

- Neil
From: Pierre T. <p.t...@wa...> - 2000-12-29 22:13:00
BTW... As I wrote before, one key advantage of IDs over pointers is that
you can move some resources in RAM without invalidating the relations/links
between two objects. The way you reorganize the memory layout depends on
your runtime access patterns - as Ville Miettinen once explained here,
IIRC, the Umbra guys use "accessors" for this (which sounds quite difficult
to avoid as soon as you use lazy evaluation all over the place).

Now, the whole point of the reorganization is to become cache-friendly, so
I guess this should be very efficient on the PS2 according to your claims.
And since you just can't do it with pointers (at least not in such an
automatic way), IDs may finally well be the cache-friendly way :)

The little piece of code responsible for the ID-to-pointer translation is
not the problem IMO. It is exactly the same indirection (in my scheme) as
the one involved in virtual methods vs. non-virtual ones, and the same
overhead as well: virtually free on all modern computers. If that's not the
case on the PS2, then I'm very happy not to program on the PS2...

Pierre
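A minimal sketch of that relocation property, in C, with made-up names and
the liveness/generation check omitted for brevity: because every cross
reference is an ID resolved through one table, the engine can slide an
object to a new address - to defragment, or to pack hot objects together -
by patching a single table entry.

    #include <string.h>   /* memcpy, size_t */

    #define MAX_OBJECTS 1024

    typedef unsigned int ObjID;

    /* The only place real addresses live; everyone else stores ObjIDs. */
    static void *Table[MAX_OBJECTS];

    void *Resolve(ObjID id)
    {
        return Table[id];  /* one indirection, same cost class as a
                              vtable lookup */
    }

    /* Move an object (e.g. during compaction); every stored ObjID, in
       every other object, remains valid afterwards. */
    void Relocate(ObjID id, void *newAddress, size_t size)
    {
        memcpy(newAddress, Table[id], size);
        Table[id] = newAddress;
    }

A pointer-based scheme would have to find and patch every stored pointer to
perform the same move, which is why automatic compaction is so much easier
with IDs.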
From: Corrinne Y. <cor...@ho...> - 2000-12-29 22:44:59
----- Original Message -----
From: "Pierre Terdiman" <p.t...@wa...>
To: <gda...@li...>
Sent: Friday, December 29, 2000 4:08 PM
Subject: Re: Circular references (was Re: [Algorithms] Reference counting
hell... what is a good paradigm.)

> As I wrote before, one key advantage of IDs over pointers is that you can
> move some resources in RAM without invalidating the relations/links
> between two objects. The way you reorganize the memory layout depends on
> your runtime access patterns

-- Now that this discussion has slowed down a tad :)

-- Accessor-type ID indirection (in my bit of experience here, there is
lump-sum data streaming in Spec Ops, but what I am programming here is much
more substantial streaming, where frighteningly everything is potentially
virtual, potentially streamable) is often very necessary for a streaming
type of architecture, where you can't count on, for example, the collision
data structure of your engine already being instantiated.

-- Lazy evaluation of ID data can and will cause horrible choke behaviors
and framerate inconsistency; memory reference time overhead is really the
least of the problem in that case. You still need a smart Thread or a smart
Regular Call to pre-evaluate your to-be-accessed data and spread out the
time, the data access seeks, etc. You can't wait until you need it and then
try to get it; very inconsistent slowness will result, whatever your cache
hardware. (There probably are exceptions :) )

-- A great reference for learning how to code good regular "pre-accessing
of data" is all the literature out there on advanced garbage collection
techniques, though what we are doing is the opposite of garbage collection.
:)

-- I am taking a lot of (technical) risk here basing my architecture on
full streaming flexibility and virtual reference to everything. It is
easier to count on performance when every access is direct and behavior is
preset, though virtual ID reference may not be as bad as everyone suggests.
The good thing about a virtual reference architecture is that you can
always attempt to gain performance by "trivializing" all the access
functions into direct access; going from direct access to an "interfaced"
access on a finished product is a lot harder. :)
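A minimal sketch of the "smart Regular Call" pre-evaluation idea, in C,
with made-up names (LoadFromDisc stands in for whatever blocking or async
loader the engine actually has): code that knows it will need a resource
soon requests it, a budgeted per-frame call services a few requests at a
time so the seek/load cost is spread out, and resolution never stalls the
frame.

    typedef unsigned int ResID;

    #define MAX_RES 256

    extern void *LoadFromDisc(ResID id);  /* hypothetical loader */

    static void *Resident[MAX_RES];       /* NULL until streamed in;
                                             assumes id < MAX_RES */
    static ResID Queue[MAX_RES];
    static int   QueueLen;

    void RequestResident(ResID id)        /* hint: needed soon */
    {
        if (!Resident[id] && QueueLen < MAX_RES)
            Queue[QueueLen++] = id;
    }

    void UpdateStreaming(int budget)      /* call once per frame */
    {
        while (budget-- > 0 && QueueLen > 0) {
            ResID id = Queue[--QueueLen];
            if (!Resident[id])
                Resident[id] = LoadFromDisc(id);
        }
    }

    void *TryResolve(ResID id)            /* never blocks */
    {
        return Resident[id];              /* NULL -> skip or use a
                                             fallback this frame */
    }

The budget is the whole point: the same total work happens either way, but
a few loads per frame, started ahead of need, avoids the burst-at-first-use
choke Corrinne describes.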
From: Pierre T. <p.t...@wa...> - 2000-12-30 02:48:15
> -- Accessor-type ID indirection (in my bit of experience here, there is
> lump-sum data streaming in Spec Ops, but what I am programming here is
> much more substantial streaming, where frighteningly everything is
> potentially virtual, potentially streamable) is often very necessary for
> a streaming type of architecture, where you can't count on, for example,
> the collision data structure of your engine already being instantiated.

Nowadays I think that kind of architecture is the only possible one :) We
now have quite large worlds to handle, a lot of data to deal with, and a
lot of things to do. More polygons, more data structures, more textures,
more sounds, etc. Even if you don't need it in a given project/game, it's
always nice to foresee your future needs, just so you don't have to re-code
everything each time. IMHO of course :) So let's stream!

> -- Lazy evaluation of ID data can and will cause horrible choke behaviors
> and framerate inconsistency; memory reference time overhead is really the
> least of the problem in that case. You still need a smart Thread or a
> smart Regular Call to pre-evaluate your to-be-accessed data and spread
> out the time, the data access seeks, etc. You can't wait until you need
> it and then try to get it; very inconsistent slowness will result,
> whatever your cache hardware. (There probably are exceptions :) )

Very, very right. That's what I recently "discovered" (it wasn't a big
surprise actually) in my current "engine" (whatever that word means...). Do
you have a magic-bullet solution to this? For the moment, I use a bounding
volume larger than the viewing frustum to pre-load / pre-initialize the
expected objects. [You can extend the larger BV to the whole world to get
back your non-lazy-evaluated engine, BTW, so it's always useful to use lazy
evaluation in the first place, IMO.]

> -- I am taking a lot of (technical) risk here basing my architecture on
> full streaming flexibility and virtual reference to everything. It is
> easier to count on performance when every access is direct and behavior
> is preset, though virtual ID reference may not be as bad as everyone
> suggests. The good thing about a virtual reference architecture is that
> you can always attempt to gain performance by "trivializing" all the
> access functions into direct access; going from direct access to an
> "interfaced" access on a finished product is a lot harder. :)

Agreed :)

Pierre
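A minimal sketch of that enlarged-bounding-volume trick, in C, assuming a
sphere/frustum test and reusing the RequestResident() hint from the earlier
streaming sketch (all names illustrative): objects whose bounds fall inside
the frustum inflated by a margin get requested ahead of time, so they are
usually resident by the time the camera reaches them.

    typedef unsigned int ResID;   /* as in the previous sketch */

    typedef struct { float x, y, z; } Vec3;

    typedef struct {
        Vec3  center;
        float radius;
        ResID resources;   /* what this object needs streamed in */
    } SceneObject;

    /* assumed engine helpers */
    extern int  SphereInFrustum(const Vec3 *center, float radius);
    extern void RequestResident(ResID id);

    void PreloadPass(SceneObject *objs, int count, float margin)
    {
        int i;
        for (i = 0; i < count; i++) {
            /* inflating each radius by the margin approximates testing
               against a frustum grown by that margin */
            if (SphereInFrustum(&objs[i].center,
                                objs[i].radius + margin))
                RequestResident(objs[i].resources);
        }
    }

With a margin large enough to cover the whole world this degenerates into
the fully pre-loaded, non-lazy engine Pierre mentions, which is why lazy
evaluation plus a tunable margin subsumes both schemes.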
From: Christer_Ericson@Playstation.sony.com - 2000-12-29 04:21:59
Gil Gribb wrote:
>Sorry for the harshness.

I'm taking this off the list for obvious reasons.

Christer Ericson
SCEA, Santa Monica
From: Gil G. <gg...@ra...> - 2000-12-27 16:17:52
> Jim Offerman wrote:
> >The overhead of using smart pointers or id<->pointer mappings is
> >becoming negligible these days ...
>
> Careful, not everyone on here targets (Intel-based) PCs.
>
> While your statement might be true for a PC sporting a CPU with a large
> L2 cache, it can (read: will) be quite detrimental on, say, a PS2, which
> doesn't have one.

Well, you offer no alternative. As I have pointed out, pointers simply
don't work for game entity cross references. I have a constant time scheme
that takes perhaps a dozen cycles, and it has exactly the same cache
behaviour as pointers. Not much is going on today, so I will show a
detailed example.

But first: the lack of an L2 cache certainly does affect the choice of data
structures, but in general you can't avoid searching, even though it is
cache unfriendly. For example, a scene graph needs to be searched, and this
will definitely thrash the L1 cache, but it isn't like a linear search will
yield better performance. Another example is the "think loop". Many games,
such as quakeX, look at every entity, every game frame, to see who needs to
think. It is far more efficient to use a priority queue to touch only those
entities that need to think (see the sketch after this message). Using a
priority queue is going to require some "random" memory accesses, but it
will be far faster than accessing every entity every frame.

----

In this example, an entity is a game actor, such as a player, projectile,
door, platform, pickup, effect, etc. I am writing the (simplified) example
in C, but I use C++ for a similar design in my ps2 engine.

Fast, cache-friendly entity IDs:

typedef unsigned int EntityID;

typedef struct Entity Entity;

struct Entity
{
    EntityID MyID;
    Entity *NextFreeEntity;  // only meaningful while on the free chain
    // .....
    EntityID Target;  // this is our sample entity cross reference:
                      // the target is the entity this entity is attacking
};

Entity EntityArray[1024];
Entity *FreeChainRoot;

void InitializeEntities()
{
    int i;
    for (i = 0; i < 1024; i++)
    {
        // low 10 bits = slot index, upper bits = use count (starts at 1)
        EntityArray[i].MyID = i + (1<<10);
        if (i < 1023)
            EntityArray[i].NextFreeEntity = EntityArray + i + 1;
        else
            EntityArray[i].NextFreeEntity = 0;
    }
    FreeChainRoot = EntityArray;
}

Entity *NewEntity()
{
    Entity *ret = FreeChainRoot;
    if (ret)
        FreeChainRoot = ret->NextFreeEntity;
    return ret;
}

void DeleteEntity(Entity *who)
{
    who->MyID += 1<<10;  // bump the use count so outstanding IDs go
                         // stale; this should be checked for overflow,
                         // see note below
    who->NextFreeEntity = FreeChainRoot;
    FreeChainRoot = who;
}

Entity *IDToPointer(EntityID id)
{
    Entity *ret = EntityArray + (id & 1023);
    if (id != ret->MyID)
        return 0;  // slot was freed or reused; the ID is stale
    return ret;
}

// sample usage:

void Attack(Entity *me)
{
    Entity *who = IDToPointer(me->Target);
    if (who)
    {
        // valid target, proceed with attack
    }
    else
    {
        // target has died, acquire new target, etc.
    }
}

Assuming the code calling IDToPointer is going to immediately dereference
the pointer (and why would it want the pointer without dereferencing it?),
the cache behaviour is identical to using pointers.

There are three ways to deal with IDs overflowing. You could consider
64-bit IDs. It might be OK to reset an ID back to its initial value, since
that ID hasn't been used in a very long time. But I think the best solution
is to add deleted entities onto the end of the free chain, thereby
guaranteeing a full 2^32 new entities before overflow occurs.

-Gil
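A minimal sketch of the priority-queue "think loop" Gil mentions above,
reusing the EntityID/IDToPointer scheme from his example; the binary heap
and the Think() call are illustrative, not from the post. Each frame pops
only the entities whose think time has arrived, instead of scanning all
1024.

    #define MAX_THINKERS 1024

    typedef struct
    {
        float    Time;  // when this entity next wants to think
        EntityID Who;
    } ThinkEvent;

    static ThinkEvent Heap[MAX_THINKERS];  // binary min-heap on Time
    static int HeapLen;

    void ScheduleThink(float time, EntityID who)
    {
        int i;
        if (HeapLen >= MAX_THINKERS)
            return;  // out of slots; a real engine would grow or assert
        i = HeapLen++;
        // sift up: parent of i is (i-1)/2; smallest Time ends up on top
        while (i > 0 && Heap[(i-1)/2].Time > time)
        {
            Heap[i] = Heap[(i-1)/2];
            i = (i-1)/2;
        }
        Heap[i].Time = time;
        Heap[i].Who = who;
    }

    static ThinkEvent PopThink()  // only call when HeapLen > 0
    {
        ThinkEvent top = Heap[0];
        ThinkEvent last = Heap[--HeapLen];
        int i = 0;
        // sift the former last element down from the root
        for (;;)
        {
            int child = 2*i + 1;
            if (child >= HeapLen)
                break;
            if (child+1 < HeapLen && Heap[child+1].Time < Heap[child].Time)
                child++;  // pick the smaller child
            if (Heap[child].Time >= last.Time)
                break;
            Heap[i] = Heap[child];
            i = child;
        }
        Heap[i] = last;
        return top;
    }

    extern void Think(Entity *who);  // hypothetical per-entity think

    void RunThinkers(float now)
    {
        // only entities whose time has come are ever touched
        while (HeapLen > 0 && Heap[0].Time <= now)
        {
            ThinkEvent ev = PopThink();
            Entity *who = IDToPointer(ev.Who);
            if (who)     // stale IDs of dead entities drop out here
                Think(who);
        }
    }

Note how the stale-ID check doubles as the removal policy: a deleted
entity's pending think events cost one failed IDToPointer each and nothing
more, so the heap never needs explicit deletion.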