Re: [Algorithms] Cache-coherent hashing for broad-phase collision detection
From: Jon W. <jw...@gm...> - 2010-03-23 00:42:59
std::set<> is not the right solution. If there are only ever a few elements in each bucket, I suggest using a std::vector<> and just searching it linearly. Each element is 8 bytes, so 8 elements fit in a single cache line. If you have buckets with zillions of items (a bad idea for other reasons) and need to delete from the middle, try a std::list<> with a custom allocator, or, as others have said, std::hash_map or similar.

STL containers sometimes help with caching, because they often use custom, pool-based allocators. However, other times they actually end up hurting, either because the pools are too small (4-item buckets in std::deque<>, say), or because of container implementation overhead (too many pointers in a non-intrusive implementation, etc).

Sincerely,

jw

--
Americans might object: there is no way we would sacrifice our living standards for the benefit of people in the rest of the world. Nevertheless, whether we get there willingly or not, we shall soon have lower consumption rates, because our present rates are unsustainable.

On Mon, Mar 22, 2010 at 4:16 PM, John Ratcliff <jra...@gm...> wrote:

> You definitely do not want to ever be using a set or a map. The only
> reason to use a set or a map is if you absolutely need to constantly
> maintain something in sorted order, which is rarely ever the case. Most of
> the time you just want to do a key/value lookup, and for that a hash table
> is extremely fast. Using the STL in performance-sensitive code is also
> questionable, as a whole lot of memory allocations and copy constructors
> get executed, both of which you want to avoid.
>
> If you are going to use the STL, and then have a profile of many tiny
> allocations, you should consider using a custom small-memory allocator;
> there are several implementations available. If you are dealing with a
> heavily multi-threaded environment, you need to take that into
> consideration as well.
> Something like the Hoard memory allocator is not only designed to
> handle many tiny allocations efficiently but also minimizes thread
> contention.
>
> John
>
> On Mon, Mar 22, 2010 at 6:05 PM, Sebastian Sylvan <seb...@gm...> wrote:
>
>> On Mon, Mar 22, 2010 at 9:49 PM, Darren Grant <dg...@ke...> wrote:
>>
>>> Hi,
>>>
>>> I am looking at ways of improving cache coherence for a typical
>>> broad-phase collision detection scenario (CPU). The basic
>>> implementation utilizes these data structures:
>>>
>>> struct ObjectTracking {
>>>     int ObjectIndex;
>>>     int RefCount;
>>> };
>>>
>>> typedef set<ObjectTracking> Tile;
>>>
>>> struct TilingLevel {
>>>     vector<Tile> Tiles;
>>>     int Pitch;
>>>     Tile& Map(int x, int y) { return Tiles[y*Pitch+x]; }
>>> };
>>>
>>> struct TilingHierarchy {
>>>     vector<TilingLevel> TilingLevels;
>>>     Tile& Map(int level, int x, int y) { return TilingLevels[level].Map(x,y); }
>>> };
>>>
>>> And the hot process involves frequent iteration of:
>>>
>>> for level from 0 to maxlevel
>>>     for y from y0(level) to y1(level)
>>>         for x from x0(level) to x1(level)
>>>             tilingHierarchy.Map(level,x,y).InsertOrRemove();
>>>
>>> Performance becomes an issue here when the std::set allocations get
>>> spread out across process memory. I will therefore replace the
>>> individual sets with a single shared map to explore implementations
>>> for coherence:
>>>
>>> (level,x,y,ObjectIndex) -> (RefCount)
>>>
>>> (level,x,y,ObjectIndex) has simple perfect hash functions since the
>>> boundary conditions are known at start. Local memory access roughly
>>> favours tiling levels > tiling level > tile row > tile according to
>>> the given code, but I am wary of less obvious yet still important
>>> observations that may be missed.
>>>
>>> Does this sound about right? Are there any better options I have
>>> flown right past?
>>>
>>> Thank you.
>>
>> Try a single hash table (multimap) mapping from the cell position to the
>> objects in that cell position?
>> You could try open addressing to get rid of pointers altogether and
>> have just a single flat array at the bottom of your data structure,
>> although that has limitations (difficulty deleting, less than graceful
>> performance degradation as the load factor grows).
>>
>> --
>> Sebastian Sylvan
>>
>> ------------------------------------------------------------------------------
>> Download Intel® Parallel Studio Eval
>> Try the new software tools for yourself. Speed compiling, find bugs
>> proactively, and fine-tune applications for parallel performance.
>> See why Intel Parallel Studio got high marks during beta.
>> http://p.sf.net/sfu/intel-sw-dev
>> _______________________________________________
>> GDAlgorithms-list mailing list
>> GDA...@li...
>> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
>> Archives:
>> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list