Thread: Re: [Algorithms] Cache-oblivious layout (was: BSP tree, or what?) (Page 2)
|
From: Willem H. de B. <wi...@wh...> - 2008-06-05 12:56:47
|
----- Original Message ----- From: "Marc B. Reynolds" <mar...@or...>

> Around 2001 (for a previous employer) I did some toy experiments ordering
> large octree memory via 3D Hilbert-curve indices and was seeing worthwhile
> performance gains.

That sounds interesting. Could you expand on this a bit? In what order are the nodes stored this way? What was the performance gain, and in what kind of situations did it behave well or badly?

Cheers,
Willem
|
From: Marc B. R. <mar...@or...> - 2008-06-05 16:40:10
|
My original testing was a very simple "proof of concept" and the results were "promising... needs further investigation"... which never happened, so take this info for what it's worth.

> That sounds interesting. Could you expand on this a bit?

Originally I used this technique in 2D for paging in data for grids and quadtrees. I got the idea from [1]. Started thinking about a 3D version after reading [2].

> In what order are the nodes stored this way?

What I did in my test was to take the 3D coordinate of a node, convert it to a Hilbert index, and use the (sorted) Hilbert indices to lay out the memory. There are probably MUCH better methods. Conceptually, you lay out memory in the order a Hilbert-curve walk of the space would 'touch' the node in question. Well, actually there is no reason to restrict this to Hilbert... let's say some space-filling curve. I'm thinking that, say, a Morton curve might be better for dynamic trees.

> What was the performance gain, in what kind of situations did it
> behave well/badly?

This is all from memory, so I can't give any solid data. The test was as follows:

* Forest of octrees (I believe they were 'region' octrees). By forest I mean an 'n' by 'n' grid.
* The trees were static, but contained moving objects in addition to static objects.

My timings were "visit all nodes along this line within radius 'R'", and the Hilbert layout performed consistently better, except in the trivial case (line is roughly a linear memory walk). The gains were modest, but this was on something like (guessing here) a Pentium 3. Most of the papers I was referencing at the time came from nearest- and approximate-nearest-neighbor research.

-----
[1] "Fractals for Secondary Key Retrieval"
http://www.intelligence.tuc.gr/~petrakis/courses/multimedia/papers/fractals.pdf
[2] "Linear Clustering of Objects with Multiple Attributes"
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/christos/www/courses/826-resources/PAPERS+BOOK/jagadish-sigmod-90-P332.pdf
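[Editor's note: the coordinate-to-Hilbert-index conversion Marc describes can be sketched with the standard iterative quadrant-rotation algorithm, shown here in 2D for brevity (Marc's test used the 3D version); the function name is ours, not his:]

```c
#include <stdint.h>

/* Map cell (x, y) on an n-by-n grid (n a power of two) to its index
   along a Hilbert-curve walk of the grid. Sorting nodes by this index
   gives the memory layout described above. */
uint32_t hilbert_index(uint32_t n, uint32_t x, uint32_t y)
{
    uint32_t d = 0;
    for (uint32_t s = n / 2; s > 0; s /= 2) {
        uint32_t rx = (x & s) > 0;
        uint32_t ry = (y & s) > 0;
        d += s * s * ((3 * rx) ^ ry);
        /* rotate/flip the quadrant so the sub-curve lines up */
        if (ry == 0) {
            if (rx == 1) {
                x = s - 1 - x;
                y = s - 1 - y;
            }
            uint32_t t = x; x = y; y = t;
        }
    }
    return d;
}
```

Sorting the nodes by `hilbert_index` of their cell coordinates and emitting them in that order reproduces the layout Marc describes.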
|
From: Darren G. <dg...@ke...> - 2008-06-05 21:02:49
|
At 09:40 AM 6/5/2008, Marc B. Reynolds wrote:

> What I did in my test was to take the 3D coordinate of a node, convert it
> to a Hilbert index, and use the (sorted) Hilbert indices to lay out the
> memory. There are probably MUCH better methods.

Great experiment!

I would love to know if the z-order curve just works better across the board with the octrees.

Regards,
Darren
|
From: Tom F. <tom...@ee...> - 2008-06-05 23:07:51
|
After experience with Morton vs Hilbert in a bunch of areas (rasterization, texture storage, vertex caching, etc.), my gut feeling is that a Hilbert curve very rarely gives a worse cache-traversal pattern than Morton. Note the very precise statement! However, it can be a pain to implement (whereas Morton is pretty trivial), and whether it's actually a significant *win* depends a lot on a bazillion things.

Fairly obviously, for very sparse data there's no coherence to be extracted, and neither is significantly better than completely random, in which case you should implement the simplest thing possible (usually linear).

TomF.

> -----Original Message-----
> From: gda...@li... On Behalf Of Darren Grant
> Sent: Thursday, June 05, 2008 2:03 PM
> Subject: Re: [Algorithms] Cache-oblivious layout (was: BSP tree, or what?)
>
> Great experiment!
>
> I would love to know if the z-order curve just works better across
> the board with the octrees.
>
> -------------------------------------------------------------------------
> Check out the new SourceForge.net Marketplace.
> It's the best place to buy or sell services for
> just about anything Open Source.
> http://sourceforge.net/services/buy/index.php
> _______________________________________________
> GDAlgorithms-list mailing list
> GDA...@li...
> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list
> Archives:
> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list
|
From: Marc B. R. <mar...@or...> - 2008-06-06 12:55:46
|
Out of curiosity I did a quick web search looking for recent papers with some performance numbers. There seem to be a few; here's an example:

"SIMD Packet Techniques for Photon Mapping", 2007
http://www.cs.ucla.edu/~pfal/papers/RTR-07-simd.pdf

> Fairly obviously for very sparse data, there's no coherence to be
> extracted, and neither is significantly better than completely random,
> in which case you should implement the simplest thing possible
> (usually linear).

Hmm... not sure about this, but I only spent a whole 2 minutes thinking about it.
|
From: Daniel V. <ma...@ma...> - 2008-06-06 15:26:16
|
On Jun 6, 2008, at 01:07, Tom Forsyth wrote:

> After experience with Morton vs Hilbert in a bunch of areas (rasterization,
> texture storage, vertex caching, etc), my gut feeling is that a Hilbert
> curve very rarely gives a worse cache-traversal pattern than Morton. Note
> the very precise statement! However, it can be a pain to implement (whereas
> Morton is pretty trivial), and whether it's actually a significant *win*
> depends a lot on a bazillion things.

I'd agree with Tom's gut impression. :)

For a wavelet image compression algorithm (shameless plug: http://www.maven.de/code/wavelet/) I went with Morton / Z first, and later on went ahead and implemented a Hilbert-curve variant. The Hilbert curve gave slightly, but consistently, better results (i.e. smaller images), which means it extracted locality better than the other space-filling curves, and as I stored the curve in a table anyhow the computational cost was irrelevant. Of course, there were also cases where the Morton curve fared better, but they were very much in the minority (and from an information-theory viewpoint they *have* to exist).

The above-linked code has a nice, self-contained 2D Hilbert-curve generation (which gets optimized to a simple loop due to tail recursion), which is quite easy to extend to 3D (as was done in the paper referenced in the source).

Daniel.
|
From: Mark D. <duc...@ll...> - 2008-06-06 17:40:54
|
My colleagues Sung-Eui Yoon and Peter Lindstrom did a very thorough study of various cache-efficiency metrics for a variety of space-filling variants in the paper "Mesh Layouts for Block-Based Caches" from IEEE Vis 2006, pdf here:

http://www.cc.gatech.edu/~lindstro/papers/blockcache/paper.pdf

Looks like some special variants on Hilbert and a block interpretation of Sierpinski do very well, depending on the way you define efficiency (which will vary a bit by application requirements, in other words).

Cheers,

--Mark D.

Daniel Vollmer wrote:

> The Hilbert curve gave slightly, but consistently, better results (i.e.
> smaller images), which means it extracted locality better than the other
> space-filling curves, and as I stored the curve in a table anyhow the
> computational cost was irrelevant.
|
From: Jeff R. <je...@gm...> - 2008-06-06 19:03:28
|
All very interesting.

Does anyone have a link or example of how to quickly do a Morton or Hilbert lookup?

-Jeff

On Fri, Jun 6, 2008 at 12:40 PM, Mark Duchaineau <duc...@ll...> wrote:

> My colleagues Sung-Eui Yoon and Peter Lindstrom did a very thorough study
> of various cache-efficiency metrics for a variety of space-filling
> variants in a paper "Mesh Layouts for Block-Based Caches" from IEEE Vis
> 2006.
|
From: Tom F. <tom...@ee...> - 2008-06-06 21:15:34
|
Morton order is otherwise known as "Z-swizzle". You take the coordinates and interleave the bits. So if you have standard X and Y coordinates with bits x31,x30,x29,...,x2,x1,x0 and y31,y30,y29,...,y2,y1,y0, then the Morton coordinate is simply y31,x31,y30,x30,y29,x29,...,y2,x2,y1,x1,y0,x0. There's an obvious extension to higher dimensions.

http://en.wikipedia.org/wiki/Z-order_(curve)

There's also a bunch of different variations that don't interleave all the bits, just some of them, or that interleave blocks of 2 or 4 bits at a time rather than a fine 1:1 interleave (e.g. ...,y7,y6,y5,y4,x7,x6,x5,x4,y3,y2,y1,y0,x3,x2,x1,x0). They all trade lower computational cost for slightly lower efficiency - pretty easy to experiment with.

If I ever knew the trick to generating Hilbert ordering easily, I've forgotten it :-(

TomF.

> -----Original Message-----
> From: gda...@li... On Behalf Of Jeff Russell
> Sent: Friday, June 06, 2008 12:03 PM
> Subject: Re: [Algorithms] Cache-oblivious layout (was: BSP tree, or what?)
>
> Does anyone have a link or example of how to quickly do a morton or
> hilbert lookup?
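[Editor's note: the bit-interleave Tom describes is usually implemented with shift-and-mask "bit spreading" rather than a per-bit loop; a sketch follows, with function names ours:]

```c
#include <stdint.h>

/* Spread the low 16 bits of v apart: bit i moves to bit 2i. */
static uint32_t spread_bits(uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Morton / Z-order index: y bits land in the odd positions, x in the
   even ones, i.e. ...,y1,x1,y0,x0 as in Tom's description. */
uint32_t morton2d(uint32_t x, uint32_t y)
{
    return spread_bits(x) | (spread_bits(y) << 1);
}
```

Decoding applies the same masks in reverse order; for 3D you spread each bit two positions apart and OR three coordinates together.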
|
From: Mark D. <duc...@ll...> - 2008-06-06 21:32:48
|
Typically the other space-filling curves are computed efficiently as you traverse your hierarchy, i.e. with simple arithmetic as you move down different branches of a tree. If you are randomly accessing elements, then some caching and look-up of the indices is usually used to speed this up. This overhead is okay if you have fairly large payloads at each hierarchy node, but is a problem for really light-weight nodes. For swizzles on texture hardware, this is obviously a non-starter.

Cheers,

--Mark D.

Tom Forsyth wrote:

> There's also a bunch of different variations that don't interleave all the
> bits, just some of them, or interleave blocks of 2 or 4 bits at a time.
> They all trade lower computational cost for slightly lower efficiency.
>
> If I ever knew the trick to generating Hilbert ordering easily, I've
> forgotten it :-(
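[Editor's note: the "simple arithmetic as you move down the tree" is clearest in the Morton case, sketched below (names ours); Hilbert works the same way but additionally needs a small per-level orientation table to track how each child quadrant is rotated:]

```c
#include <stdint.h>

/* Quadtree descent: a child's Morton index is the parent's index with
   the 2-bit child slot appended (use 3 bits per level for an octree).
   No coordinate-to-index conversion is needed during traversal. */
uint64_t morton_child(uint64_t parent, unsigned child /* 0..3 */)
{
    return (parent << 2) | (child & 3u);
}
```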
|
From: <gd...@er...> - 2008-06-06 19:41:31
|
> My colleagues Sung-Eui Yoon and Peter Lindstrom did a very
> thorough study of various cache-efficiency metrics ...

Nice coincidence: I met up with Sung-Eui Yoon in Seoul a few weeks ago and discussed his cache-oblivious methods in the context of tree traversals. To quote part of our conversation:

"In fact, you can think of a cache-oblivious layout as a kind of hybrid between breadth-first and depth-first layout. Its key idea is to find a cluster of nodes that are likely to be accessed together and store them in one cache block; this method is optimized for a particular cache block size, so it is a cache-aware layout. For cache-oblivious layouts, we simply perform the clustering recursively for each computed cluster and order those clusters so that they work well with geometrically increasing cache block sizes."

For the AABB tree implementation in Bullet, I chose to optimize the depth-first traversal with chunks that fit in SPU memory (around 32 KB), because for collision detection purposes the order in which you visit children doesn't matter much. Each chunk (sub-tree) automatically contains triangles that are close by, so if you render them in different colors you get a nice-looking patchwork.

Hope this helps,
Erwin
|
From: Pierre T. <pie...@gr...> - 2008-06-04 13:47:13
|
> Anyway, I'm interested in anyone's practical experiences here. As usual,
> I suspect it's a case of "it all depends", but I'd love to know what
> people have found useful and useless.

I use AABB trees for everything: a static AABB tree for static stuff, a dynamic AABB tree for dynamic stuff (*). Simple, versatile, fast, to-the-point.

- Pierre

(*) Double-buffered: one gets "refit" for N frames while the other gets rebuilt in parallel over those N frames. After N frames, switch them and start again.
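[Editor's note: a minimal sketch of the double-buffering scheme in (*), with stand-in types and the actual refit/rebuild work stubbed out as comments; all names are hypothetical, not Pierre's code:]

```c
#define REBUILD_FRAMES 4   /* Pierre's 'N' */

typedef struct { int build_count; } AABBTree;  /* stand-in for a real tree */

typedef struct {
    AABBTree tree[2];
    int active;    /* tree used for queries this frame (gets refit) */
    int progress;  /* frames spent so far on the background rebuild */
} DoubleBufferedTree;

/* One frame: refit the active tree, advance the background rebuild by
   1/N of the work, and swap roles when the rebuild completes. */
void tick(DoubleBufferedTree *db)
{
    /* refit(&db->tree[db->active]);              cheap per-frame refit */
    /* rebuild_slice(&db->tree[1 - db->active]);  1/N of a full rebuild */
    if (++db->progress == REBUILD_FRAMES) {
        db->tree[1 - db->active].build_count++;  /* rebuild finished    */
        db->active = 1 - db->active;  /* queries now use the fresh tree */
        db->progress = 0;             /* start rebuilding the stale one */
    }
}
```

After every N frames the freshly rebuilt tree becomes the query tree and the stale one becomes the rebuild target.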
|
From: Eric H. <eri...@gm...> - 2008-06-04 13:57:48
|
I like the idea of rebuilding in parallel over N frames, but how does that really work? If stuff is moving, do you sort of make a copy of the boxes' current locations and build off of that, or...?

Also, how do you form your trees quickly so that they're also efficient? I'm finding simple median splits stink.

Eric

On Wed, Jun 4, 2008 at 9:47 AM, Pierre Terdiman <pie...@gr...> wrote:

> (*) double-buffered, one gets "refit" for N frames while another one gets
> rebuilt in parallel over N frames. After N frames, switch them and start
> again.