Thread: [Algorithms] How to get 3dvector largest coordinate index?
Brought to you by:
vexxed72
From: Sylvain G. V. <vi...@ii...> - 2009-02-27 03:41:22
|
Hi, I need to compute the index (not the actual value) of the largest coordinate of a normal, for some space hashing. I'm not sure how fast you guys usually find this index, but I've just created the following trick which I think is quite fast: > inline uint LargestCoordinate(const Vector3d &v) > { > const float x = fabs(v.x); > const float z = fabs(v.z); > const float y = Maths::max( fabs(v.y), z ); > return uint(fabs(y)>fabs(x)) << uint(fabs(z)>=fabs(y)); > } I didn't need such function before, so I'm not sure if this is considered fast or slow. Do you guys have something faster? |
From: Jon W. <jw...@gm...> - 2009-02-27 05:13:50
|
Sylvain G. Vignaud wrote: > I didn't need such function before, so I'm not sure if this is > considered fast or slow. Do you guys have something faster? > What hardware are you running on? Does it have conditional moves? Is it in-order or out-of-order? Does it have memory pre-fetch? In general, I would guess that there just won't be a measurable difference compared to the time spent waiting for the normal to arrive out of DRAM. My code to do the same thing does: float x = fabsf(n.x); float y = fabsf(n.y); float z = fabsf(n.z); return (x > y) ? (x > z) ? 0 : 2 : (y > z) ? 1 : 2; I've never found this to show up in any profile, and on x86, I believe a competent compiler generates cmov. I guess you could even do it with ints where you mask away the top bit, if I knew that type aliasing wouldn't kill me. Sincerely, jw |
From: Alen L. <ale...@cr...> - 2009-02-27 07:29:43
|
Jon wrote: > I've never found this to show up in any profile, and on x86, I believe a > competent compiler generates cmov. I got burned with this some time ago, so I'll throw in a warning: A competent compiler (e.g. icc) does indeed, but a very common one (msvc) does not (at least not in VC8.0, which was the latest I tested for that). The compiler team says that it does in some cases, but I haven't been able to make it do so. Ever. It certainly doesn't do so for x<y?x:y or for x<y?0:1. If someone does know how to do that, please let me know. Btw, I've actually had cases where the lack of cmov did show on the profile (BIH raycaster), and manually rewritten code using SSE (one float per register!) was significantly faster, so cache performance was not an issue there. Interestingly enough, in many cases like this, when I try to recompile with icc, it turns out its code is very similar to my hand-optimized one. And to add one more idea for the OP, if you feel confident in the normality of the inputs, you can try using float-as-integer comparing tricks, but the speed is better only if you don't generate the input using float operations just before doing the compares (i.e. the input is already in memory). HTH, Alen |
From: Sylvain G. V. <vi...@ii...> - 2009-02-27 16:28:38
|
From: Jon Watte <jw...@gm...> > Sylvain G. Vignaud wrote: > > I didn't need such function before, so I'm not sure if this is > > considered fast or slow. Do you guys have something faster? > > > > What hardware are you running on? Does it have conditional moves? > Is it > in-order or out-of-order? Does it have memory pre-fetch? PC, current gen. It's for a real time pseuso global illumination attempt. I currently need a 10 second preprocess step to compute how vertexes "see" each other on the Sponza Atrium or the Sibenik Cathedrals from http://hdri.cgtechniques.com/~sponza/ Which means I can have dynamic lights, but my models are to be static. I'd like to see how much I could speed that up, even if dynamic meshes seem quite far away. |
From: Stefan S. <kef...@gm...> - 2009-02-27 16:48:22
|
By all means, elaborate on what you're doing, if you don't mind... :) Sylvain G. Vignaud wrote: > From: Jon Watte <jw...@gm...> > >> Sylvain G. Vignaud wrote: >> >>> I didn't need such function before, so I'm not sure if this is >>> considered fast or slow. Do you guys have something faster? >>> >>> >> What hardware are you running on? Does it have conditional moves? >> Is it >> in-order or out-of-order? Does it have memory pre-fetch? >> > > PC, current gen. It's for a real time pseuso global illumination > attempt. I currently need a 10 second preprocess step to compute how > vertexes "see" each other on the Sponza Atrium or the Sibenik Cathedrals > from http://hdri.cgtechniques.com/~sponza/ > Which means I can have dynamic lights, but my models are to be static. > I'd like to see how much I could speed that up, even if dynamic meshes > seem quite far away. > > > ------------------------------------------------------------------------------ > Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA > -OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise > -Strategies to boost innovation and cut costs with open source participation > -Receive a $600 discount off the registration fee with the source code: SFAD > http://p.sf.net/sfu/XcvMzF8H > _______________________________________________ > GDAlgorithms-list mailing list > GDA...@li... > https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list > Archives: > http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list > > |
From: Matt J <mjo...@gm...> - 2009-02-27 16:49:19
|
That's a good trick, something like this? unsigned int x = 0x7FFFFFFF & *(unsigned int *)&n.x; unsigned int y = 0x7FFFFFFF & *(unsigned int *)&n.y; unsigned int z = 0x7FFFFFFF & *(unsigned int *)&n.z; return *(float *)((x > y) ? (x > z) ? 0 : 2 : (y > z) ? 1 : 2); Actually, that could technically pipeline well because your mixing ints and floats. This int trick won't work with NaN's, though, so you might want to add a debug assertion to see if the float uses NaNs to be more robust Finally, compilers don't need cmov to perform a conditional move optimization, see: http://www.nynaeve.net/?p=178 You should check the asm output to see what your optimizing compiler does.. also are you targeting it with the right instruction set (conditional moves began with Pentium Pro i think) > > I've never found this to show up in any profile, and on x86, I believe a > competent compiler generates cmov. > I guess you could even do it with ints where you mask away the top bit, > if I knew that type aliasing wouldn't kill me. > > Sincerely, > > jw > |
From: Matt J <mjo...@gm...> - 2009-02-27 16:52:20
|
oops, ignore the *(float *) part > That's a good trick, something like this? > > unsigned int x = 0x7FFFFFFF & *(unsigned int *)&n.x; > unsigned int y = 0x7FFFFFFF & *(unsigned int *)&n.y; unsigned int z > = 0x7FFFFFFF & *(unsigned int *)&n.z; > > return *(float *)((x > y) ? (x > z) ? 0 : 2 : (y > z) ? 1 : 2); > > Actually, that could technically pipeline well because your mixing ints and > floats. This int trick won't work with NaN's, though, so you might want to > add a debug assertion to see if the float uses NaNs to be more robust > > Finally, compilers don't need cmov to perform a conditional move > optimization, see: http://www.nynaeve.net/?p=178 > > You should check the asm output to see what your optimizing compiler does.. > also are you targeting it with the right instruction set (conditional moves > began with Pentium Pro i think) > > >> >> I've never found this to show up in any profile, and on x86, I believe a >> competent compiler generates cmov. >> I guess you could even do it with ints where you mask away the top bit, >> if I knew that type aliasing wouldn't kill me. >> >> Sincerely, >> >> jw >> > -- ----- Matt Johnson http://otowngraphics.blogspot.com |
From: Sylvain G. V. <vi...@ii...> - 2009-02-28 06:43:42
|
From: Stefan Sandberg <kef...@gm...> > By all means, elaborate on what you're doing, if you don't mind... :) Well I was simply trying to get rid of as many computation as possible in my just-for-fun home-project of real-time global illumination approximation. I'm computing that on the CPU, per vertex. I'm of course using an octree to find out which vertexes "see" each other (then computing GI is simply computing vertex colors and propagating them to connected vertexes). The idea behind my previous question was to split each octree cells depending on the normal of the vertexes. Actually I'm now using a very simplified way to do so: > RoundingOf(1+normal.x) + (normal.z<0.0f ? 3 : 0) I don't split my octree cells depending on the normal.y for simplicity. Now I'm down to 1.53 seconds for the Sponza Atrium, and 3.16 seconds for the Sibenik Cathedral - models from Marko Dabrovic available at: http://hdri.cgtechniques.com/~sponza/ (these models are not very clean: lots of duplicated vertexes, not shared edges, holed, inverted faces creating holes...) My project is still in a very early stage, so the results are not that impressive yet ; with only one light in the scene: faked-GI: http://tfpsly.free.fr/Files/SiCath_gi.jpg Classical diffuse+ambient: http://tfpsly.free.fr/Files/SiCath_ambient.jpg Only the diffuse: http://tfpsly.free.fr/Files/SiCath_noambient.jpg > Sylvain G. Vignaud wrote: > > From: Jon Watte <jw...@gm...> > > > >> Sylvain G. Vignaud wrote: > >> > >>> I didn't need such function before, so I'm not sure if this is > >>> considered fast or slow. Do you guys have something faster? > >>> > >>> > >> What hardware are you running on? Does it have conditional > moves? > >> Is it > >> in-order or out-of-order? Does it have memory pre-fetch? > >> > > > > PC, current gen. It's for a real time pseuso global illumination > > attempt. I currently need a 10 second preprocess step to compute how > > vertexes "see" each other on the Sponza Atrium or the Sibenik > > Cathedrals from http://hdri.cgtechniques.com/~sponza/ > > Which means I can have dynamic lights, but my models are to be > > static. I'd like to see how much I could speed that up, even if > > dynamic meshes seem quite far away. } |
From: Matt J <mjo...@gm...> - 2009-02-27 16:03:04
|
Hmm, I fail to see how this works: If (y > x) == 0 then that means x <= y. So, If x <= y and z < y then 0 << 0 = 0 If x <= y and z >= y then 0 << 1 = 0 In the first case, y is the greatest component In the second case, z is the greatest component. Both are encoded as 0. Something I'm missing? Hi, > > I need to compute the index (not the actual value) of the largest > coordinate of a normal, for some space hashing. > > I'm not sure how fast you guys usually find this index, but I've just > created the following trick which I think is quite fast: > > > inline uint LargestCoordinate(const Vector3d &v) > > { > > const float x = fabs(v.x); > > const float z = fabs(v.z); > > const float y = Maths::max( fabs(v.y), z ); > > return uint(fabs(y)>fabs(x)) << uint(fabs(z)>=fabs(y)); > > } > |
From: Richard F. <ra...@gm...> - 2009-02-27 21:56:58
|
If (y > x) == 0 then that means x <= y. nope: If (y > x) == 0 then that means x >= y. 2009/2/27 Matt J <mjo...@gm...> > > Hmm, I fail to see how this works: > > If (y > x) == 0 then that means x <= y. > > So, > > If x <= y and z < y then 0 << 0 = 0 > If x <= y and z >= y then 0 << 1 = 0 > > In the first case, y is the greatest component In the second case, z is the > greatest component. > > Both are encoded as 0. Something I'm missing? > > > Hi, >> >> I need to compute the index (not the actual value) of the largest >> coordinate of a normal, for some space hashing. >> >> I'm not sure how fast you guys usually find this index, but I've just >> created the following trick which I think is quite fast: >> >> > inline uint LargestCoordinate(const Vector3d &v) >> > { >> > const float x = fabs(v.x); >> > const float z = fabs(v.z); >> > const float y = Maths::max( fabs(v.y), z ); >> > return uint(fabs(y)>fabs(x)) << uint(fabs(z)>=fabs(y)); >> > } >> > > > ------------------------------------------------------------------------------ > Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, > CA > -OSBC tackles the biggest issue in open source: Open Sourcing the > Enterprise > -Strategies to boost innovation and cut costs with open source > participation > -Receive a $600 discount off the registration fee with the source code: > SFAD > http://p.sf.net/sfu/XcvMzF8H > _______________________________________________ > GDAlgorithms-list mailing list > GDA...@li... > https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list > Archives: > http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list > -- fabs(); Just because the world is full of people that think just like you, doesn't mean the other ones can't be right. |
From: Matt J <mjo...@gm...> - 2009-02-28 02:32:35
|
Yeah, okay, i get it, i got it backwards. I just said that =) Anyway, here another way that only has two conditionals, no shifts, and no max's and easier to follow. Not sure if it is faster though. The FP uses a stack either way, right? I think this will work but never tested const float f[3] = { fabsf(n.x), fabsf(n.y), fabsf(n.z) }; unsigned int i = ( f[1] > f[0] ) ? 1 : 0; return (f[2] > f[i]) ? 2 : i; ..........Finally, an all integer way: const unsigned int u[3] = { 0x7FFFFFFF & *(unsigned int *)&n.x, 0x7FFFFFFF & *(unsigned int *)&n.y, 0x7FFFFFFF & *(unsigned int *)&n.z }; unsigned int i = ( u[1] > u[0] ) ? 1 : 0; return (u[2] > u[i]) ? 2 : i; If (y > x) == 0 then that means x <= y. > nope: > If (y > x) == 0 then that means x >= y. > > 2009/2/27 Matt J <mjo...@gm...> > >> >> Hmm, I fail to see how this works: >> >> If (y > x) == 0 then that means x <= y. >> >> So, >> >> If x <= y and z < y then 0 << 0 = 0 >> If x <= y and z >= y then 0 << 1 = 0 >> >> In the first case, y is the greatest component In the second case, z is >> the greatest component. >> >> Both are encoded as 0. Something I'm missing? >> >> >> Hi, >>> >>> I need to compute the index (not the actual value) of the largest >>> coordinate of a normal, for some space hashing. >>> >>> I'm not sure how fast you guys usually find this index, but I've just >>> created the following trick which I think is quite fast: >>> >>> > inline uint LargestCoordinate(const Vector3d &v) >>> > { >>> > const float x = fabs(v.x); >>> > const float z = fabs(v.z); >>> > const float y = Maths::max( fabs(v.y), z ); >>> > return uint(fabs(y)>fabs(x)) << uint(fabs(z)>=fabs(y)); >>> > } >>> >> |
From: Sylvain G. V. <vi...@ii...> - 2009-02-27 16:23:29
|
Test is: uint(y>x) << uint(z>=y) - don't know why I let all these useless fabs in the previous email. If largest is: * X: 0 << ? = 0 as y<x (and we don't care about the second test) * Y: 1 << 0 = 1 as y>x and z<y * Z: 1 << 1 = 2 as y is replaced with z, so the tests are z>x and z>=z From: Matt J <mjo...@gm...> > Hmm, I fail to see how this works: > > If (y > x) == 0 then that means x <= y. !!!WRONG!!! it means that x >= y > So, > > If x <= y and z < y then 0 << 0 = 0 > If x <= y and z >= y then 0 << 1 = 0 > > In the first case, y is the greatest component In the second case, > z is the > greatest component. > > Both are encoded as 0. Something I'm missing? > > > Hi, > > > > I need to compute the index (not the actual value) of the largest > > coordinate of a normal, for some space hashing. > > > > I'm not sure how fast you guys usually find this index, but I've > just> created the following trick which I think is quite fast: > > > > > inline uint LargestCoordinate(const Vector3d &v) > > > { > > > const float x = fabs(v.x); > > > const float z = fabs(v.z); > > > const float y = Maths::max( fabs(v.y), z ); > > > return uint(fabs(y)>fabs(x)) << uint(fabs(z)>=fabs(y)); > > > } > > > |
From: Olivier G. <gal...@po...> - 2009-02-27 16:47:50
|
On Fri, Feb 27, 2009 at 11:02:58AM -0500, Matt J wrote: > Hmm, I fail to see how this works: > > If (y > x) == 0 then that means x <= y. > > So, > > If x <= y and z < y then 0 << 0 = 0 > If x <= y and z >= y then 0 << 1 = 0 > > In the first case, y is the greatest component In the second case, z is the > greatest component. > > Both are encoded as 0. Something I'm missing? The max. That formula has more fabs than sense. A quick rewrite would be: x = abs(v.x); y = abs(v.y); z = abs(v.z) return (x < max(y,z)) << (z >= y) which is obviously correct. OG. > > > Hi, > > > > I need to compute the index (not the actual value) of the largest > > coordinate of a normal, for some space hashing. > > > > I'm not sure how fast you guys usually find this index, but I've just > > created the following trick which I think is quite fast: > > > > > inline uint LargestCoordinate(const Vector3d &v) > > > { > > > const float x = fabs(v.x); > > > const float z = fabs(v.z); > > > const float y = Maths::max( fabs(v.y), z ); > > > return uint(fabs(y)>fabs(x)) << uint(fabs(z)>=fabs(y)); > > > } > > > ------------------------------------------------------------------------------ > Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA > -OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise > -Strategies to boost innovation and cut costs with open source participation > -Receive a $600 discount off the registration fee with the source code: SFAD > http://p.sf.net/sfu/XcvMzF8H > _______________________________________________ > GDAlgorithms-list mailing list > GDA...@li... > https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list > Archives: > http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list |
From: Matt J <mjo...@gm...> - 2009-02-27 17:11:53
|
Ah, yeah. And !(y > x) is !(y <= x) e.g. x >= y i forgot to double negate in my head. > > > The max. > > That formula has more fabs than sense. A quick rewrite would be: > x = abs(v.x); y = abs(v.y); z = abs(v.z) > return (x < max(y,z)) << (z >= y) > > which is obviously correct. > > OG. > |
From: Jon W. <jw...@gm...> - 2009-02-28 06:51:49
|
Matt J wrote: > > Yeah, okay, i get it, i got it backwards. I just said that =) > > Anyway, here another way that only has two conditionals, no shifts, > and no max's and easier to follow. Not sure if it is faster though. > The FP uses a stack either way, right? I think this will work but > never tested x = fabsf(x); y = fabsf(y); z = fabsf(z); if (x > y) if (x > z) return 0; else return 2; else if (y > z) return 1; else return 2; This is the canonical case which I already posted. It only has two conditionals in the path of execution. It also doesn't use indirect addressing, which may cause aliasing confusion, and forces data out of registers. If you compile with intrinsics, then fabsf is a single instruction and keeps data in registers. I think you'll find it hard to beat. Sincerely, jw |
From: Matt J <mjo...@gm...> - 2009-02-28 23:16:38
|
I did some profiling on MSVC 2009, and integer methods are ~10% faster: unsigned int x = 0x7FFFFFFF & *(unsigned int *)&n.x; unsigned int y = 0x7FFFFFFF & *(unsigned int *)&n.y; unsigned int z = 0x7FFFFFFF & *(unsigned int *)&n.z; return (x > y) ? (x > z) ? 0 : 2 : (y > z) ? 1 : 2; For some odd reason only when using integers can it simplify the two inner ternary methods using bit 'magic'. If we use floats or use SSE the integer methods are about ~5% faster because this compiler generates a bunch of jumps. The method above (as well as mine) generates 1 jump. I actually eliminated all jumps but since I use one too many registers it slows it down. I think the CPU is smart enough to look ahead if you get down to one conditional ..so yeah I think if you use the stack it can slow things down slightly but you do see a gain if you use integers. I assume conditional jumps would be the best way, assuming the compiler can do it depth first because you need 3 conditionals - the inner two, and then a conditional move on the results of those two Asm: ; _n$ = edx ; 61 : unsigned int x = 0x7FFFFFFF & *(unsigned int *)&n.x; mov eax, DWORD PTR [edx] ; 62 : unsigned int y = 0x7FFFFFFF & *(unsigned int *)&n.y; mov ecx, DWORD PTR [edx+4] ; 63 : unsigned int z = 0x7FFFFFFF & *(unsigned int *)&n.z; mov edx, DWORD PTR [edx+8] and eax, 2147483647 ; 7fffffffH and ecx, 2147483647 ; 7fffffffH and edx, 2147483647 ; 7fffffffH ; 69 : return (x > y) ? (x > z) ? 0 : 2 : (y > z) ? 1 : 2; cmp eax, ecx jbe SHORT $LN3@coord_inde cmp edx, eax sbb eax, eax and eax, -2 ; fffffffeH add eax, 2 ; 70 : } ret 0 $LN3@coord_inde: ; 69 : return (x > y) ? (x > z) ? 0 : 2 : (y > z) ? 1 : 2; cmp edx, ecx sbb eax, eax add eax, 2 ; 70 : } ret |
From: Marc H. <ma...@gm...> - 2009-03-01 00:58:50
|
This problem felt a lot like you could somehow just do math on the conditionals and get the answer. The second best I could come up with is: (Course, conditional moves makes much of this moot, but on with the show) //I used 3 for values that can never happen for fun, always cool to go 1 past the end of an array //Replace with 0, 1, or 2 in shipping code, assert in debug static uint s_lookup[] = { /*000*/ 2, /*001*/ 2, /*010*/ 3, /*011*/ 0, /*100*/ 1, /*101*/ 3, /*110*/ 1, /*111*/ 0 } inline uint LargestCoordinate(const Vector3d &v) { uint first_bit = v.x > v.y; uint second_bit = (v.x > v.z) << 1; uint third_bit = (v.y > v.z) << 2; uint index = first_bit | second_bit | third_bit; return lookup[index]; } Then, I thought, I could just load a single uint, shift right by the index, then and off the low 2 bits: //Replace binary syntax with something to get around C's lack of syntax. (like http://c-faq.com/misc/sd28.html ) inline uint LargestCoordinate(const Vector3d &v) { uint val = 0b0001110100111010; uint rotate = (v.y > v.z) << 3 | (v.x > v.z) << 2 | (v.x > v.y) << 1; return ( val >> rotate ) & 0b11; } I found I needed to use the int trick also, otherwise the (float > float) statement generates a jump, blah. (Hire me!) -- m h arc ernandez / these are the days of lethal perfection / |
From: Sylvain G. V. <vi...@ii...> - 2009-03-01 07:49:20
|
What's about something like: yx = SignBit(x-y)>>31 // 1 if y > x yz = SignBit(z-y)>>31 zx = SignBit(x-z)>>31 zy = SignBit(y-z)>>31 return (yx&yz) | ((zx&zy)<<1); 4 subs (int or float, doesn't matter), a few simple bit instructions, no jump (untested, just got back from a diner)? ----- Original Message ----- From: Matt J <mjo...@gm...> Date: Saturday, February 28, 2009 6:17 pm Subject: Re: [Algorithms] How to get 3dvector largest coordinate index? To: Game Development Algorithms <gda...@li...> > I did some profiling on MSVC 2009, and integer methods are ~10% > faster: > unsigned int x = 0x7FFFFFFF & *(unsigned int *)&n.x; > unsigned int y = 0x7FFFFFFF & *(unsigned int *)&n.y; > unsigned int z = 0x7FFFFFFF & *(unsigned int *)&n.z; > > return (x > y) ? (x > z) ? 0 : 2 : (y > z) ? 1 : 2; > > For some odd reason only when using integers can it simplify the > two inner > ternary methods using bit 'magic'. If we use floats or use SSE the > integermethods are about ~5% faster because this compiler generates > a bunch of > jumps. The method above (as well as mine) generates 1 jump. > > I actually eliminated all jumps but since I use one too many > registers it > slows it down. I think the CPU is smart enough to look ahead if you > get down > to one conditional ..so yeah I think if you use the stack it can > slow things > down slightly but you do see a gain if you use integers. I assume > conditional jumps would be the best way, assuming the compiler can > do it > depth first because you need 3 conditionals - the inner two, and > then a > conditional move on the results of those two > > Asm: > > > ; _n$ = edx > > ; 61 : unsigned int x = 0x7FFFFFFF & *(unsigned int *)&n.x; > > mov eax, DWORD PTR [edx] > > ; 62 : unsigned int y = 0x7FFFFFFF & *(unsigned int *)&n.y; > > mov ecx, DWORD PTR [edx+4] > > ; 63 : unsigned int z = 0x7FFFFFFF & *(unsigned int *)&n.z; > > mov edx, DWORD PTR [edx+8] > and eax, 2147483647 ; 7fffffffH > and ecx, 2147483647 ; 7fffffffH > and edx, 2147483647 ; 7fffffffH > > ; 69 : return (x > y) ? (x > z) ? 0 : 2 : (y > z) ? 1 : 2; > > cmp eax, ecx > jbe SHORT $LN3@coord_inde > cmp edx, eax > sbb eax, eax > and eax, -2 ; fffffffeH > add eax, 2 > > ; 70 : } > > ret 0 > $LN3@coord_inde: > > ; 69 : return (x > y) ? (x > z) ? 0 : 2 : (y > z) ? 1 : 2; > > cmp edx, ecx > sbb eax, eax > add eax, 2 > > ; 70 : } > > ret > |
From: Matt J <mjo...@gm...> - 2009-03-01 17:20:13
|
I think you still need the absolute value 1. x = -10 and y = -5 2. x = 10 and y = 5 Both cases x is the largest component, but only in the first case is y > x What's about something like: > yx = SignBit(x-y)>>31 // 1 if y > x > yz = SignBit(z-y)>>31 > zx = SignBit(x-z)>>31 > zy = SignBit(y-z)>>31 > return (yx&yz) | ((zx&zy)<<1); > > 4 subs (int or float, doesn't matter), a few simple bit instructions, no > jump (untested, just got back from a diner)? > |
From: Glenn F. <ga...@ga...> - 2009-03-01 09:10:47
|
if you would like this on GCC with -O2 or greater, you'll need to use unions instead of aliasing pointers, like this: union FloatInt { unsigned int i; float f; }; FloatInt nx, ny, nz; x.f = n.x; y.f = n.y; z.f = n.z; return (x.i > y.i) ? (x.i > z.i) ? 0 : 2 : (y.i > z.i) ? 1 : 2; just fyi, you are going to have very VERY difficult optimizer bugs to track down otherwise - and yes, unions are the only way to guarantee that GCC generates the correct code in this case cheers > I did some profiling on MSVC 2009, and integer methods are ~10% > > faster: > > unsigned int x = 0x7FFFFFFF & *(unsigned int *)&n.x; > > unsigned int y = 0x7FFFFFFF & *(unsigned int *)&n.y; > > unsigned int z = 0x7FFFFFFF & *(unsigned int *)&n.z; > > > > return (x > y) ? (x > z) ? 0 : 2 : (y > z) ? 1 : 2; > > > > For some odd reason only when using integers can it simplify the > > two inner > > ternary methods using bit 'magic'. If we use floats or use SSE the > > integermethods are about ~5% faster because this compiler generates > > a bunch of > > jumps. The method above (as well as mine) generates 1 jump. > > > > I actually eliminated all jumps but since I use one too many > > registers it > > slows it down. I think the CPU is smart enough to look ahead if you > > get down > > to one conditional ..so yeah I think if you use the stack it can > > slow things > > down slightly but you do see a gain if you use integers. I assume > > conditional jumps would be the best way, assuming the compiler can > > do it > > depth first because you need 3 conditionals - the inner two, and > > then a > > conditional move on the results of those two > > > > Asm: > > > > > > ; _n$ = edx > > > > ; 61 : unsigned int x = 0x7FFFFFFF & *(unsigned int *)&n.x; > > > > mov eax, DWORD PTR [edx] > > > > ; 62 : unsigned int y = 0x7FFFFFFF & *(unsigned int *)&n.y; > > > > mov ecx, DWORD PTR [edx+4] > > > > ; 63 : unsigned int z = 0x7FFFFFFF & *(unsigned int *)&n.z; > > > > mov edx, DWORD PTR [edx+8] > > and eax, 2147483647 ; 7fffffffH > > and ecx, 2147483647 ; 7fffffffH > > and edx, 2147483647 ; 7fffffffH > > > > ; 69 : return (x > y) ? (x > z) ? 0 : 2 : (y > z) ? 1 : 2; > > > > cmp eax, ecx > > jbe SHORT $LN3@coord_inde > > cmp edx, eax > > sbb eax, eax > > and eax, -2 ; fffffffeH > > add eax, 2 > > > > ; 70 : } > > > > ret 0 > > $LN3@coord_inde: > > > > ; 69 : return (x > y) ? (x > z) ? 0 : 2 : (y > z) ? 1 : 2; > > > > cmp edx, ecx > > sbb eax, eax > > add eax, 2 > > > > ; 70 : } > > > > ret > > > > > ------------------------------------------------------------------------------ > Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, > CA > -OSBC tackles the biggest issue in open source: Open Sourcing the > Enterprise > -Strategies to boost innovation and cut costs with open source > participation > -Receive a $600 discount off the registration fee with the source code: > SFAD > http://p.sf.net/sfu/XcvMzF8H > _______________________________________________ > GDAlgorithms-list mailing list > GDA...@li... > https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list > Archives: > http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list > |
From: Glenn F. <ga...@ga...> - 2009-03-01 09:12:53
|
garrrrr! i hate coding in gmail :) union FloatInt { unsigned int i; float f; }; FloatInt x, y, z; x.f = n.x; y.f = n.y; z.f = n.z; x.i &= 0x7FFFFFFF; y.i &= 0x7FFFFFFF; z.i &= 0x7FFFFFFF; return (x.i > y.i) ? (x.i > z.i) ? 0 : 2 : (y.i > z.i) ? 1 : 2; cheers On Sun, Mar 1, 2009 at 1:10 AM, Glenn Fiedler <ga...@ga...> wrote: > if you would like this on GCC with -O2 or greater, you'll need to use > unions instead of aliasing pointers, like this: > union FloatInt > { > unsigned int i; > float f; > }; > > FloatInt nx, ny, nz; > x.f = n.x; > y.f = n.y; > z.f = n.z; > return (x.i > y.i) ? (x.i > z.i) ? 0 : 2 : (y.i > z.i) ? 1 : 2; > > just fyi, you are going to have very VERY difficult optimizer bugs to track > down otherwise - and yes, unions are the only way to guarantee that GCC > generates the correct code in this case > > cheers > > > I did some profiling on MSVC 2009, and integer methods are ~10% >> > faster: >> > unsigned int x = 0x7FFFFFFF & *(unsigned int *)&n.x; >> > unsigned int y = 0x7FFFFFFF & *(unsigned int *)&n.y; >> > unsigned int z = 0x7FFFFFFF & *(unsigned int *)&n.z; >> > >> > return (x > y) ? (x > z) ? 0 : 2 : (y > z) ? 1 : 2; >> > >> > For some odd reason only when using integers can it simplify the >> > two inner >> > ternary methods using bit 'magic'. If we use floats or use SSE the >> > integermethods are about ~5% faster because this compiler generates >> > a bunch of >> > jumps. The method above (as well as mine) generates 1 jump. >> > >> > I actually eliminated all jumps but since I use one too many >> > registers it >> > slows it down. I think the CPU is smart enough to look ahead if you >> > get down >> > to one conditional ..so yeah I think if you use the stack it can >> > slow things >> > down slightly but you do see a gain if you use integers. I assume >> > conditional jumps would be the best way, assuming the compiler can >> > do it >> > depth first because you need 3 conditionals - the inner two, and >> > then a >> > conditional move on the results of those two >> > >> > Asm: >> > >> > >> > ; _n$ = edx >> > >> > ; 61 : unsigned int x = 0x7FFFFFFF & *(unsigned int *)&n.x; >> > >> > mov eax, DWORD PTR [edx] >> > >> > ; 62 : unsigned int y = 0x7FFFFFFF & *(unsigned int *)&n.y; >> > >> > mov ecx, DWORD PTR [edx+4] >> > >> > ; 63 : unsigned int z = 0x7FFFFFFF & *(unsigned int *)&n.z; >> > >> > mov edx, DWORD PTR [edx+8] >> > and eax, 2147483647 ; 7fffffffH >> > and ecx, 2147483647 ; 7fffffffH >> > and edx, 2147483647 ; 7fffffffH >> > >> > ; 69 : return (x > y) ? (x > z) ? 0 : 2 : (y > z) ? 1 : 2; >> > >> > cmp eax, ecx >> > jbe SHORT $LN3@coord_inde >> > cmp edx, eax >> > sbb eax, eax >> > and eax, -2 ; fffffffeH >> > add eax, 2 >> > >> > ; 70 : } >> > >> > ret 0 >> > $LN3@coord_inde: >> > >> > ; 69 : return (x > y) ? (x > z) ? 0 : 2 : (y > z) ? 1 : 2; >> > >> > cmp edx, ecx >> > sbb eax, eax >> > add eax, 2 >> > >> > ; 70 : } >> > >> > ret >> > >> >> >> ------------------------------------------------------------------------------ >> Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, >> CA >> -OSBC tackles the biggest issue in open source: Open Sourcing the >> Enterprise >> -Strategies to boost innovation and cut costs with open source >> participation >> -Receive a $600 discount off the registration fee with the source code: >> SFAD >> http://p.sf.net/sfu/XcvMzF8H >> _______________________________________________ >> GDAlgorithms-list mailing list >> GDA...@li... >> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list >> Archives: >> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list >> > > |
From: Matt J <mjo...@gm...> - 2009-03-01 21:43:14
|
Your lookup table varies from 0..3, e.g. 0, 1, 2, 3, but there are only 3 components total, not 4, so I'm not sure how you calculated it Also, you need to look at the absolute value, which you need to determinate the largest component in magnitude > This problem felt a lot like you could somehow just do math on the > conditionals and get the answer. The second best I could come up with > is: > > (Course, conditional moves makes much of this moot, but on with the show) > > //I used 3 for values that can never happen for fun, always cool to go > 1 past the end of an array > //Replace with 0, 1, or 2 in shipping code, assert in debug > static uint s_lookup[] = { > /*000*/ 2, > /*001*/ 2, > /*010*/ 3, > /*011*/ 0, > /*100*/ 1, > /*101*/ 3, > /*110*/ 1, > /*111*/ 0 > } > > inline uint LargestCoordinate(const Vector3d &v) > { > uint first_bit = v.x > v.y; > uint second_bit = (v.x > v.z) << 1; > uint third_bit = (v.y > v.z) << 2; > > uint index = first_bit | second_bit | third_bit; > > return lookup[index]; > } > > Then, I thought, I could just load a single uint, shift right by the > index, then and off the low 2 bits: > //Replace binary syntax with something to get around C's lack of > syntax. (like http://c-faq.com/misc/sd28.html ) > inline uint LargestCoordinate(const Vector3d &v) > { > uint val = 0b0001110100111010; > > uint rotate = (v.y > v.z) << 3 | (v.x > v.z) << 2 | (v.x > v.y) << 1; > > return ( val >> rotate ) & 0b11; > } > > I found I needed to use the int trick also, otherwise the (float > > float) statement generates a jump, blah. > > > (Hire me!) > > > -- > m h > arc ernandez > / these are the days of lethal perfection / > |
From: Marc H. <ma...@gm...> - 2009-03-01 22:53:36
|
Oops, I did forget the fabs, I wrote the sketch in email, then wrote a version for testing on my PC. As for the 3, I explained that in the message. There's 8 possible values for 3 bits, but only 6 are possible configurations of the comparisons. On Sun, Mar 1, 2009 at 1:42 PM, Matt J <mjo...@gm...> wrote: > > Your lookup table varies from 0..3, e.g. 0, 1, 2, 3, but there are only 3 > components total, not 4, so I'm not sure how you calculated it > > Also, you need to look at the absolute value, which you need to determinate > the largest component in magnitude > >> >> This problem felt a lot like you could somehow just do math on the >> conditionals and get the answer. The second best I could come up with >> is: >> >> (Course, conditional moves makes much of this moot, but on with the show) >> >> //I used 3 for values that can never happen for fun, always cool to go >> 1 past the end of an array >> //Replace with 0, 1, or 2 in shipping code, assert in debug >> static uint s_lookup[] = { >> /*000*/ 2, >> /*001*/ 2, >> /*010*/ 3, >> /*011*/ 0, >> /*100*/ 1, >> /*101*/ 3, >> /*110*/ 1, >> /*111*/ 0 >> } >> >> inline uint LargestCoordinate(const Vector3d &v) >> { >> uint first_bit = v.x > v.y; >> uint second_bit = (v.x > v.z) << 1; >> uint third_bit = (v.y > v.z) << 2; >> >> uint index = first_bit | second_bit | third_bit; >> >> return lookup[index]; >> } >> >> Then, I thought, I could just load a single uint, shift right by the >> index, then and off the low 2 bits: >> //Replace binary syntax with something to get around C's lack of >> syntax. (like http://c-faq.com/misc/sd28.html ) >> inline uint LargestCoordinate(const Vector3d &v) >> { >> uint val = 0b0001110100111010; >> >> uint rotate = (v.y > v.z) << 3 | (v.x > v.z) << 2 | (v.x > v.y) << 1; >> >> return ( val >> rotate ) & 0b11; >> } >> >> I found I needed to use the int trick also, otherwise the (float > >> float) statement generates a jump, blah. >> >> >> (Hire me!) >> >> >> -- >> m h >> arc ernandez >> / these are the days of lethal perfection / > > ------------------------------------------------------------------------------ > Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA > -OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise > -Strategies to boost innovation and cut costs with open source participation > -Receive a $600 discount off the registration fee with the source code: SFAD > http://p.sf.net/sfu/XcvMzF8H > _______________________________________________ > GDAlgorithms-list mailing list > GDA...@li... > https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list > Archives: > http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list > -- m h arc ernandez / these are the days of lethal perfection / |
From: Gino v. d. B. <gin...@gm...> - 2009-03-01 22:57:19
|
I would like to share my approach. This code is copy-pasted straight from my Vector3 class template so it may look a bit cluttered but I hope the idea comes across: template <typename Scalar> FORCEINLINE int maxAxis(const Vector3<Scalar>& a) { int c0 = a[0] < a[1]; int c1 = a[0] < a[2]; int c2 = a[1] < a[2]; return (c0 & ~c2) | ((c1 & c2) << 1); } template <typename Scalar> FORCEINLINE int minAxis(const Vector3<Scalar>& a) { int c0 = a[1] < a[0]; int c1 = a[2] < a[0]; int c2 = a[2] < a[1]; return (c0 & ~c2) | ((c1 & c2) << 1); } template <typename Scalar> FORCEINLINE int closestAxis(const Vector3<Scalar>& a) { return maxAxis(a * a); } template <typename Scalar> FORCEINLINE int furthestAxis(const Vector3<Scalar>& a) { return minAxis(a * a); } The function minAxis and maxAxis return a value 0, 1, or 2, so the result only needs two bits (00, 01, and 10 in base-2). The first term of the "|" operator is bit-0 and the second term (the one with the << 1) is bit-1. The nice thing about this approach is the fact that its branchless. Three boolean values are computed but they are never used to branch, so no code-cache misses can happen here. For finding the minimum of maximum *absolute* value I do not use "fabs" but I rather multiply the vector with itself, thus a * a = (a.x * a.x, a.y * a.y, a.z * a.z). "closestAxis" returns the world axis that is closest (as in most parallel) to vector "a". "furthestAxis" returns the most orthogonal world axis. Cheers, Gino Sylvain G. Vignaud wrote: > Hi, > > I need to compute the index (not the actual value) of the largest > coordinate of a normal, for some space hashing. > > I'm not sure how fast you guys usually find this index, but I've just > created the following trick which I think is quite fast: > > >> inline uint LargestCoordinate(const Vector3d &v) >> { >> const float x = fabs(v.x); >> const float z = fabs(v.z); >> const float y = Maths::max( fabs(v.y), z ); >> return uint(fabs(y)>fabs(x)) << uint(fabs(z)>=fabs(y)); >> } >> > > I didn't need such function before, so I'm not sure if this is > considered fast or slow. Do you guys have something faster? > > > ------------------------------------------------------------------------------ > Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA > -OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise > -Strategies to boost innovation and cut costs with open source participation > -Receive a $600 discount off the registration fee with the source code: SFAD > http://p.sf.net/sfu/XcvMzF8H > _______________________________________________ > GDAlgorithms-list mailing list > GDA...@li... > https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list > Archives: > http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list > |
From: Gino v. d. B. <gin...@gm...> - 2009-03-02 12:29:24
|
I guess you're right (I'm referring to Marc Hernandez). Assigning the result of a float compare to a bool introduces a branch in VC8 as well. How about this trick then template <> FORCEINLINE int maxAxis(const Vector3<float>& v) { const int32_t* a = reinterpret_cast<const int32_t*>(&v[0]); int c0 = a[0] < a[1]; int c1 = a[0] < a[2]; int c2 = a[1] < a[2]; return (c0 & ~c2) | ((c1 & c2) << 1); } If we take the binary values of the floats and do an integer compare on them the result should be equal to the float compare, that is, for non-negative floats. I still have to check whether this will work using gcc version 4 and up, since I'm not sure if I'm breaking the strict-aliasing rule here. Gino Gino van den Bergen wrote: > I would like to share my approach. This code is copy-pasted straight > from my Vector3 class template so it may look a bit cluttered but I > hope the idea comes across: > > template <typename Scalar> > FORCEINLINE > int maxAxis(const Vector3<Scalar>& a) > { > int c0 = a[0] < a[1]; > int c1 = a[0] < a[2]; > int c2 = a[1] < a[2]; > return (c0 & ~c2) | ((c1 & c2) << 1); > } > > template <typename Scalar> > FORCEINLINE > int minAxis(const Vector3<Scalar>& a) > { > int c0 = a[1] < a[0]; > int c1 = a[2] < a[0]; > int c2 = a[2] < a[1]; > return (c0 & ~c2) | ((c1 & c2) << 1); > } > > template <typename Scalar> > FORCEINLINE > int closestAxis(const Vector3<Scalar>& a) > { > return maxAxis(a * a); > } > template <typename Scalar> > FORCEINLINE > int furthestAxis(const Vector3<Scalar>& a) > { > return minAxis(a * a); > } > > The function minAxis and maxAxis return a value 0, 1, or 2, so the > result only needs two bits (00, 01, and 10 in base-2). The first term > of the "|" operator is bit-0 and the second term (the one with the << > 1) is bit-1. The nice thing about this approach is the fact that its > branchless. Three boolean values are computed but they are never used > to branch, so no code-cache misses can happen here. > > For finding the minimum of maximum *absolute* value I do not use > "fabs" but I rather multiply the vector with itself, thus a * a = > (a.x * a.x, a.y * a.y, a.z * a.z). "closestAxis" returns the world > axis that is closest (as in most parallel) to vector "a". > "furthestAxis" returns the most orthogonal world axis. > > Cheers, > > Gino > > > Sylvain G. Vignaud wrote: >> Hi, >> >> I need to compute the index (not the actual value) of the largest >> coordinate of a normal, for some space hashing. >> >> I'm not sure how fast you guys usually find this index, but I've just >> created the following trick which I think is quite fast: >> >> >>> inline uint LargestCoordinate(const Vector3d &v) >>> { >>> const float x = fabs(v.x); >>> const float z = fabs(v.z); >>> const float y = Maths::max( fabs(v.y), z ); >>> return uint(fabs(y)>fabs(x)) << uint(fabs(z)>=fabs(y)); >>> } >>> >> >> I didn't need such function before, so I'm not sure if this is >> considered fast or slow. Do you guys have something faster? >> >> >> ------------------------------------------------------------------------------ >> >> Open Source Business Conference (OSBC), March 24-25, 2009, San >> Francisco, CA >> -OSBC tackles the biggest issue in open source: Open Sourcing the >> Enterprise >> -Strategies to boost innovation and cut costs with open source >> participation >> -Receive a $600 discount off the registration fee with the source >> code: SFAD >> http://p.sf.net/sfu/XcvMzF8H >> _______________________________________________ >> GDAlgorithms-list mailing list >> GDA...@li... >> https://lists.sourceforge.net/lists/listinfo/gdalgorithms-list >> Archives: >> http://sourceforge.net/mailarchive/forum.php?forum_name=gdalgorithms-list >> >> > |