Thread: Re: [Algorithms] How to get 3dvector largest coordinate index? (Page 2)


After profiling I realized that eliminating one jump isn't necessarily worth
all the work. On my Core II it takes a lot of overhead before a jump is
noticeable. Adding more bitwise logic to avoid it can even slow things
down, depending on which 'trick' I use.

If you want to use integers, try the FloatInt union trick Glenn mentioned;
it is a much cleaner way of casting.
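
Glenn's FloatInt union itself is not quoted in this message, so the
following is only a minimal sketch of the same idea (reading a float's bits
as an integer without a pointer cast). It uses std::memcpy rather than a
union, which is an assumption on my part, and the names floatBits and
maxAxisBits are hypothetical. Like the quoted integer trick, the compare is
only meaningful for non-negative, non-NaN inputs.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy the bit pattern of a float into a 32-bit integer.
// std::memcpy avoids the pointer cast that raises the strict-aliasing question.
inline std::int32_t floatBits(float f)
{
    static_assert(sizeof(std::int32_t) == sizeof(float), "expects a 32-bit float");
    std::int32_t i;
    std::memcpy(&i, &f, sizeof i);
    return i;
}

// Same compare-and-combine as the quoted maxAxis, but on raw bit patterns.
// Assumption: x, y and z are all non-negative and not NaN.
inline int maxAxisBits(float x, float y, float z)
{
    const std::int32_t a0 = floatBits(x);
    const std::int32_t a1 = floatBits(y);
    const std::int32_t a2 = floatBits(z);
    const int c0 = a0 < a1;
    const int c1 = a0 < a2;
    const int c2 = a1 < a2;
    return (c0 & ~c2) | ((c1 & c2) << 1);
}
```

For example, maxAxisBits(1.0f, 5.0f, 2.0f) picks index 1. Whether this
beats the plain float compares is something to profile per compiler.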

This whole exercise is probably a great example of premature optimization
=) If you need the speed, you can dip into assembly. But since I'm working
on a math library and exploring more optimizations, I wanted to jump in.

From what I've seen so far, if you really need speed, nothing is going to
beat a) using a good compiler or b) using SSE.

With SSE you can square all the components in one instruction, swizzle and
do all 3 compares in a few more, do an AND in one instruction and an AND-NOT
in another, and best of all it parallelizes super well. The architectures
with the most expensive jumps, like the P4, are the ones that benefit most
from SSE, and I've read they have a fast, high-throughput SSE path.
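
To make the SSE outline concrete, here is a rough sketch using the
intrinsics from <xmmintrin.h>. It squares all lanes with one multiply and
does the three compares with one packed compare after two shuffles. It is
not the exact sequence described above: instead of combining the masks with
AND / AND-NOT in vector registers, it pulls the compare bits out with
_mm_movemask_ps and reuses the scalar bit formula from the quoted maxAxis.
The function name closestAxisSSE and the three-float signature are
hypothetical, and I have not timed it.

```cpp
#include <xmmintrin.h>

// Hypothetical sketch: index (0, 1 or 2) of the component with the largest
// absolute value, found by comparing squared components.
inline int closestAxisSSE(float x, float y, float z)
{
    const __m128 v  = _mm_set_ps(0.0f, z, y, x);   // lanes: [x, y, z, 0]
    const __m128 sq = _mm_mul_ps(v, v);            // [x*x, y*y, z*z, 0]

    // Line up the three pairs so a single packed compare covers them:
    //   lhs = [x2, x2, y2, 0]
    //   rhs = [y2, z2, z2, 0]
    const __m128 lhs = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(3, 1, 0, 0));
    const __m128 rhs = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(3, 2, 2, 1));

    // One sign bit per lane: bit 0 = x2 < y2, bit 1 = x2 < z2, bit 2 = y2 < z2.
    const int mask = _mm_movemask_ps(_mm_cmplt_ps(lhs, rhs));
    const int c0 = mask & 1;
    const int c1 = (mask >> 1) & 1;
    const int c2 = (mask >> 2) & 1;

    // Same combine step as the scalar maxAxis.
    return (c0 & ~c2) | ((c1 & c2) << 1);
}
```

For example, closestAxisSSE(1.0f, -5.0f, 2.0f) selects index 1, since the
squares are 1, 25 and 4. In a real library the vector would already live in
an __m128, so the _mm_set_ps load would disappear.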

> I guess you're right (I'm referring to Marc Hernandez). Assigning the
> result of a float compare to a bool introduces a branch in VC8 as well.
> How about this trick then?
>
>   template <>
>   FORCEINLINE
>   int maxAxis(const Vector3<float>& v)
>   {
>       const int32_t* a = reinterpret_cast<const int32_t*>(&v[0]);
>       int c0 = a[0] < a[1];
>       int c1 = a[0] < a[2];
>       int c2 = a[1] < a[2];
>       return (c0 & ~c2) | ((c1 & c2) << 1);
>   }
>
> If we take the binary values of the floats and do an integer compare on
> them the result should be equal to the float compare, that is, for
> non-negative floats. I still have to check whether this will work using
> gcc version 4 and up, since I'm not sure if I'm breaking the
> strict-aliasing rule here.
>
> Gino
>
>
> Gino van den Bergen wrote:
> > I would like to share my approach. This code is copy-pasted straight
> > from my Vector3 class template so it may look a bit cluttered but I
> > hope the idea comes across:
> >
> >    template <typename Scalar>
> >    FORCEINLINE
> >    int maxAxis(const Vector3<Scalar>& a)
> >    {
> >        int c0 = a[0] < a[1];
> >        int c1 = a[0] < a[2];
> >        int c2 = a[1] < a[2];
> >        return (c0 & ~c2) | ((c1 & c2) << 1);
> >    }
> >
> >    template <typename Scalar>
> >    FORCEINLINE
> >    int minAxis(const Vector3<Scalar>& a)
> >    {
> >        int c0 = a[1] < a[0];
> >        int c1 = a[2] < a[0];
> >        int c2 = a[2] < a[1];
> >        return (c0 & ~c2) | ((c1 & c2) << 1);
> >    }
> >
> >    template <typename Scalar>
> >    FORCEINLINE
> >    int closestAxis(const Vector3<Scalar>& a)
> >    {
> >        return maxAxis(a * a);
> >    }
> >
> >    template <typename Scalar>
> >    FORCEINLINE
> >    int furthestAxis(const Vector3<Scalar>& a)
> >    {
> >        return minAxis(a * a);
> >    }
> >
> > The functions minAxis and maxAxis return a value 0, 1, or 2, so the
> > result only needs two bits (00, 01, and 10 in base-2). The first term
> > of the "|" operator is bit-0 and the second term (the one with the <<
> > 1) is bit-1. The nice thing about this approach is the fact that it's
> > branchless. Three boolean values are computed but they are never used
> > to branch, so no branch mispredictions can happen here.
> >
> > For finding the minimum or maximum *absolute* value I do not use
> > "fabs" but I rather multiply the vector with itself, thus a * a =
> > (a.x * a.x, a.y * a.y, a.z * a.z). "closestAxis" returns the world
> > axis that is closest (as in most parallel) to vector "a".
> > "furthestAxis" returns the most orthogonal world axis.
> >
> > Cheers,
> >
> > Gino
> >
> >
> > Sylvain G. Vignaud wrote:
> >> Hi,
> >>
> >> I need to compute the index (not the actual value) of the largest
> >> coordinate of a normal, for some space hashing.
> >>
> >> I'm not sure how fast you guys usually find this index, but I've just
> >> created the following trick which I think is quite fast:
> >>
> >>
> >>> inline uint LargestCoordinate(const Vector3d &v)
> >>> {
> >>>     const float x = fabs(v.x);
> >>>     const float z = fabs(v.z);
> >>>     const float y = Maths::max( fabs(v.y), z );
> >>>     return uint(fabs(y)>fabs(x)) << uint(fabs(z)>=fabs(y));
> >>> }
> >>>
> >>
> >> I didn't need such a function before, so I'm not sure if this is
> >> considered fast or slow. Do you guys have something faster?
>
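
The two-bit encoding Gino describes in the quoted message can be checked
by hand with a stand-alone version. This is only a sketch: it takes a plain
float[3] instead of the Vector3 class template (which is not shown in this
thread), and the names maxAxis3 and closestAxis3 are hypothetical.

```cpp
// Hypothetical stand-alone version of the quoted maxAxis, using a plain
// array so it compiles without the Vector3 class template.
inline int maxAxis3(const float a[3])
{
    const int c0 = a[0] < a[1];   // 1 if y beats x
    const int c1 = a[0] < a[2];   // 1 if z beats x
    const int c2 = a[1] < a[2];   // 1 if z beats y
    // bit 0 is set only when y beats x and z does not beat y  -> index 1
    // bit 1 is set only when z beats both x and y             -> index 2
    // otherwise both bits stay clear                          -> index 0
    return (c0 & ~c2) | ((c1 & c2) << 1);
}

// The absolute-value variant from the quote: square each component first,
// then reuse maxAxis3 on the squares.
inline int closestAxis3(const float a[3])
{
    const float sq[3] = { a[0] * a[0], a[1] * a[1], a[2] * a[2] };
    return maxAxis3(sq);
}
```

Walking through (1, 2, 3): c0 = c1 = c2 = 1, so bit 0 is cleared by ~c2 and
bit 1 is set, giving index 2. Because the compares are strict, ties resolve
to the lower index: (1, 2, 2) gives 1 and (2, 2, 2) gives 0.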
