> Why is this commented out?
I believe that was the only place where we used D3DX, and it wasn't worth
dragging in the whole library for just a matrix multiply. If you have a
better replacement, feel free to try it.
> why does it do that?
All of those functions were drop-in replacements for the D3DX functions;
that's the only reason the parameter order and output are the way they are.
If you see opportunities to clean this up, feel free. There's no longer any
reason to match the D3DX functions.
I was looking through RageMath.cpp and I noticed a few oddities.

    defined(_WINDOWS) || defined(_XBOX)
    // <30 cycles for theirs versus >100 for ours.
    (D3DMATRIX*)pOut, (D3DMATRIX*)pA, (D3DMATRIX*)pB );

Why is this commented out? It seems to me that if D3D can do a faster
multiply, it should be used. On OS X, where a standard, highly optimized
(SIMD) BLAS is available, I can do the same thing with cblas_sgemm().
What really tripped me up for a long time is that a comment in RageTypes.h
says RageMatrix is stored in row-major order. Everywhere I looked, this
seemed to be the case; however, passing CblasRowMajor to cblas_sgemm()
failed miserably, not to mention produced amusing results:
http://www.cs.washington.edu/homes/steve/screen00045.jpg
I tried CblasColMajor instead, and it worked perfectly.
I couldn't figure out why a column-major multiply worked when the matrices
were supposed to be row-major. Then it hit me: a column-major matrix viewed
as a row-major matrix is just the transpose of that matrix, and

    A^t * B^t = (B*A)^t
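Written out step by step (assuming each 4x4 matrix is 16 contiguous floats, so a row-major element M_ij sits at offset 4i + j, while a column-major routine reads that same offset as element (j, i)):

```latex
\begin{aligned}
&\text{row-major } M \text{ read as column-major} = M^{t} \\
&P = A^{t} B^{t} \quad \text{(what the column-major multiply computes from inputs } A, B\text{)} \\
&C = P^{t} = \left(A^{t} B^{t}\right)^{t} = \left(B^{t}\right)^{t} \left(A^{t}\right)^{t} = B A \quad \text{(} P \text{ read back as row-major)}
\end{aligned}
```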
Looking a little more closely at the implementation of
RageMatrixMultiply(), it's clear that RageMatrixMultiply(&C, &A, &B)
produces C = B * A.
Now for my second question: why does it do that? That's extremely
unintuitive. It also doesn't follow the order used in RageVec4TransformCoord(),
which simply multiplies a row vector by a matrix.
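To make the ordering issue concrete, here is a small sketch. Mat4, Vec4, Identity, Scale, Translate, Mul, MultiplyLikeRage and TransformRow are all hypothetical stand-ins, not the real RageMatrix/RageVector4 code; the sketch assumes row-major storage and the row-vector convention v' = v * M. With row vectors, v * (B * A) applies B first and then A, so a matrix built with MultiplyLikeRage(&C, &A, &B) applies B before A, the opposite of what the argument order suggests.

```cpp
// Hypothetical stand-ins (NOT the real StepMania types).
// Mat4 is stored row-major: m[row][col].
struct Mat4 { float m[4][4]; };
struct Vec4 { float v[4]; };

Mat4 Identity() {
    Mat4 r{};  // value-initialized to all zeros
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

// Uniform scale on x, y, z.
Mat4 Scale(float s) {
    Mat4 r = Identity();
    r.m[0][0] = r.m[1][1] = r.m[2][2] = s;
    return r;
}

// Translation for the row-vector convention (v' = v * M): the offset lives in row 3.
Mat4 Translate(float x, float y, float z) {
    Mat4 r = Identity();
    r.m[3][0] = x;
    r.m[3][1] = y;
    r.m[3][2] = z;
    return r;
}

// Conventional product: returns a * b.
Mat4 Mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Mimics the argument order described in the message:
// MultiplyLikeRage(&C, &A, &B) leaves C = B * A.
void MultiplyLikeRage(Mat4* pOut, const Mat4* pA, const Mat4* pB) {
    *pOut = Mul(*pB, *pA);  // product built in a temporary, so pOut may alias pA or pB
}

// Row vector times matrix: returns v * m.
Vec4 TransformRow(const Vec4& v, const Mat4& m) {
    Vec4 r{};
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            r.v[j] += v.v[i] * m.m[i][j];
    return r;
}
```

With A = Scale(2) and B = Translate(1, 0, 0), the matrix from MultiplyLikeRage(&C, &A, &B) maps the origin (0, 0, 0, 1) to x = 2 (translate, then scale), while Mul(A, B) maps it to x = 1 (scale, then translate).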
In the end, a fast matrix multiply that preserves the
functionality of RageMatrixMultiply can be obtained via:

    CblasNoTrans, CblasNoTrans, 4,
    4, &pA->m00, 4, 0, &pOut->m00,
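The call above is only partly preserved, so the following is a sketch of one way to package the idea, not a reconstruction of the exact call from the message. FastRageStyleMultiply() and the USE_CBLAS build flag are hypothetical. The function works on raw 16-float row-major arrays (with a RageMatrix one would pass &pA->m00 and so on, assuming its 16 floats are contiguous). When USE_CBLAS is defined it calls cblas_sgemm() with CblasColMajor; otherwise it falls back to a plain loop with the same column-major semantics. Passing a as the first matrix and b as the second is what yields b * a, per the transpose argument above.

```cpp
#include <cstring>

// USE_CBLAS is a hypothetical build flag: define it and link a CBLAS
// (e.g. Accelerate on OS X) to use cblas_sgemm(); otherwise a plain loop runs.
#if defined(USE_CBLAS) && defined(__APPLE__)
#include <Accelerate/Accelerate.h>
#elif defined(USE_CBLAS)
#include <cblas.h>
#endif

// Multiplies two 4x4 matrices stored as 16 contiguous floats in row-major
// order and writes out = b * a, the result RageMatrixMultiply(&out, &a, &b)
// is described as producing.
void FastRageStyleMultiply(float out[16], const float a[16], const float b[16]) {
    float tmp[16];  // separate buffer, so out may alias a or b
#if defined(USE_CBLAS)
    // Column-major GEMM on row-major data: forms a^t * b^t = (b * a)^t in
    // column-major terms, which reads back row-major as b * a.
    cblas_sgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, 4, 4, 4,
                1.0f, a, 4, b, 4, 0.0f, tmp, 4);
#else
    // Portable loop with the same column-major semantics:
    // tmp[i + 4j] = sum_k a[i + 4k] * b[k + 4j].
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a[i + 4 * k] * b[k + 4 * j];
            tmp[i + 4 * j] = s;
        }
#endif
    std::memcpy(out, tmp, sizeof tmp);
}
```

Writing through a temporary costs a 64-byte copy, but it keeps the call safe when a caller passes the same matrix as output and input, since BLAS routines generally do not expect the output to overlap the inputs.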