> Why is this commented out?
I believe that was the only place where we used D3DX, and it wasn't worth dragging in that whole thing for just a matrix multiply.  If you have a better replacement, feel free to try it.
> Why does it do that?
All of those functions were drop-in replacements for the D3DX functions.  That's the only reason the parameter order and output are the way they are.  If you see opportunities to clean this up, feel free.  There's no longer any reason to match the D3DX functions.

From: stepmania-devs-admin@lists.sourceforge.net [mailto:stepmania-devs-admin@lists.sourceforge.net] On Behalf Of Steve Checkoway
Sent: Monday, June 20, 2005 6:54 AM
To: StepMania DEVS
Subject: [Stepmania-devs] RageMath questions

I was looking through RageMath.cpp and I noticed a few oddities.

//#if defined(_WINDOWS) || defined(_XBOX)
//    // <30 cycles for theirs versus >100 for ours.
//    D3DXMatrixMultiply( (D3DMATRIX*)pOut, (D3DMATRIX*)pA, (D3DMATRIX*)pB );
Why is this commented out? It seems to me that if D3D can do a faster multiply, then it should be used. Using a standard and highly optimized version of BLAS (using SIMD), I can do the same thing on computers running OS X where it is supported using cblas_sgemm().

What was really tripping me up for a long time is that there is a comment in RageTypes.h that says that RageMatrix is stored in row-major order. Everywhere I looked this seemed to be the case; however, using the CblasRowMajor argument to cblas_sgemm() failed miserably--not to mention produced amusing results, http://www.cs.washington.edu/homes/steve/screen00045.jpg.

I tried CblasColMajor instead and it worked perfectly. I couldn't figure out why a column-major multiply worked when the matrices were supposed to be row-major. Then it hit me: a column-major matrix viewed as a row-major matrix is just the transpose of the matrix, and
A^t * B^t = (B*A)^t
Looking a little more closely at the implementation of RageMatrixMultiply(), it's clear that RageMatrixMultiply(&C, &A, &B) produces C = B * A.
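
To make the transpose argument concrete, here is a minimal sketch that does not use BLAS or RageMath at all. Mat4, MulRowMajor, MulColMajor and SwappedOrderMatches are hypothetical names invented for this illustration. It assumes a matrix is just 16 contiguous floats and checks that a column-major multiply of (A, B) leaves the same bytes in memory as a row-major multiply of (B, A).

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a 4x4 matrix: 16 contiguous floats.
// This is NOT the real RageMatrix type, only an illustration.
using Mat4 = std::array<float, 16>;

// Treat the storage as row-major: element (i, j) lives at index 4*i + j.
// Returns x * y under that interpretation.
Mat4 MulRowMajor(const Mat4& x, const Mat4& y) {
    Mat4 out{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                out[4 * i + j] += x[4 * i + k] * y[4 * k + j];
    return out;
}

// Treat the same storage as column-major: element (i, j) lives at index i + 4*j.
// Returns x * y under that interpretation.
Mat4 MulColMajor(const Mat4& x, const Mat4& y) {
    Mat4 out{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                out[i + 4 * j] += x[i + 4 * k] * y[k + 4 * j];
    return out;
}

// A^t * B^t = (B*A)^t in code: multiplying (a, b) as column-major data
// yields the same memory contents as multiplying (b, a) as row-major data.
bool SwappedOrderMatches(const Mat4& a, const Mat4& b) {
    return MulColMajor(a, b) == MulRowMajor(b, a);
}
```

For any pair of matrices that do not commute, MulColMajor(a, b) differs from MulRowMajor(a, b), which matches the broken output from the CblasRowMajor attempt with the operands in their original order.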

Now for my second question: why does it do that? That's extremely unintuitive. In addition, it does not follow the order used in RageVec4TransformCoord(), which just multiplies a row vector by a matrix.

In the end, a fast matrix multiply that preserves the functionality of RageMatrixMultiply can be obtained via:
cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 4, 4, 4, 1,
            &pB->m00, 4, &pA->m00, 4, 0, &pOut->m00, 4);

- Steve