## Matrix transpose

• Nobody/Anonymous - 2010-07-16

Suppose the matrix is represented as an stxxl::vector, storing entries row by row. I'm struggling to implement a more efficient
algorithm for matrix transpose. This is what I have so far:

```cpp
template <typename MatType>
void transpose(const MatType& myMat, MatType& transposed) {
    transposed.rowNo = myMat.colNo;
    transposed.colNo = myMat.rowNo;
    transposed.mat.resize(transposed.rowNo * transposed.colNo);
    typename MatType::vt::const_iterator myFiter = myMat.mat.begin();
    typename MatType::vt::iterator mySiter = transposed.mat.begin();
    unsigned long long k = 0;
    unsigned long long t = 0;
    unsigned long long shift = 0;  // was int: could overflow for large matrices
    // Walk the source row by row; entry (t, k) of myMat goes to
    // position k * transposed.colNo + t in the transposed matrix.
    while (t < transposed.colNo) {
        while (k < transposed.rowNo) {
            *(mySiter + (k * transposed.colNo + shift)) = *myFiter;
            ++k;
            ++myFiter;
        }
        k = 0;
        ++t;
        ++shift;
    }
}
```

What would be another, possibly faster, approach to perform this operation? Note that I don't have experience with external-memory
algorithms. Any help on this is welcome.

• Johannes Singler - 2010-07-19

For I/O-efficient algorithms, see e.g. J. S. Vitter: Algorithms and Data Structures for External Memory, 2008, http://www.cs.duke.edu/~jsv/Papers/catalog/node81.html#Vit:I_Obook. Chapter 7 is specifically about matrix operations, and the whole book is about I/O efficiency.

A way supported by STXXL would be to use sorting for the permutation.  Associate each matrix entry with the vector rank it should have after the transpose, and sort these pairs.  Afterwards, strip the additional rank off each entry.  You should use streaming (http://algo2.iti.kit.edu/stxxl/trunk/group__streampack.html) to keep overhead low.
Using this method, you will be at least 3 times slower than the optimum, but it might still be much better than the naive algorithm stated above.
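The sort-based permutation can be sketched in plain C++ (using std::vector and std::sort purely for illustration; an actual external-memory version would use stxxl::vector, stxxl::sort, and the streaming layer instead):

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Transpose a rows x cols row-major matrix by sorting (target_rank, value)
// pairs. Internal-memory illustration of the idea only; swap in
// stxxl::vector / stxxl::sort for the external-memory version.
std::vector<double> transpose_by_sort(const std::vector<double>& mat,
                                      std::size_t rows, std::size_t cols) {
    std::vector<std::pair<std::size_t, double> > pairs;
    pairs.reserve(mat.size());
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            // Entry (i, j) ends up at rank j * rows + i in the transpose.
            pairs.push_back(std::make_pair(j * rows + i, mat[i * cols + j]));

    // Sorting by target rank permutes the entries into transposed order.
    std::sort(pairs.begin(), pairs.end());

    // Strip the ranks off again, keeping only the values.
    std::vector<double> result;
    result.reserve(pairs.size());
    for (std::size_t r = 0; r < pairs.size(); ++r)
        result.push_back(pairs[r].second);
    return result;
}
```

Both the rank-attaching pass and the rank-stripping pass are simple scans, so they map directly onto STXXL's streaming framework; the sort in the middle dominates the I/O cost.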

Johannes

• Johannes Singler - 2010-12-21

You might be interested in the fact that we now have an experimental STXXL branch that supports matrix multiplication.
https://stxxl.svn.sourceforge.net/svnroot/stxxl/branches/matrix

For installation, please follow the instructions given here:
http://algo2.iti.kit.edu/stxxl/trunk/install-svn.html

For an example, look into containers/test_matrix.cpp

Our best results, using the Intel MKL for the internal BLAS routines and 8 cores, on doubles, are as follows:
800MB per matrix: ~40s
80GB per matrix:  ~30000s
So for 10GB per matrix, this could work in less than 1500s on such a powerful machine.

Please be aware that the interface is far from finished and will change without further notice.

Johannes