Supposing the matrix is represented as an stxxl::vector storing entries row by row, I'm struggling to implement a more efficient
algorithm for matrix transpose. This is what I have so far:
What would be another, possibly faster approach to perform this operation? Note that I don't have experience with external
memory algorithms. Any help on this is welcome.
A way supported by the STXXL would be to use sorting for permutation. Associate each matrix entry with its intended vector rank after the transpose, and sort these pairs. Afterwards, strip the additional rank off each entry. You should use streaming (http://algo2.iti.kit.edu/stxxl/trunk/group__streampack.html) to keep overhead low.
Using this method, you will be at least 3 times slower than the optimum, but it might still be much better than the naive algorithm stated above.
Johannes
You might be interested in the fact that we now have an experimental STXXL branch that supports matrix multiplication.
https://stxxl.svn.sourceforge.net/svnroot/stxxl/branches/matrix
For installation, please follow the instructions given here:
http://algo2.iti.kit.edu/stxxl/trunk/install-svn.html
For an example, look into containers/test_matrix.cpp
Our best results, using the Intel MKL for the internal BLAS routines and 8 cores on doubles, are as follows:
800MB per matrix: ~40s
80GB per matrix: ~30000s
So for 10GB per matrix, this could work in less than 1500s on such a powerful machine.
Please be aware that the interface is far from being finished, and will change without further notice.
Johannes