
Block Matrix

Our library relies on an efficient implementation of sparse block matrix operations.

A block matrix is a matrix interpreted as partitioned into sections called blocks, which can be manipulated at once; for example, a 6 x 6 matrix can be viewed as a 2 x 2 grid of 3 x 3 blocks. A matrix is called sparse if many of its entries are zero. Taking advantage of both the block structure and the sparsity of the matrices brings important advantages in terms of storage and operations.

Sparse block matrix

The image above shows an example of a randomly generated sparse block matrix used in testing operations on block matrices. It consists of 31 blocks, which are themselves dense 3 x 3 matrices. It has 14 block columns and 9 block rows, and is rank deficient (some of its rows and columns can be obtained as linear combinations of the others).

In problems such as SLAM, the systems to be solved are represented as matrices, and the sizes of the dense blocks correspond to the numbers of degrees of freedom of the variables. In 2D problems, poses have three degrees of freedom ((x, y) position and in-plane rotation) and landmarks have two (range and bearing). Hence, 2D problems contain 3 x 3 and sometimes also 2 x 2 blocks. The situation is similar in 3D problems, where poses have six degrees of freedom ((x, y, z) position and rotation about three axes); therefore, 3D problems contain 6 x 6 blocks.
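These block sizes can be communicated to the library as a compile-time list. As a preview (using the MakeTypelist_Safe macro demonstrated further below), the lists for 2D and 3D problems could look like this; the header path and the typedef names are illustrative assumptions:

#include <Eigen/Core>
#include "slam/BlockMatrix.h" // block matrix header; exact path is an assumption

// block sizes occurring in 2D problems: 2 x 2 (landmarks) and 3 x 3 (poses)
typedef MakeTypelist_Safe((Eigen::Matrix2d, Eigen::Matrix3d)) SE2_block_sizes;

// block sizes occurring in 3D problems: 6 x 6 (poses)
typedef Eigen::Matrix<double, 6, 6> Matrix6d; // typedef avoids commas inside the macro
typedef MakeTypelist_Safe((Matrix6d)) SE3_block_sizes;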

We take advantage of having this degree-of-freedom information at compile time, and implement efficient matrix operations which can be hinted with the possible block sizes.

To perform matrix-vector multiplication, one would write:

CUberBlockMatrix block_matrix;
Eigen::VectorXd vector;
// the inputs

// fill block_matrix and vector

Eigen::VectorXd result;
result.resize(block_matrix.n_Row_Num()); // allocate space
result.setZero(); // set to zero
// the output

block_matrix.PreMultiply_Add(&result(0), result.rows(), &vector(0), vector.rows());
// calculate result += vector * block_matrix

This is plain and simple C++; it executes a BLAS-style GAXPY operation (a generalized "y += A x" update).

Now, if the block_matrix is a SLAM problem matrix, we know it will only contain specific block sizes. We can hint the operation as follows:

typedef MakeTypelist_Safe((Eigen::Matrix2d, Eigen::Matrix3d)) block_sizes;
block_matrix.PreMultiply_Add_FBS<block_sizes>(&result(0), result.rows(),
    &vector(0), vector.rows());
// calculate result += vector * block_matrix

Now the operation knows that the matrix only contains 2 x 2 or 3 x 3 blocks, and can perform loop unrolling, vectorization using SSE and other optimizations. This yields significant performance improvements, as seen in the following plot:

Matrix vector multiplication

The plot compares our implementation with CSparse, Google Ceres and NIST sparse BLAS. Note that the fixed block size (FBS) operations are significantly faster. The same speedup occurs for other operations, e.g. matrix multiplication:

Matrix multiplication

Note that the spikes are caused by SSE, which is best suited to accelerating operations on blocks whose width and height are multiples of four.
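To give an intuition for where the FBS speedup comes from, here is a minimal sketch of a per-block kernel (not the library's actual code): with the block dimensions as template parameters, the per-block product maps onto fixed-size Eigen expressions, which the compiler can fully unroll and vectorize.

#include <Eigen/Core>

template <int n_rows, int n_cols>
inline void Block_GAXPY(double *p_dest, const double *p_block, const double *p_src)
{
    // fixed-size maps: all dimensions are known at compile time
    Eigen::Map<Eigen::Matrix<double, n_rows, 1> > dest(p_dest);
    Eigen::Map<const Eigen::Matrix<double, n_rows, n_cols> > block(p_block);
    Eigen::Map<const Eigen::Matrix<double, n_cols, 1> > src(p_src);
    dest.noalias() += block * src; // unrolled and vectorized by the compiler
}

At runtime, each nonzero block then only needs to be dispatched to the kernel instantiation matching its size, chosen from the compile-time typelist.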

Our implementation supports all the common matrix operations. We have recently implemented Cholesky decomposition by blocks, allowing for highly efficient solving of systems of linear equations. The block matrix library can be used independently of the SLAM code wherever fast operations on sparse block matrices are needed.
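As an illustration, solving a system with the block Cholesky factorization might look like the following sketch; the method names CholeskyOf, UpperTriangularTranspose_Solve and UpperTriangular_Solve are assumptions modelled on the API style shown above, not confirmed signatures:

CUberBlockMatrix lambda; // symmetric positive definite system matrix
Eigen::VectorXd b; // right-hand side
// fill lambda and b

CUberBlockMatrix R;
R.CholeskyOf(lambda); // lambda = R' * R (method name is an assumption)

Eigen::VectorXd x = b;
R.UpperTriangularTranspose_Solve(&x(0), x.rows()); // solve R' * y = b (assumed name)
R.UpperTriangular_Solve(&x(0), x.rows()); // solve R * x = y (assumed name)
// x now holds the solution of lambda * x = b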

There is also a small easter egg for Windows users hidden in the code. How exactly did we generate the block matrix image at the top?