Eigen: Efficient Kronecker Product
Asked Answered
F

2

7

I am using Eigen for a project I am working on, where runtime performance is absolutely crucial (needs to meet real-time constraints).

So far, Eigen gives me reasonably good performance. However, I need to evaluate a Kronecker product. I am using Eigen's unsupported KroneckerProduct module, but I am thinking it is suboptimal for my needs.

The two matrices I am computing the Kronecker product with are of fixed size (known at compile time), and structure. One matrix is square and diagonal, let's assume it is an Identity matrix. The other is a small, square matrix. In code, like this:

MatrixXf I = MatrixXf::Identity(4,4);
MatrixXf X = MatrixXf::Random(8,8);
MatrixXf P = kroneckerProduct(I,X);

Since I is diagonal, I am guessing that we can make this faster since we only need to evaluate 4 matrix by scalar multiplications in order to compute all of the elements (since many will be zero).

What is the fastest and most efficient way to do this with Eigen?

Flaunt answered 8/8, 2016 at 22:25 Comment(0)
F
7

In Eigen 3.3 beta there is now (unsupported) support for sparse Kronecker products. That being said, if performance is critical, I would not yet recommend moving to 3.3 beta. Additionally, if you know that I is a diagonal matrix, you would probably get better performance writing your own. Plus, if the size is known at compile time (and not too large), you can replace MatrixXf with Matrix4f (fixed size, will be allocated on the stack, not the heap). So roll it all together and you get:

Matrix4f I4 = Matrix4f::Identity();
MatrixXf P2(I4.rows() * X.rows(), I4.cols() * X.cols());
P2.setZero();

for (int i = 0; i < I4.RowsAtCompileTime; i++)
{
    P2.block(i*X.rows(), i*X.cols(), X.rows(), X.cols()) = I4(i, i) * X;
}
Fulgurant answered 9/8, 2016 at 6:5 Comment(2)
This actually provides a reasonable speed up. Since we're referencing the RowsAtCompileTime, I am assuming that the compiler will be able to unroll that loop? What options should I use besides -march=native -mtune=native -O3 (I am using clang++)?Flaunt
Using RowsAtCompileTime should help the compiler unroll the loop. The speedup is presumable from the fact that it's only computing the block diagonal and not the entire outer product. The unrolled loop probably doesn't actually contribute to the speed up. I'm pretty sure that I4.rows() would end up being the same constant in this case. Also, try making X a fixed size matrix as well.Fulgurant
L
-2

one option that I can think about is to make a class that will inherit MatrixXf and will contain 3 matrixes : I,X and P. P will be a struct that will contain 2 matrixes with the size of P, in one of them the content of the matrix will be bool and the other will be the same as the product.

class MatrixXfExample : public MatrixXf {

MatrixXf I,X;
MatrixXfPair Data;
}

struct MatrixXfPair {
MatrixXf Visited,Contant;
}

MatrixXfPair constructor will initiate Visited to false and leave Content alone (default).

MatrixXfExample constructor will initiate I,X with copy construtor and Data with default.

now, just override () operator to check if the content in Data.Visited is false, and to make the multiple calculation only if it haven't calculated before. (kind of implement the idea of compiling the code only when using it).

Luxe answered 8/8, 2016 at 23:7 Comment(1)
I'm not sure I follow what you're saying, but that seems no more efficient than just computing the Kronecker product naively...Flaunt

© 2022 - 2024 — McMap. All rights reserved.