Low-RAM-consuming C++ eigensolver
I'm a newbie in C++ programming, but I have a task: compute the eigenvalues and eigenvectors (the standard eigenproblem Ax = λx) of symmetric (and Hermitian) matrices of size binomial(L, L/2), where L is about 18-22, so very large. Right now I'm testing on a machine with about 7.7 GB of RAM available, but eventually I'll have access to a PC with 64 GB of RAM.

I started with Lapack++. At the beginning, my project only needed to solve this problem for real symmetric matrices.

That library was great: very quick, with low RAM consumption. It has an option to compute the eigenvectors in place, overwriting the input matrix A to save memory. It works! I thought the Lapack++ eigensolver could also handle Hermitian matrices, but it can't, for reasons unknown to me (maybe I'm doing something wrong). My project has evolved, and now I need to solve this problem for Hermitian matrices as well.

So I tried switching to the Armadillo library. It works fine and, of course, supports Hermitian matrices, but it lacks the nice Lapack++ feature of replacing matrix A with the eigenvectors.

Some statistics for L = 14:

  • Lapack++: 126 MB RAM, 7.9 s, eigenvalues + eigenvectors

  • Armadillo: 216 MB RAM, 12 s, eigenvalues only

  • Armadillo: 396 MB RAM, 15 s, eigenvalues + eigenvectors

Let's do some arithmetic: a double takes 8 B. My matrix has size binomial(14, 7) = 3432, so in the ideal case it should occupy 3432² × 8 / 1024² ≈ 90 MB.

My question is: is it possible to modify or force Armadillo to do the same nice trick as Lapack++? Armadillo uses LAPACK and BLAS routines under the hood. Or can someone recommend another approach to this problem using a different library?

P.S.: My matrix is really sparse: it has only about 2 × binomial(L, L/2) non-zero elements. I tried computing this with SuperLU in CSC format, but it wasn't very effective: for L = 14, 185 MB of RAM but 135 s.

Moguel answered 28/8, 2015 at 10:32 Comment(6)
Have you considered using the SLEPcEigenSolver with PETSc? – Empty
@Empty I've heard about it, but I think that library will be too difficult for me. If I have time I'll try to test it. – Moguel
In LAPACK you have the routine zheev (apfel.mathematik.uni-ulm.de/~lehn/FLENS/cxxlapack/netlib/lapack/…) for computing the eigenvalues and eigenvectors of a Hermitian matrix. However, you seem to be right: Lapack++ apparently does not support complex-valued matrices (or at least not Hermitian ones). So maybe it is a good idea to call the LAPACK routine directly from C++. – Cordero
@MichaelLehn Thank you for the reply. I know that LAPACK has such a routine, but I haven't learned how to call it directly from C++ yet, so I will try to learn that first! :) – Moguel
In FLENS there are C++ wrappers for LAPACK, including one for zheev, so basically you can rip them out of FLENS. You can also first check whether FLENS does the job for you. Here (apfel.mathematik.uni-ulm.de/~lehn/FLENS/flens/examples/…) is an example of computing eigenvalues/-vectors of a symmetric matrix. For Hermitian matrices there is only an example (github.com/michael-lehn/FLENS/blob/public/flens/examples/…) on GitHub, not yet in the online tutorial. These tutorial examples only set up small demo cases. – Cordero
But it is still a good idea to start with these so that you are sure how to link with an external LAPACK (basically you just compile with -DUSE_CXXLAPACK and add the usual linker options). For large examples, have a look at the tutorial on how to set up large matrices. – Cordero

Both Lapack++ and Armadillo rely on LAPACK to compute the eigenvalues and eigenvectors of complex matrices. The LAPACK library provides several ways to perform these operations for complex Hermitian matrices:

  • The function zgeev() does not care about the matrix being Hermitian. It is the function the Lapack++ library calls in LaEigSolve for matrices of type LaGenMatComplex, and the one Armadillo's eig_gen() calls.

  • The function zheev() is dedicated to complex Hermitian matrices. It first calls zhetrd() to reduce the Hermitian matrix to tridiagonal form, then uses the QR algorithm to compute the eigenvalues and, if required, the eigenvectors. Armadillo's eig_sym() calls this function if the "std" method is selected.

  • The function zheevd() does the same thing as zheev() if the eigenvectors are not required. Otherwise, it uses a divide-and-conquer algorithm (see zstedc()). Armadillo's eig_sym() calls this function if the "dc" method is selected. Since divide and conquer proved faster for large matrices, it is now the default method.

Functions with more options are available in the LAPACK library (see zheevr() or zheevx()). If you want to keep a dense matrix format, you can also try the ComplexEigenSolver (or, better suited to Hermitian matrices, the SelfAdjointEigenSolver) of the Eigen library.

Here is a little C++ test using LAPACKE, the C interface to the LAPACK library. It is compiled with g++ main.cpp -o main2 -L /home/...../lapack-3.5.0 -llapacke -llapack -lblas

#include <iostream>

#include <complex>
#include <ctime>
#include <cstring>

#include "lapacke.h"

#undef complex // lapacke.h may pull in complex.h, which defines this macro
using namespace std;

int main()
{
    //int n = 3432;

    int n = 600;

    // four copies of the same Hermitian (here: real symmetric, tridiagonal)
    // test matrix, since each LAPACK routine overwrites its input
    std::complex<double> *matrix=new std::complex<double>[n*n];
    memset(matrix, 0, n*n*sizeof(std::complex<double>));
    std::complex<double> *matrix2=new std::complex<double>[n*n];
    memset(matrix2, 0, n*n*sizeof(std::complex<double>));
    std::complex<double> *matrix3=new std::complex<double>[n*n];
    memset(matrix3, 0, n*n*sizeof(std::complex<double>));
    std::complex<double> *matrix4=new std::complex<double>[n*n];
    memset(matrix4, 0, n*n*sizeof(std::complex<double>));
    for(int i=0;i<n;i++){
        matrix[i*n+i]=42;
        matrix2[i*n+i]=42;
        matrix3[i*n+i]=42;
        matrix4[i*n+i]=42;
    }

    for(int i=0;i<n-1;i++){
        matrix[i*n+(i+1)]=20;
        matrix2[i*n+(i+1)]=20;
        matrix3[i*n+(i+1)]=20;
        matrix4[i*n+(i+1)]=20;

        matrix[(i+1)*n+i]=20;
        matrix2[(i+1)*n+i]=20;
        matrix3[(i+1)*n+i]=20;
        matrix4[(i+1)*n+i]=20;
    }

    double* w=new double[n]; // eigenvalues, returned in ascending order

    // zheev: 'V' = also compute eigenvectors, 'U' = upper triangle is stored
    clock_t t;
    t = clock();
    LAPACKE_zheev(LAPACK_COL_MAJOR,'V','U', n, reinterpret_cast<lapack_complex_double*>(matrix), n, w);
    t = clock() - t;
    cout<<"zheev : "<<((float)t)/CLOCKS_PER_SEC<<" seconds"<<endl;
    cout<<"largest eigenvalue="<<w[n-1]<<endl;

    std::complex<double> *wc=new std::complex<double>[n]; // complex eigenvalues
    std::complex<double> *vl=new std::complex<double>[n*n]; // left eigenvectors
    std::complex<double> *vr=new std::complex<double>[n*n]; // right eigenvectors

    t = clock();
    LAPACKE_zgeev(LAPACK_COL_MAJOR,'V','V', n, reinterpret_cast<lapack_complex_double*>(matrix2), n, reinterpret_cast<lapack_complex_double*>(wc), reinterpret_cast<lapack_complex_double*>(vl), n, reinterpret_cast<lapack_complex_double*>(vr), n);
    t = clock() - t;
    cout<<"zgeev : "<<((float)t)/CLOCKS_PER_SEC<<" seconds"<<endl;
    // note: zgeev does not sort the eigenvalues, so wc[0] is simply the first one
    cout<<"largest eigenvalue="<<wc[0]<<endl;

    t = clock();
    LAPACKE_zheevd(LAPACK_COL_MAJOR,'V','U', n, reinterpret_cast<lapack_complex_double*>(matrix3), n, w);
    t = clock() - t;
    cout<<"zheevd : "<<((float)t)/CLOCKS_PER_SEC<<" seconds"<<endl;
    cout<<"largest eigenvalue="<<w[n-1]<<endl;

    t = clock();
    LAPACKE_zheevd(LAPACK_COL_MAJOR,'N','U', n, reinterpret_cast<lapack_complex_double*>(matrix4), n, w);
    t = clock() - t;
    cout<<"zheevd (no vector) : "<<((float)t)/CLOCKS_PER_SEC<<" seconds"<<endl;
    cout<<"largest eigenvalue="<<w[n-1]<<endl;

    delete[] w;
    delete[] wc;
    delete[] vl;
    delete[] vr;
    delete[] matrix;
    delete[] matrix2;
    delete[] matrix3;
    delete[] matrix4;
    return 0;
}

The output on my computer is:

zheev : 2.79 seconds
largest eigenvalue=81.9995
zgeev : 10.74 seconds
largest eigenvalue=(77.8421,0)
zheevd : 0.44 seconds
largest eigenvalue=81.9995
zheevd (no vector) : 0.02 seconds
largest eigenvalue=81.9995

These tests could also have been performed with the Armadillo library. Calling LAPACK directly might let you save some memory, but wrappers around LAPACK can be efficient in this respect as well.

The real question is whether you need all the eigenvectors, all the eigenvalues, or only the largest eigenvalues, because there are really efficient methods in the last case: take a look at the Arnoldi/Lanczos iterative algorithms. Huge memory savings are possible if the matrix is sparse, since only matrix-vector products are performed: there is no need to keep the dense format. This is what is done in the SLEPc library, which makes use of the sparse matrix formats of PETSc. Here is an example of SLEPc which can be used as a starting point.

Pommel answered 28/8, 2015 at 22:52 Comment(0)

If someone has the same problem as me in the future, here are some tips now that I have found the solution (thanks to everyone who posted answers or clues!).

On the Intel site you can find some nice examples written in Fortran and C, for example for the Hermitian eigenvalue problem routine zheev(): https://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_lapack_examples/zheev_ex.c.htm

To make it work in C++ you should edit some lines in that code:

In the function prototype declarations, do the same for all functions: change extern void zheev( ... ) to extern "C" { void zheev( ... ); } (extern "C" tells the C++ compiler not to mangle the name).

Change the LAPACK function calls by adding a trailing _ symbol, for example zheev( ... ) to zheev_( ... ) (do it for every call, simply by substitution). I didn't know why this works and figured it out by experimenting; the reason is that most Fortran compilers append an underscore to the symbols they export, so the Fortran routine zheev is visible from C/C++ as zheev_.

Optionally, you can convert the printf calls to the standard stream std::cout and replace the included header stdio.h with iostream.

To compile, run a command like: g++ test.cpp -o test -llapack -lgfortran -lm -Wno-write-strings

About the last option, -Wno-write-strings: it silences the complaints about converting string literals (which are const char*) to char*. Intel's example triggers them by passing string literals to non-const char* parameters in calls like zheev( "Vectors", "Lower", ... ).

Moguel answered 30/8, 2015 at 9:27 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.