How to solve large-scale nonlinear optimization problems with Ceres?

Asked 13/10, 2014 at 8:47 Answered 14/10, 2014 at 13:17

Solved c++optimization eigen lapack large-scale

I need to optimize a surface represented by a 2D grid of points to produce normal vectors of the surface that align with provided target normal vectors. The grid size is likely to be between 201x201 and 1001x1001. That means that the number of variables will be 40,000 to 1,000,000, as I am only modifying the z-coordinates of the mesh points.

I am using the Ceres framework, as it is supposed to excel at large-scale nonlinear optimization problems. I already tried MATLAB's fmincon, but it uses an incredible amount of memory. I wrote an objective function that works for small meshes (successful at 3x3 and 31x31). However, when I try to compile the code with a large mesh size (157x200), I see the error below. I have read that this is a limitation of Eigen. However, when I tell Ceres to use LAPACK instead of Eigen, I get the same error for large matrices. I tried these lines:

options.dense_linear_algebra_library_type = ceres::LAPACK;

options.linear_solver_type = ceres::DENSE_QR;

These tell the solver to use LAPACK and DENSE_QR, as the output using a 3x3 mesh shows:

Minimizer                        TRUST_REGION

Dense linear algebra library           LAPACK
Trust region strategy     LEVENBERG_MARQUARDT

                                    Given                     Used
Linear solver                        DENSE_QR                 DENSE_QR
Threads                                     1                        1
Linear solver threads                       1                        1

However, when I use large parameters, I still get the errors for Eigen.

Anyways, I could really use some help with this. How can I get Ceres to optimize a large number of variables (> 30,000)? Thanks in advance

Link to Ceres: http://ceres-solver.org

Link to Eigen: http://eigen.tuxfamily.org/dox/

Error:

In file included from /usr/include/eigen3/Eigen/Core:254:0,
             from /usr/local/include/ceres/jet.h:165,
             from /usr/local/include/ceres/internal/autodiff.h:145,
             from /usr/local/include/ceres/autodiff_cost_function.h:132,
             from /usr/local/include/ceres/ceres.h:37,
             from /home/ubuntu/code/surfaceopt/surfaceopt.cc:10:
/usr/include/eigen3/Eigen/src/Core/DenseStorage.h: In instantiation of ‘Eigen::internal::plain_array<T, Size, MatrixOrArrayOptions, Alignment>::plain_array()     [with T = double; int Size = 31400; int MatrixOrArrayOptions = 2; int Alignment = 0]’:
/usr/include/eigen3/Eigen/src/Core/DenseStorage.h:117:27:   required from     ‘Eigen::DenseStorage<T, Size, _Rows, _Cols, _Options>::DenseStorage() [with T = double; int Size = 31400; int _Rows = 31400; int _Cols = 1; int _Options = 2]’
/usr/include/eigen3/Eigen/src/Core/PlainObjectBase.h:421:55:   required from ‘Eigen::PlainObjectBase<Derived>::PlainObjectBase() [with Derived = Eigen::Matrix<double, 31400, 1, 2, 31400, 1>]’
/usr/include/eigen3/Eigen/src/Core/Matrix.h:203:41:   required from ‘Eigen::Matrix<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols>::Matrix() [with _Scalar = double; int _Rows = 31400; int _Cols = 1; int _Options = 2; int _MaxRows = 31400; int _MaxCols = 1]’
/usr/local/include/ceres/jet.h:179:13:   required from ‘ceres::Jet<T, N>::Jet() [with T = double; int N = 31400]’
/usr/local/include/ceres/internal/fixed_array.h:138:10:   required from ‘ceres::internal::FixedArray<T, inline_elements>::FixedArray(ceres::internal::FixedArray<T, inline_elements>::size_type) [with T = ceres::Jet<double, 31400>; long int inline_elements = 0l; ceres::internal::FixedArray<T, inline_elements>::size_type = long unsigned int]’
/usr/local/include/ceres/internal/autodiff.h:233:70:   required from ‘static bool ceres::internal::AutoDiff<Functor, T, N0, N1, N2, N3, N4, N5, N6, N7, N8, N9>::Differentiate(const Functor&, const T* const*, int, T*, T**) [with Functor = ComputeEint; T = double; int N0 = 31400; int N1 = 0; int N2 = 0; int N3 = 0; int N4 = 0; int N5 = 0; int N6 = 0; int N7 = 0; int N8 = 0; int N9 = 0]’
/usr/local/include/ceres/autodiff_cost_function.h:218:25:   required from ‘bool ceres::AutoDiffCostFunction<CostFunctor, kNumResiduals, N0, N1, N2, N3, N4, N5, N6, N7, N8, N9>::Evaluate(const double* const*, double*, double**) const [with CostFunctor = ComputeEint; int kNumResiduals = 1; int N0 = 31400; int N1 = 0; int N2 = 0; int N3 = 0; int N4 = 0; int N5 = 0; int N6 = 0; int N7 = 0; int N8 = 0; int N9 = 0]’
/home/ubuntu/code/surfaceopt/surfaceopt.cc:367:1:   required from here
/usr/include/eigen3/Eigen/src/Core/DenseStorage.h:41:5: error: ‘OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG’ is not a member of ‘Eigen::internal::static_assertion<false>’
 EIGEN_STATIC_ASSERT(Size * sizeof(T) <= 128 * 128 * 8, OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG);
 ^
/usr/include/eigen3/Eigen/src/Core/DenseStorage.h: In instantiation of ‘Eigen::internal::plain_array<T, Size, MatrixOrArrayOptions, 16>::plain_array() [with T = double; int Size = 31400; int MatrixOrArrayOptions = 1]’:
/usr/include/eigen3/Eigen/src/Core/DenseStorage.h:120:59:   required from ‘Eigen::DenseStorage<T, Size, _Rows, _Cols, _Options>::DenseStorage(Eigen::DenseIndex, Eigen::DenseIndex, Eigen::DenseIndex) [with T = double; int Size = 31400; int _Rows = 1; int _Cols = 31400; int _Options = 1; Eigen::DenseIndex = long int]’
/usr/include/eigen3/Eigen/src/Core/PlainObjectBase.h:438:41:   required from ‘Eigen::PlainObjectBase<Derived>::PlainObjectBase(Eigen::PlainObjectBase<Derived>::Index, Eigen::PlainObjectBase<Derived>::Index, Eigen::PlainObjectBase<Derived>::Index) [with Derived = Eigen::Matrix<double, 1, 31400, 1, 1, 31400>; Eigen::PlainObjectBase<Derived>::Index = long int]’
/usr/include/eigen3/Eigen/src/Core/Matrix.h:273:76:   required from ‘Eigen::Matrix<_Scalar, _Rows, _Cols, _Options, _MaxRows, _MaxCols>::Matrix(const Eigen::MatrixBase<OtherDerived>&) [with OtherDerived = Eigen::Transpose<const Eigen::Matrix<double, 31400, 1, 2, 31400, 1> >; _Scalar = double; int _Rows = 1; int _Cols = 31400; int _Options = 1; int _MaxRows = 1; int _MaxCols = 31400]’
/usr/include/eigen3/Eigen/src/Core/DenseBase.h:367:62:   required from ‘Eigen::DenseBase<Derived>::EvalReturnType Eigen::DenseBase<Derived>::eval() const [with Derived = Eigen::Transpose<const Eigen::Matrix<double, 31400, 1, 2, 31400, 1> >; Eigen::DenseBase<Derived>::EvalReturnType = const Eigen::Matrix<double, 1, 31400, 1, 1, 31400>]’
/usr/include/eigen3/Eigen/src/Core/IO.h:244:69:   required from ‘std::ostream& Eigen::operator<<(std::ostream&, const Eigen::DenseBase<Derived>&) [with Derived = Eigen::Transpose<const Eigen::Matrix<double, 31400, 1, 2, 31400, 1> >; std::ostream = std::basic_ostream<char>]’
/usr/local/include/ceres/jet.h:632:35:   required from ‘std::ostream& ceres::operator<<(std::ostream&, const ceres::Jet<T, N>&) [with T = double; int N = 31400; std::ostream = std::basic_ostream<char>]’
/home/ubuntu/code/surfaceopt/surfaceopt.cc:103:50:   required from ‘bool ComputeEint::operator()(const T*, T*) const [with T = ceres::Jet<double, 31400>]’
/usr/local/include/ceres/internal/variadic_evaluate.h:175:26:   required from ‘static bool ceres::internal::VariadicEvaluate<Functor, T, N0, 0, 0, 0, 0, 0, 0, 0, 0, 0>::Call(const Functor&, const T* const*, T*) [with Functor = ComputeEint; T = ceres::Jet<double, 31400>; int N0 = 31400]’
/usr/local/include/ceres/internal/autodiff.h:283:45:   required from ‘static bool ceres::internal::AutoDiff<Functor, T, N0, N1, N2, N3, N4, N5, N6, N7, N8, N9>::Differentiate(const Functor&, const T* const*, int, T*, T**) [with Functor = ComputeEint; T = double; int N0 = 31400; int N1 = 0; int N2 = 0; int N3 = 0; int N4 = 0; int N5 = 0; int N6 = 0; int N7 = 0; int N8 = 0; int N9 = 0]’
/usr/local/include/ceres/autodiff_cost_function.h:218:25:   required from ‘bool ceres::AutoDiffCostFunction<CostFunctor, kNumResiduals, N0, N1, N2, N3, N4, N5, N6, N7, N8, N9>::Evaluate(const double* const*, double*, double**) const [with CostFunctor = ComputeEint; int kNumResiduals = 1; int N0 = 31400; int N1 = 0; int N2 = 0; int N3 = 0; int N4 = 0; int N5 = 0; int N6 = 0; int N7 = 0; int N8 = 0; int N9 = 0]’
/home/ubuntu/code/surfaceopt/surfaceopt.cc:367:1:   required from here
/usr/include/eigen3/Eigen/src/Core/DenseStorage.h:79:5: error: ‘OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG’ is not a member of ‘Eigen::internal::static_assertion<false>’
 EIGEN_STATIC_ASSERT(Size * sizeof(T) <= 128 * 128 * 8, OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG);
 ^
make[2]: *** [CMakeFiles/surfaceopt.dir/surfaceopt.cc.o] Error 1
make[1]: *** [CMakeFiles/surfaceopt.dir/all] Error 2
make: *** [all] Error 2

My code looks like this (abbreviated to take out irrelevant material):

#define TEXT true
#define VERBOSE false 
#define NV 31400
#define NF 62088
#define NX 157
#define NY 200
#define MAXNB 6

#include <math.h>
#include <ceres/ceres.h>
#include <ceres/rotation.h>
#include "glog/logging.h"
#include <iostream>
#include <fstream>
#include <iterator>
#include <algorithm>
#include <string>

using ceres::AutoDiffCostFunction;
using ceres::CostFunction;
using ceres::Problem;
using ceres::Solver;
using ceres::Solve;
using ceres::CrossProduct;
...

class ComputeEint {

private:
  double XY_ [NV][2];         // X and Y coords
  int C_ [NF][3];             // Connectivity list
  int AF_ [NV][MAXNB];        // List of adjacent faces to each vertex
  double Ntgt_ [NV][3];       // Target normal vectors
  int num_AF_ [NV];           // Number of adjacent faces to each vertex
public:

//Constructor
ComputeEint(double XY[][2], int C[][3], int AF[][MAXNB], double Ntgt[][3], int num_AF[NV]) {

std::copy(&XY[0][0], &XY[0][0]+NV*2,&XY_[0][0]);
...

template <typename T>
bool operator()(const T* const z, T* e) const {
  e[0] = T(0);
  ... 
  //Computes vertex normals for triangulated surface by averaging adjacent face normals
  ...
  e[0] = e[0]/T(NV);
  return true;
  }
};

int main(int argc, char** argv) {
  google::InitGoogleLogging(argv[0]);

  double Tp [NV][3];            //Points in mesh
  int    Tc [NF][3];            //Mesh connectivity list
  double Ntgt [NV][3];          //Target normals
  int    AF [NV][MAXNB];        //List of adjacent faces of each vertex
  int    num_AF [NV];           //Number of adjacent faces for each vertex

  int nx = NX;
  int ny = NY;

  //Read Tp, Tc, Ntgt, AF, num_AF from file
  ...
  // Set up XY for cost functor
  double XY [NV][2];
  double z [NV];
  //Copy first two columns of Tp into XY
  Problem problem;

  // Set up the only cost function (also known as residual). This uses
  // auto-differentiation to obtain the derivative (jacobian).
  CostFunction* cost_function =
  new AutoDiffCostFunction<ComputeEint, 1, NV>(new ComputeEint(XY, Tc, AF, Ntgt, num_AF));

  std::cout << "Created cost function" << "\n";
  problem.AddResidualBlock(cost_function, NULL, &z[0]);

  std::cout << "Added residual block" << "\n";

  // Run the solver!
  Solver::Options options;
  options.minimizer_progress_to_stdout = true;
  options.max_num_iterations = 50;
  options.function_tolerance = 1e-4;
  options.dense_linear_algebra_library_type = ceres::LAPACK;
  Solver::Summary summary;
  Solve(options, &problem, &summary);

  std::cout << summary.FullReport() << "\n";

  //Write output of optimization to file
  ...
  return 0;
}

Yolandoyolane answered 13/10, 2014 at 8:47 Comment(2)

Which line is surfaceopt.cc line 367? – Galvanometer 13/10, 2014 at 16:58

The code has changed a lot since posting this question, as I'm trying to use a dynamic cost function. I believe line 367 was the end of the file, though. – Yolandoyolane 14/10, 2014 at 7:44

Two things

You are using DENSE_QR as your linear solver, which results in a dense jacobian. This is a bad idea. Change your linear solver to SPARSE_NORMAL_CHOLESKY and you should be able to solve problems of this size quite easily.

You are going to need SuiteSpare/CXSparse if you are using version 1.9 or older. If you use the latest release candidate or git version, you should be able to use Eigen to do the sparse linear algebra too.

You are creating a single cost function for the whole problem. Which means you are not exposing any sparsity to the problem. Which is what is causing the stack allocation problem since the automatic differentiation stuff involves data on the stack.

Have a look at the example code that ships with ceres, for example denoising.cc which denoises an entire image, and has a similar grid like structure.

More generally create a single residual block for each vertex in your problem.

Airbrush answered 13/10, 2014 at 20:32 Comment(1)

Dr. Agarwal, thank you so much for responding to my question. I am amazed that you found it and am very grateful that you took the time to respond. I stopped using DENSE_QR. I do have Ceres configured to use SuiteSparse. I am looking at denoising.cc and am trying to implement it. Do you think there is any merit to using DynamicAutoDiffCostFunction as well, as Eider suggested below? Again, thanks a lot. – Yolandoyolane 14/10, 2014 at 7:52

Peter, Yes technically you can use dynamic autodiff cost function, but it is going to be incredibly slow and it is going to present the whole problem as a single residual to the solver, so no sparsity and severely rank deficient jacobian.

You should think about adding a residual per grid point, with it only depending on neighboring nodes.

If you continue using one cost residual block, suitesparse or not you are still going to be in trouble.

Airbrush answered 14/10, 2014 at 13:17 Comment(9)

Dr. Agarwal, in this problem, every node is dependent on the neighboring nodes, as you said. I am uncertain how to formulate the problem in order for the grid points to share information while taking advantage of sparsity. In the denoising.cc example, one cost function was created for each point. If I do the same in this problem, then I could perhaps create a cost function for each point that includes the coordinates for all the adjacent points in a mesh (max of 6 in this triangulation). – Yolandoyolane 14/10, 2014 at 15:24

However, the coordinates of the adjacent points are also parameters that belong to other cost functions and that also need to be optimized. I guess I don't understand how Ceres takes all these cost functions, realizes the links between them, and optimizes the entire surface. Any help that would illuminate the problem for me would be much appreciated. Thanks. – Yolandoyolane 14/10, 2014 at 15:24

That is fine. Create a parameter block for each node. and then define a cost function that depends on say four five nodes, center, left, right, top and bottom. And then instantiate this cost function for every node and its neighbors. Perhaps a better example to look at is the simple_bundle_adjuster.cc example, where the error depends on a camera and a point, and then you instantiate it for every projection of the point in every image. – Airbrush 15/10, 2014 at 0:35

Hello again, Dr. Agarwal. I have been trying to implement the solution you suggested in the previous comment but am running into problems. In my first attempt, I created a Dynamic Autodiff cost function for every mesh point and passed in the adjacent points as constants. This resulted in all of the interior (non-edge and non-corner) points staying at zero because the vertex normals are the average of the face normals, i.e., a change in the height of a vertex would have no change on the normal because all of the face normals would cancel each other out in the average. – Yolandoyolane 19/10, 2014 at 8:2

The next solution we tried was to pass in the heights of the adjacent points as residuals. The problem with this solution is that the heights within a cost function are optimized, but the residuals in each cost function are not communicating with each other. In other words, each cost function has its own version of what its neighborhood of points should look like, but they all conflict. We collected all of the optimized residuals from each cost function and averaged them, but the solution was not usable. – Yolandoyolane 19/10, 2014 at 8:7

We would like to have cost functions at adjacent vertices operate on the same data and work together to find a solution for the whole mesh. How do we do this? We have a Mesh object that holds the vertices in a private variable X. How do we pass this data into the cost functions so that the solver finds a global solution and not just individual solutions for each cost function? – Yolandoyolane 19/10, 2014 at 8:34

Here is the relevant part of our current code: pastebin.com/p9z2wnyD Any suggestions would be appreciated. Thanks. – Yolandoyolane 19/10, 2014 at 8:36

I messed up my terminology in the earlier comments. I added the heights of the adjacent points as parameters to the cost functions, not residuals. Every cost function has one residual and 3-7 parameters, depending on the number of adjacent points. – Yolandoyolane 20/10, 2014 at 1:26

Maybe a bit late to the party, but if you are running into problems where each residual has its own copy, you are most certainly creating new copies of the parameters in each residual instance. You should pass them as pointers (as far as I recall), so Ceres operates on the same object, and also sees the relationships between the residuals and parameters. (In other words, understands the underlying factor graph) – Shae 23/2, 2021 at 3:20

This answer is based completely on my experience with C++ and Eigen (I don't know Ceres):

The option.* lines appear to control runtime behavior, but the error message is a compile time error.

The most relevant part of the error that stands out to me is:

/usr/include/eigen3/Eigen/src/Core/DenseStorage.h:79:5: error: ‘OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG’ is not a member of ‘Eigen::internal::static_assertion’ EIGEN_STATIC_ASSERT(Size * sizeof(T) <= 128 * 128 * 8, OBJECT_ALLOCATED_ON_STACK_IS_TOO_BIG);

Eigen will often allocate fixed sized matrices on the stack and I think that is what is happening here. With matrices the size that you are dealing with, they need to be allocated on the heap. To force Eigen to do this, try choosing dynamically sized matrices. After it compiles, you should be able run it with or without Eigen by using the options that you found.

The specific solution appears to be using DynamicAutoDiffCostFunction instead of AutoDiffCostFunction. A relevant snippet of the documentation:

AutoDiffCostFunction requires that the number of parameter blocks and their sizes be known at compile time. It also has an upper limit of 10 parameter blocks.

http://ceres-solver.org/modeling.html?highlight=dynamic#dynamicautodiffcostfunction

Ledger answered 13/10, 2014 at 19:45 Comment(1)

Yea, we implemented a DynamicAutoDiffCostFunction, but like Dr. Agarwal said, it is too slow. – Yolandoyolane 14/10, 2014 at 16:37

Recommended topics

Hot tags