Eigen and huge dense 2D arrays

I am using 2D Eigen::Arrays for a project, and I would like to keep using them even for huge 2D arrays.

To avoid memory issues, I thought of using memory-mapped files to manage (read/modify/write) these arrays, but I cannot find any working examples.

The closest example I have found is this one based on boost::interprocess, but it uses shared memory, whereas I would prefer persistent storage.

The lack of examples makes me wonder whether there is a better, mainstream alternative solution to my problem. Is that the case? A minimal example would be very handy.

EDIT:

Here is a minimal example; the comments explain my use case:

#include <Eigen/Dense>


int main()
{
    // Order of magnitude of the required arrays
    Eigen::Index rows = 50000;
    Eigen::Index cols = 40000;

    {
        // Array creation (this is where the memory mapped file should be created)
        Eigen::ArrayXXf arr1 = Eigen::ArrayXXf::Zero( rows, cols );

        // Some operations on the array
        for(Eigen::Index i = 0; i < rows; ++i)
        {
            for(Eigen::Index j = 0; j < cols; ++j)
            {
                arr1( i, j ) = float(i * j);
            }
        }

        // The array goes out of scope, but the data are persistently stored in the file
    }

    {
        // This should actually use the data stored in the file
        Eigen::ArrayXXf arr2 = Eigen::ArrayXXf::Zero( rows, cols );

        // Manipulation of the array data
        for(Eigen::Index i = 0; i < rows; ++i)
        {
            for(Eigen::Index j = 0; j < cols; ++j)
            {
                arr2( i, j ) += 1.0f;
            }
        }

        // The array goes out of scope, but the data are persistently stored in the file
    }

}
Jeri asked 30/6/2018 at 19:48
Comments:
Yesman: You can create a huge swap file and let the OS swap pages in and out as necessary.
Jeri: @Darklighter, could you elaborate on your comment?
Yesman: Instead of trying to use a file as a chunk of memory, you can simply enlarge your virtual memory via the page/swap file. This lets you use matrices that are larger than your physical memory. It probably only works reasonably well on an SSD, though.
Factorial: As you said yourself, a minimal reproducible example of what you actually intend to do would be very handy, or at least some pseudo-code. Do you have existing arrays stored that you want to traverse linearly or access randomly? Or do you generate huge arrays at runtime that are simply too large to fit in your RAM? What orders of magnitude are you working with?
Jeri: @Yesman, your approach is certainly interesting, but it does not work well in my case: I cannot ask every user of such a library to set up a swap file as you describe.
Jeri: @Factorial, I just added a minimal example of my main use case.
Oina: You should be able to come up with something using boost::interprocess to map a file as a memory buffer, and then Eigen::Map (eigen.tuxfamily.org/dox/group__TutorialMapClass.html) to view and manipulate it as an ArrayXXf.
Jeri: @Oina, thank you for the hint! Do you know of any code that attempts this? It would greatly help my efforts.

Based on this comment and these answers (https://mcmap.net/q/1519893/-eigen-and-huge-dense-2d-arrays and https://mcmap.net/q/1519893/-eigen-and-huge-dense-2d-arrays), this is my working solution:

#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <Eigen/Dense>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>

// C++17; on older compilers use <experimental/filesystem> and std::experimental::filesystem
namespace fs = std::filesystem;
namespace bi = boost::interprocess;

int main() {

  std::string array_bin_path = "array.bin";
  const int64_t nr_rows = 28000;
  const int64_t nr_cols = 35000;
  const int64_t array_size = nr_rows * nr_cols * sizeof(float);
  std::cout << "array size: " << array_size << std::endl;

  // if the file already exists but the size is different, remove it
  if(fs::exists(array_bin_path))
  {
    const int64_t file_size = static_cast<int64_t>(fs::file_size(array_bin_path));
    std::cout << "file size: " << file_size << std::endl;
    if(array_size != file_size)
    {
      fs::remove(array_bin_path);
    }
  }

  // create a binary file of the required size
  if(!fs::exists(array_bin_path))
  {
    std::ofstream ofs(array_bin_path, std::ios::binary | std::ios::out | std::ios::trunc);
    ofs.seekp(array_size - 1);
    ofs.put(0);
    ofs.close();
  }

  // use boost interprocess to memory map the file
  const bi::file_mapping mapped_file(array_bin_path.c_str(), bi::read_write);
  bi::mapped_region region(mapped_file, bi::read_write);

  // get the address of the mapped region
  void * addr = region.get_address();

  const std::size_t region_size = region.get_size();
  std::cout << "region size: " << region_size << std::endl;

  // map the file content into a Eigen array
  Eigen::Map<Eigen::ArrayXXf> my_array(reinterpret_cast<float*>(addr), nr_rows, nr_cols);

  // modify the content
  std::cout << "initial array(0, 1) value: " << my_array(0, 1) << std::endl;
  my_array(0, 1) += 1.234f;
  std::cout << "final array(0, 1) value: " << my_array(0, 1) << std::endl;

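  // ask the OS to write the modified pages back to the file now
  // (otherwise the OS still writes them back later)
  region.flush();

  return 0;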
}

It uses:

  • boost::interprocess instead of boost::iostreams, because it is header-only. In addition, mapped_region is handy if I want to store multiple arrays in a single mapped file.
  • std::ofstream to create the binary file and std::filesystem (std::experimental::filesystem on pre-C++17 compilers) to check it.
  • Eigen::ArrayXXf as required in my question.
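One consequence of putting Eigen::ArrayXXf on top of the file is worth spelling out: Eigen arrays are column-major by default, so the file stores the array column after column, and element (i, j) sits at byte offset (i + j * nr_rows) * sizeof(float). This is also why loops over a mapped array should keep the row index in the inner loop, so that pages are touched sequentially. The sketch below reads and writes single elements of such a headerless file with plain std::fstream, which can be handy for checking what was persisted; the helper names column_major_offset, read_element and write_element are hypothetical and not part of the answer.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// Byte offset of element (i, j) in a column-major float array with `rows` rows
// (Eigen's default storage order for ArrayXXf and MatrixXf).
std::uint64_t column_major_offset(std::uint64_t rows, std::uint64_t i, std::uint64_t j) {
    return (i + j * rows) * sizeof(float);
}

// Hypothetical helper: read element (i, j) straight from the file, without mapping it.
float read_element(const std::string& path, std::uint64_t rows,
                   std::uint64_t i, std::uint64_t j) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    in.seekg(static_cast<std::streamoff>(column_major_offset(rows, i, j)));
    float value = 0.0f;
    if (!in.read(reinterpret_cast<char*>(&value), sizeof value)) {
        throw std::runtime_error("element lies beyond the end of " + path);
    }
    return value;
}

// Hypothetical helper: overwrite element (i, j) in an existing file (no truncation).
void write_element(const std::string& path, std::uint64_t rows,
                   std::uint64_t i, std::uint64_t j, float value) {
    std::fstream io(path, std::ios::binary | std::ios::in | std::ios::out);
    if (!io) throw std::runtime_error("cannot open " + path);
    io.seekp(static_cast<std::streamoff>(column_major_offset(rows, i, j)));
    if (!io.write(reinterpret_cast<const char*>(&value), sizeof value)) {
        throw std::runtime_error("cannot write to " + path);
    }
}
```

For the 28000 x 35000 array in the answer, read_element("array.bin", 28000, 0, 1) would return the value that the program printed as array(0, 1).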
Jeri answered 14/7/2018 at 16:36
Comments:
Jeri: @ggael, do you have any comments on the above solution?

So I googled

boost memory mapped file

and came upon boost::iostreams::mapped_file in the first result.

Combined with the link to Eigen::Map from this comment, I tested the following:

#include <boost/iostreams/device/mapped_file.hpp>
#include <Eigen/Dense>
#include <iostream>
#include <stdexcept>
#include <string>

int main() {
    // map the whole file; mapped_file opens it for reading and writing by default
    boost::iostreams::mapped_file file("foo.bin");

    const std::size_t rows = 163840;
    const std::size_t columns = 163840;
    // make sure the file is large enough to hold the whole matrix
    if (rows * columns * sizeof(float) > file.size()) {
        throw std::runtime_error("file of size " + std::to_string(file.size()) + " couldn’t fit float Matrix of " + std::to_string(rows) + "×"  + std::to_string(columns));
    }

    // view the mapped bytes as a (column-major) float matrix, without copying
    Eigen::Map<Eigen::MatrixXf> matrix(reinterpret_cast<float*>(file.data()), rows, columns);

    std::cout << matrix(0, 0) << ' ' << matrix(rows - 1, columns - 1) << std::endl;
    matrix(0, 0) = 0.5;
    matrix(rows - 1, columns - 1) = 0.5;
}

built with this CMake configuration:

find_package(Boost REQUIRED COMPONENTS iostreams)
find_package(Eigen3 REQUIRED)
target_link_libraries(${PROJECT_NAME} Boost::iostreams Eigen3::Eigen)

Then I googled

windows create dummy file

and the first result gave me

fsutil file createnew foo.bin 107374182400

Running the program twice gives:

0 0

0.5 0.5

without blowing up memory usage.

So it works like a charm.
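fsutil is Windows-only. A portable alternative, assuming a C++17 compiler (this is not what the answer used), is to create the backing file from the program itself with std::filesystem::resize_file, which zero-extends the file; on filesystems with sparse-file support, such as ext4 or XFS, this typically does not allocate the disk space up front. ensure_file_size and float_matrix_bytes are hypothetical helper names.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Hypothetical helper: bytes needed for a rows x cols float matrix.
std::uintmax_t float_matrix_bytes(std::uintmax_t rows, std::uintmax_t cols) {
    return rows * cols * sizeof(float);
}

// Hypothetical helper: make `path` exist with exactly `bytes` bytes.
// Growing pads the file with zero bytes; shrinking truncates it.
void ensure_file_size(const fs::path& path, std::uintmax_t bytes) {
    if (!fs::exists(path)) {
        std::ofstream create(path, std::ios::binary);  // creates an empty file
    }
    if (fs::file_size(path) != bytes) {
        fs::resize_file(path, bytes);
    }
}
```

For example, ensure_file_size("foo.bin", float_matrix_bytes(163840, 163840)) produces a file of the same 107374182400 bytes as the fsutil command. Unlike Jeri's answer, which deletes a file of the wrong size, this helper resizes it and keeps the existing data up to the new size.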

Yesman answered 10/7/2018 at 2:50

I think it wouldn't be that hard to write your own class for this.

To initialize the array for the first time, create a file of size x * y * elem_size and memory-map it.

You could even add a small header with information such as size, x, y, etc., so that when you reopen the file you have all the information you need.

Now you have one big memory block, and you could provide a member function elem(x, y), get_elem()/set_elem(), or operator[], which calculates the position of the data element.

Closing the file, or committing (flushing) the mapping in between, saves the data.
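A minimal sketch of such a class, assuming a POSIX system (open, ftruncate, mmap, msync); the answer does not name an API, and on Windows CreateFileMapping, MapViewOfFile and FlushViewOfFile would play the same roles. The names create_array_file, ArrayHeader and MappedArray2D, the three-field header and the column-major element order are illustrative choices, not an established format.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <fcntl.h>     // POSIX: open
#include <sys/mman.h>  // POSIX: mmap, msync, munmap
#include <sys/stat.h>  // POSIX: fstat
#include <unistd.h>    // POSIX: close, ftruncate, pwrite

// Hypothetical on-disk header stored at the start of the file.
struct ArrayHeader {
    std::uint64_t rows;
    std::uint64_t cols;
    std::uint64_t elem_size;
};

// Create (or overwrite) a file for a rows x cols float array.
// ftruncate zero-fills the data area, so the array starts as all zeros.
void create_array_file(const std::string& path, std::uint64_t rows, std::uint64_t cols) {
    int fd = ::open(path.c_str(), O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) throw std::runtime_error("cannot create " + path);
    const ArrayHeader h{rows, cols, sizeof(float)};
    const off_t total = static_cast<off_t>(sizeof h + rows * cols * sizeof(float));
    const bool ok = ::ftruncate(fd, total) == 0 &&
                    ::pwrite(fd, &h, sizeof h, 0) == static_cast<ssize_t>(sizeof h);
    ::close(fd);
    if (!ok) throw std::runtime_error("cannot initialise " + path);
}

// Maps the whole file; element (i, j) is stored column-major after the header.
class MappedArray2D {
public:
    explicit MappedArray2D(const std::string& path) {
        fd_ = ::open(path.c_str(), O_RDWR);
        if (fd_ < 0) throw std::runtime_error("cannot open " + path);
        struct stat st {};
        if (::fstat(fd_, &st) != 0 || st.st_size < static_cast<off_t>(sizeof(ArrayHeader))) {
            ::close(fd_);
            throw std::runtime_error("not an array file: " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);
        void* p = ::mmap(nullptr, size_, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, 0);
        if (p == MAP_FAILED) {
            ::close(fd_);
            throw std::runtime_error("mmap failed: " + path);
        }
        base_ = static_cast<char*>(p);
        std::memcpy(&header_, base_, sizeof header_);  // dimensions come from the file
        if (size_ < sizeof header_ + header_.rows * header_.cols * sizeof(float)) {
            release();
            throw std::runtime_error("truncated array file: " + path);
        }
    }
    ~MappedArray2D() { release(); }
    MappedArray2D(const MappedArray2D&) = delete;  // the object owns the mapping
    MappedArray2D& operator=(const MappedArray2D&) = delete;

    std::uint64_t rows() const { return header_.rows; }
    std::uint64_t cols() const { return header_.cols; }
    float* data() { return reinterpret_cast<float*>(base_ + sizeof header_); }

    // Position of (i, j) computed on the fly; no bounds checking.
    float& elem(std::uint64_t i, std::uint64_t j) { return data()[i + j * header_.rows]; }

    // Write dirty pages back to the file now ("committing in between").
    void flush() { ::msync(base_, size_, MS_SYNC); }

private:
    void release() {
        if (base_) ::munmap(base_, size_);
        if (fd_ >= 0) ::close(fd_);
        base_ = nullptr;
        fd_ = -1;
    }
    int fd_ = -1;
    char* base_ = nullptr;
    std::size_t size_ = 0;
    ArrayHeader header_{};
};
```

Because the data follow the header in column-major order, they could also be viewed through Eigen without copying, e.g. Eigen::Map<Eigen::ArrayXXf> view(arr.data(), arr.rows(), arr.cols()); Eigen::Map is unaligned by default, so the 24-byte header offset is not a problem.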

For really large files, it may be better to map only the portions of the file that are currently needed, to avoid creating a very large page table.
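Mapping only a portion has one catch: with POSIX mmap the file offset must be a multiple of the page size (on Windows, MapViewOfFile offsets must instead be a multiple of the allocation granularity). Assuming a headerless column-major float file like the one in Jeri's answer, a contiguous range of columns is one contiguous byte range, so a window can be mapped by rounding the start offset down to a page boundary and skipping the extra leading bytes. ColumnWindow is a hypothetical name, and the sketch does no bounds checking.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <fcntl.h>     // POSIX: open (used by callers)
#include <sys/mman.h>  // POSIX: mmap, munmap
#include <unistd.h>    // POSIX: sysconf, ftruncate, close

// Hypothetical RAII view of columns [first_col, first_col + ncols) of a
// headerless, column-major float file with `rows` rows, opened read/write as `fd`.
class ColumnWindow {
public:
    ColumnWindow(int fd, std::uint64_t rows, std::uint64_t first_col, std::uint64_t ncols)
        : rows_(rows) {
        const std::uint64_t page = static_cast<std::uint64_t>(::sysconf(_SC_PAGE_SIZE));
        const std::uint64_t begin = first_col * rows * sizeof(float);  // wanted byte offset
        const std::uint64_t aligned = begin - begin % page;             // rounded down to a page
        lead_ = static_cast<std::size_t>(begin - aligned);              // bytes to skip
        len_ = lead_ + static_cast<std::size_t>(ncols * rows * sizeof(float));
        void* p = ::mmap(nullptr, len_, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                         static_cast<off_t>(aligned));
        if (p == MAP_FAILED) throw std::runtime_error("mmap of column window failed");
        base_ = static_cast<char*>(p);
    }
    ~ColumnWindow() { ::munmap(base_, len_); }
    ColumnWindow(const ColumnWindow&) = delete;
    ColumnWindow& operator=(const ColumnWindow&) = delete;

    // Element (i, first_col + local_j) of the full array.
    float& at(std::uint64_t i, std::uint64_t local_j) {
        return reinterpret_cast<float*>(base_ + lead_)[i + local_j * rows_];
    }

private:
    std::uint64_t rows_;
    std::size_t lead_ = 0;
    std::size_t len_ = 0;
    char* base_ = nullptr;
};
```

boost::interprocess::mapped_region, used in Jeri's answer, also accepts an offset and a size, so the same windowing idea applies there.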

Windows-specific (not sure whether these are available on Linux):

  • If you don't need to keep the data on disk, you can open the file with the delete-on-close flag. The data will then only be written (temporarily) to disk if memory becomes scarce.

  • For sparse arrays, a sparse file could be used. Such files only use disk space for the blocks that contain data; all other blocks are virtual and read as zeros.

Allimportant answered 10/7/2018 at 3:46