R multicore mcfork(): Unable to fork: Cannot allocate memory

I'm getting the titular error:

mcfork(): Unable to fork: Cannot allocate memory

after trying to run a function with mclapply, but top says I'm only at 51% memory usage

This is on an EC2 instance, but I do have up-to-date R.

Does anyone know what else can cause this error?

Thanks,

-N

Nondisjunction answered 27/3, 2013 at 20:35 Comment(2)
R requires contiguous memory blocks. Have you restarted and tried to re-run with just minimal data?Scleroprotein
I tried another instance with twice the memory, and this solved the issue... I'd still like to understand it better though, so I'll leave the question open.Nondisjunction
M
18

The issue might be exactly what the error message suggests: there isn't enough memory to fork and create parallel processes.

R essentially needs to create a copy of everything that's in memory for each individual process (to my knowledge it doesn't utilize shared memory). If you are already using 51% of your RAM with a single process, then you don't have enough memory to create a second process, since that would require 102% of your RAM in total.

Try:

  1. Using fewer cores - If you were trying to use 4 cores, it's possible you have enough RAM to support 3 parallel workers, but not 4. registerDoMC(2), for example, will set the number of parallel workers to 2 (if you are using the doMC parallel backend).
  2. Using less memory - Without seeing the rest of your code, it's hard to suggest ways to accomplish this. One thing that might help is figuring out which R objects are taking up all the memory (Determining memory usage of objects?) and then removing any objects you don't need from memory (rm(my_big_object)). See the sketch after this list.
  3. Adding more RAM - If all else fails, throw hardware at it so you have more capacity.
  4. Sticking to a single process - Forked parallel processing in R is a tradeoff between CPU and memory. It sounds like in this case you may not have enough memory to support the CPU power you have, so the best course of action might be to just stick to a single core.
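
A rough sketch of options 1 and 2 together (my_big_object, my_data and my_fun are placeholder names, and the worker count of 2 is only an example):

library(doMC)          # also attaches foreach
registerDoMC(2)        # option 1: cap the number of forked workers at 2

# option 2: see which objects are eating memory, then drop the ones you don't need
sort(sapply(ls(), function(x) object.size(get(x))), decreasing = TRUE)
rm(my_big_object)
gc()                   # give the freed memory back where possible

results <- foreach(i = seq_along(my_data)) %dopar% my_fun(my_data[[i]])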
Maybellemayberry answered 23/8, 2015 at 21:45 Comment(6)
Maybe I'm wrong in some way, but I've found that you can indeed fork a process that uses 51% of your RAM without using 102% of your RAM. That is (for my purposes) part of the reason to fork, viz. the RAM can be shared until one process attempts to modify it.Enumerate
See unix.stackexchange.com/questions/155017/… for more details.Enumerate
@Mike. Just like rpierce, I don't agree with your claim that fork needs 50% of spare RAM. Swap suffices.Outgrowth
@Adam - aha, thanks for the build. Looks like it does utilize shared memory with copy-on-write. Can we agree that the correct statement would be that it needs 51% of spare RAM + swap to fork?Maybellemayberry
@MikeMonteiro On default Ubuntu installation - yes, but this is customizable behaviour, and it might not be true for every other Linux distribution. See my own answer to this question (below) for details concerning overcommitting memory.Outgrowth
I received the same error message as the OP & got around it by using only a single core to run my code (registerDoParallel(1)).Papaveraceous

The R function mcfork is only a wrapper around the fork syscall (by the way, the man page says that this call is itself a wrapper around clone).

I created a simple C++ program to test fork's behaviour:

#include <stdio.h>
#include <unistd.h>

#include <vector>

int main(int argc, char **argv)
{
    printf("--beginning of program\n");

    std::vector<std::vector<int> > l(50000, std::vector<int>(50000, 0));

//    while (true) {}

    int counter = 0;
    pid_t pid = fork();
    pid = fork();
    pid = fork();
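    // Note: every process that reaches a fork() call duplicates itself, so the three
    // calls above leave 2^3 = 8 processes in total, and pid only holds the result of
    // the last fork.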


    if (pid == 0)
    {
        // child process
        int i = 0;
        for (; i < 5; ++i)
        {
            printf("child process: counter=%d\n", ++counter);
        }
    }
    else if (pid > 0)
    {
        // parent process
        int j = 0;
        for (; j < 5; ++j)
        {
            printf("parent process: counter=%d\n", ++counter);
        }
    }
    else
    {
        // fork failed
        printf("fork() failed!\n");
        return 1;
    }

    printf("--end of program--\n");
    while (true) {}
    return 0;
}

First, the program allocates roughly 9-10 GB of data on the heap (50,000 × 50,000 4-byte ints). Then it spawns 2^3 = 8 processes (the original plus 7 children) via the three fork calls, and finally enters an infinite loop, so it is easy to spot in a task manager and simply waits to be killed by the user.
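
If you want to reproduce this, a minimal way to build and run it (the file name forktest.cpp is just a placeholder; compiling without optimisation keeps the large allocation from being optimised away):

g++ -O0 -o forktest forktest.cpp
./forktest &
free -h    # repeat this (or watch top) to see how much memory actually stays free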

Here are my observations:

  1. For the fork to succeed, my system needed at least 51% of memory to be free, but this includes swap. You can change this behaviour by editing the /proc/sys/vm/overcommit_* proc files.
  2. As expected, none of the children take more memory, so this 51% of free memory remains free throughout the course of the program, and all subsequent forks also don't fail.
  3. The memory is shared between the forks, so it gets reclaimed only after you kill the last child.

Memory fragmentation issue

You should not be concerned about any kind of memory fragmentation with respect to fork. R's memory fragmentation doesn't apply here, because fork operates on virtual memory. You shouldn't worry about fragmentation of physical memory either, because virtually all modern operating systems use virtual memory (which is also what enables them to use swap). The only fragmentation that could matter is fragmentation of the virtual address space, but AFAIK on Linux the virtual address space is 2^47 bytes, which is enormous; you should not have any problem finding a contiguous region of any practical size for many decades to come.

Summary:

Make sure you have more swap than physical memory, and as long as your computations don't actually need more memory than you have in RAM, you can mcfork them as much as you want.

Or, if you are willing to risk the stability of the whole system (memory starvation), try echo 1 >/proc/sys/vm/overcommit_memory as root on Linux.

Or better yet (safer):

echo 2 >/proc/sys/vm/overcommit_memory
echo 100 >/proc/sys/vm/overcommit_ratio
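
Note that writes to /proc do not survive a reboot. To make the setting permanent, you can put the equivalent sysctl keys into /etc/sysctl.conf (or a file under /etc/sysctl.d/):

vm.overcommit_memory = 2
vm.overcommit_ratio = 100

and then load them with sysctl -p (or sysctl --system).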

You can read more about overcommitting here: https://www.win.tue.nl/~aeb/linux/lk/lk-9.html

Outgrowth answered 20/5, 2016 at 9:35 Comment(0)

A note for those who want to use a GUI such as RStudio.
If you want to take advantage of fork-based parallel processing, it is advised not to run your code from a GUI, because the forked worker processes end up sharing the GUI with your session, which can make it unstable. Here is an excerpt from the doMC package's help page for registerDoMC:

The multicore functionality, originally written by Simon Urbanek and subsumed in the parallel package in R 2.14.0, provides functions for parallel execution of R code on machines with multiple cores or processors, using the system fork call to spawn copies of the current process.

The multicore functionality, and therefore registerDoMC, should not be used in a GUI environment, because multiple processes then share the same GUI.

I solved a similar error to the one the OP experienced by not calling registerDoMC(cores = n) when running my program from RStudio. Forked multiprocessing works best from a plain R session (e.g. R run from a terminal or via Rscript). Hope this helps.
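
A minimal sketch of that idea (the RSTUDIO environment variable check is an assumption about how RStudio marks its sessions, and my_tasks / my_fun are placeholders):

library(parallel)

# fork only when we are not inside RStudio; otherwise fall back to one worker
n_workers <- if (nzchar(Sys.getenv("RSTUDIO"))) 1L else max(1L, detectCores() - 1L)
results <- mclapply(my_tasks, my_fun, mc.cores = n_workers)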

Buenabuenaventura answered 30/11, 2016 at 8:39 Comment(2)
This will probably be relevant to people who find the question in future, but the original Q does not specify using a GUI. Could you add a first sentence along the lines of "if you're using a GUI, then..."?Nondisjunction
@N.McA. Thanks. Update made.Buenabuenaventura

I had the same error while using caret to train an rpart model on a system with 64 GB of memory, using parallel processing with 6 cores on a 7-core machine. Changing to 5 cores made the error go away.

library(doMC)
registerDoMC(5)
Jarlath answered 31/12, 2014 at 14:31 Comment(0)

I'm running into a similar problem right now. I won't claim to know the right answer. Both of the above answers propose courses of action that may work, especially if your forks are creating additional write demands on memory at the same time. However, I have been thinking that something else might be the source of the difficulty, viz. memory fragmentation. See https://raspberrypi.stackexchange.com/questions/7856/log-says-i-cant-allocate-memory-but-i-have-more-than-half-of-my-memory-free for a discussion of a case where a user on a Unix-alike sees free memory but hits an out-of-memory error due to memory fragmentation. This seems like a likely culprit for R in particular because of R's love for contiguous blocks of RAM. However, per ?Memory-limits the requirement should be about address space rather than RAM itself, so this could be incorrect (especially on a 64-bit machine). YMMV.

Enumerate answered 10/4, 2016 at 5:33 Comment(0)
