R system() process always uses same CPU, not multi-threaded/multi-core

In R 3.0.2 on Linux 3.12.0, I am using the system() function to execute a number of tasks. The desired effect is for each of these tasks to run as it would if I had executed it from the command line (e.g. via Rscript), outside of R.

However, when executing them inside R via system(), each task is tied to the same single CPU from the master R process.

In other words:

When launched via Rscript directly from a bash shell, outside of R, each task runs on its own core where possible (this is desired)

When launched inside R via system(), each task runs on the same single core. There is no multicore sharing. If I have 100 tasks, they are all stuck on one core.

I cannot figure out how to spawn a process inside of R so that each process will use its own core.

I am using a simple test to consume CPU cycles so I can measure the effect using top/htop:

dd if=/dev/urandom bs=32k count=1000 | bzip2 -9 >> /dev/null

When this simple test is launched outside of R multiple times, each iteration gets its own core. But when I launch it inside of R:

system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)

They are all stuck on a single core.

Here is a visualization after running 4 simultaneous/concurrent iterations of system().

[Screenshot: htop showing the four concurrent processes confined to a single core.]

Please help me: I need to be able to tell R to launch new tasks with each of them running on its own core.

UPDATE DEC 4 2013:

I tried a test in Python using this:

import os
import thread

thread.start_new_thread(os.system, ("/bin/dd if=/dev/urandom of=/dev/null bs=32k count=2000",))

I repeated the start_new_thread call several times, and as expected everything worked (multiple cores used, one per thread).

So I installed the rPython package in R and tried the same from within R:

python.exec("import thread")
python.exec("thread.start_new_thread(os.system,('/bin/dd if=/dev/urandom of=/dev/null bs=32k count=2000',))")

Unfortunately, once again it was limited to a single core even after repeated calls. Why is it that everything launched is limited to a single core when executed from R?

Preglacial answered 2/12, 2013 at 9:7 Comment(9)
I think this is impossible without an add-on package, or at least the parallel package. You will find more explanation here.Mortie
Have you tried GNU parallel on your system? Or, if you are running 4 processes, perhaps you could try using xargs in your launch script with the -P 4 (max procs = 4) option to try and force parallel execution?Danley
@agstudy, I have tried the parallel package. I couldn't even get that to work correctly, so I don't know if my Debian install of R 3.0.2 x64 is somehow hosed, or what. parallel was still limited to a single core.Preglacial
@StephenHenderson, sorry mate, I don't see how either of those would work in this case. The actual commands I am generating with system() are each unique.Preglacial
OK, if they are genuinely unique (i.e. different commands) it can't work, but often one runs through files applying the same command, e.g. zipping them (your example) or similar, in which case you can replace a loop with a parallel or xargs -P command on the list of filenames. That said, I've never tried it, so I genuinely don't know whether it works... I have, though, run multiple Rscripts in parallel from a bash shell.Danley
@StephenHenderson, I want to make sure we are on the same page. Running them in parallel has never been an issue. The issue is they are all stuck on a single core. Each system() needs to run in its own core where possible, because they are CPU intensive calls. If you look at my example, the four calls to system() are running concurrently. The issue is, they are only occupying one core instead of four. They would occupy all four outside of R.Preglacial
Also, I am totally fine with using a bash workaround. Meaning, if there is a Linux/bash command that could "break me out" of this single-threaded CPU/core hell I am in, then I could configure R to call a bash script via system(), with arguments in that script passed on to whatever magic mechanism can break me out of this limitation (see the taskset sketch after these comments). I haven't found anything so far.Preglacial
I understand you. If you say calling GNU parallel the bash utility doesn't run them in parallel then fine - I'm sorry I couldn't help.Danley
Just wanted to mention that I am experiencing this same problem on R 4.0.2. I suspect it has something to do with my installation, because I am seeing the same exact behavior as documented in this question. I am going to try to install R 4.0.3 to see if it helps. #65598589Luxemburg
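One candidate for the "magic mechanism" asked about in the comments above is taskset(1) from util-linux, which overrides the CPU affinity mask a child process would otherwise inherit from R. A minimal sketch of that idea, assuming a 4-core machine (CPUs 0-3) and that an inherited affinity mask is what pins everything to one core:

# Wrap the whole pipeline in sh -c so taskset applies to dd and bzip2 alike;
# -c 0-3 lets the scheduler place the child on any of the four cores.
system("taskset -c 0-3 sh -c 'dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 > /dev/null'",
       ignore.stdout = TRUE, ignore.stderr = TRUE, wait = FALSE)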

Following on @agstudy's comment, you should get parallel to work first. On my system, this uses multiple cores:

f<-function(x)system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)
library(parallel)
mclapply(1:4,f,mc.cores=4)

I would have written this as a comment myself, but it is too long. I know you have said that you have tried the parallel package, but I wanted to confirm that you are using it correctly. If it doesn't work, can you confirm that a non-system call uses mclapply correctly, like this one?

a<-mclapply(rep(1e8,4),rnorm,mc.cores=4)

Reading your comments, I suspect that your pthreads Linux package is out of date and broken. On my system, I am using libpthread-2.15.so (not 2.13). If you're on Ubuntu, you can grab the latest with apt-get install libpthread-stubs0.

Also, note that you should be using parallel, not multicore. If you look at the docs for parallel, you'll note that they have incorporated the work on multicore.
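A quick sketch, using base R's find(), to confirm which attached package is actually supplying mclapply in your session:

library(parallel)
# Lists every attached package that defines mclapply; "package:parallel"
# should appear, not the old "package:multicore".
find("mclapply")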


Reading your next set of comments, I must insist that it is parallel and not multicore that has been included in R since 2.14. You can read about this on the CRAN Task View.

Getting parallel to work is crucial. I previously told you that you could compile it directly from source, but this is not correct. I guess the only way to recompile it would be to compile R from source.

Can you also verify that your CPU affinity is set correctly? Also can you check if R can detect the number of cores? Just run:

library(parallel)
mcaffinity()
# Should be c(1,2,3,4) for you.
detectCores()
# Should be 4 for you.
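If mcaffinity() comes back as a single CPU rather than c(1,2,3,4), a hedged follow-up is to widen the mask before spawning anything, since children started afterwards inherit it:

library(parallel)
# Re-allow every detected core for this session (Linux only; mcaffinity()
# returns NULL on platforms without affinity support).
mcaffinity(seq_len(detectCores()))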
Bartholomeo answered 4/12, 2013 at 17:54 Comment(15)
Hi, thanks for trying. The first block of code spawns four processes; I can see them in htop etc., but they are locked to just a single core, like in the example screenshot. Your second example (the non-system call) also used just 100% of a single core. So, now that we've got this new info, can you give me insight into why my parallel library is not working?Preglacial
New info... just tried it via CLI R instead of RStudio, and it gave me a segfault. Here is the pastebin: pastebin.com/1SWhH4Zd -- in addition, I checked the kern log and found this: [586018.637080] rsession[28883]: segfault at 7f1e1eeda9d0 ip 00007f1e23912d8c sp 00007fff484ab730 error 4 in libpthread-2.13.so[7f1e2390b000+17000]. I did a quick google but not seeing anyone else with this issue with R. But it appears to be the culprit, if I only knew why.Preglacial
I just removed the parallel package, multicore, foreach, doParallel, doSNOW. Then I reinstalled multicore, which I think is what R 3.0.2 should use instead of parallel. It includes mclapply. Still no joy, same segfault at the CLI and single core. I cannot find anyone else that has run into this issue so not sure where to go next.Preglacial
@user1530260 I have updated my answer, I suspect your pthreads is broken.Bartholomeo
I tried all the pthread packages I could find in aptitude already, thinking the same thing when I saw the segfault. No change; I reinstalled everything one at a time. I am running Debian 7 Wheezy, yet it seems I only get libpthread-2.13.so. As for R and the multicore/parallel packages, I thought it was the other way around, since multicore comes with R 3 and parallel gives the error "package 'parallel' is not available (for R version 3.0.2)" upon install.packages(). I had managed to have it installed previously, probably from source, but can't find the source package again (yet) to try it again.Preglacial
I just did a complete apt-get purge ^r-.* on all installed packages and dependencies, manually killed the /usr/local/R and /usr/lib/R directories, rebooted, and then reinstalled. The package "parallel" is now back as pre-installed, but... NO JOY! The test mclapply() you provided is still stuck on a single core and causes a segfault.Preglacial
@user1530260 I updated my answer, suggesting that you recompile parallel from source.Bartholomeo
The parallel package was back already after I purged everything and reinstalled from scratch, but either way it gives an error: install.packages('parallel', type='source') produces "Warning in install.packages: package 'parallel' is not available (for R version 3.0.2)".Preglacial
I am going to deploy a new physical server tomorrow with new everything, and see if that fixes the problem. Can you tell me where libpthread is listed as a prerequisite for the parallel package? Are we sure it is required, because I cannot find anything linking the two aside from the segfault kern.log message.Preglacial
So, I actually don't know for sure if libpthread is a requirement. I have updated my comments, and removed my incorrect technique of getting parallel reinstalled.Bartholomeo
I was unable to build the new physical server yet due to a winter storm; UPS couldn't deliver (and probably can't until Tuesday now). Since it has become clear that the issue is somehow with my configuration, I am going to award you the +50 bounty (sorry, I don't have a lot of credit). Thank you for helping me uncover this and prove it is capable of working - that is the important thing in the long run.Preglacial
@user1530260 Thanks for the bounty, but I'd rather see this question resolved. Maybe you could update later and see if the problem was CPU affinity, as I mentioned in my last edit?Bartholomeo
It is not affinity, everything works fine as per your example.Preglacial
Just a final follow-up -- I built the new physical server today, and everything worked as expected. I do not know why the old config doesn't work, but it is somehow corrupted. For now I am moving on. Thanks for your help. As a final note, I did not install any libpthread package, just the standard apt-get install r-base-dev after adding the repos. Thanks again.Preglacial
Could be related to OpenBLAS. It is possible that the different R sessions called OpenBLAS, which itself diverted all workload to a single core. See grokbase.com/t/r/r-sig-hpc/124qe5gmwn/parallel-and-openblasPerdue
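If OpenBLAS is indeed the culprit, a minimal sketch of the usual workaround (assuming an OpenBLAS build with affinity support) is to disable its core pinning via the OPENBLAS_MAIN_FREE environment variable:

# OPENBLAS_MAIN_FREE=1 tells OpenBLAS not to set CPU affinity at all.
# It must be in the environment before OpenBLAS initialises, so the
# reliable route is to export it in the shell that launches R, e.g.
#   OPENBLAS_MAIN_FREE=1 R
# Setting it from an already-running session may come too late:
Sys.setenv(OPENBLAS_MAIN_FREE = "1")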

I tested running:

system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)
system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)
system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)
system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)

on Linux 2.6.32 with R 3.0.2 and on Linux 3.8.0 with R 2.15.2. In both cases it takes up 4 CPU cores (as you would expect).

-- Edit --

I installed Linux 3.12 on a VirtualBox machine, and there R 3.0.2 also does what I expect: it takes up 4 CPUs. The processes even slowly wander between the CPUs, so each process does not stick to the same CPU but changes every second or so.

This leads me to believe your system has some local modification that forces R to use only one CPU.

From your description I would guess the local modification is in R and not system-wide (since your standalone Python has no problem spawning processes across cores).
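One quick way to test that guess (a sketch, assuming taskset from util-linux is installed) is to inspect the R session's own affinity mask, since every system() child inherits it:

# Show which CPUs this R session may run on; a single CPU in the
# output would explain every child sticking to one core.
system(sprintf("taskset -cp %d", Sys.getpid()))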

The modification could be specific to your user, so create a new user and try with that. If it works for the new user, we need to figure out what your user account has installed.

If it does not work for the new user, globally installed R libraries could be causing the problem. Install an older R version and try that. If the older version works, your R 3.0.2 installation is probably broken; remove it and reinstall it.

Leakage answered 5/12, 2013 at 2:33 Comment(4)
I already completely reinstalled R from scratch after a complete purge. It made no difference. I am currently in the process of deploying a brand new physical server and will see if that solves it. Will take another couple days due to time constraints on my side. I cannot imagine what "modification" has occurred though in this particular config that is causing it to be bound to a single CPU.Preglacial
Would you please confirm whether you did any special apt-get installs for libpthread etc., or just used library(parallel), which is built in to R 3.0.2? Can you show me a list of installed packages with pthread in the name? There does seem to be some sort of link on my system - the segfault in libpthread.so plus the limit to one core must be related.Preglacial
I used the CRAN version of R. I did not do any special install of pthreads and did not use library(parallel). All I did was: add CRAN to sources.list; apt-get install r-base-core; start R; paste the 4 lines.Leakage
I was unable to build the new physical server yet due to a winter storm; UPS couldn't deliver (and probably can't until Tuesday now). Since it has become clear that the issue is somehow with my configuration, I will continue working to that end (new system, reinstallation, etc.). Thank you for helping prove that it works; that is what is important.Preglacial
