multithreading with R?

Reading the R-project website, I found some vague references to multithreading with R, but it is unclear how the base product and CRAN libraries are compiled.

Revolution Analytics offers a multithreaded base(?) download for Windows and Red Hat.

Would some of the other Linux distributions also include multithreaded R (and packages)?

Diminuendo answered 31/5, 2012 at 13:59 Comment(3)
Where are the unclear references you mention? Also, Revolution R uses a multi-threaded BLAS. You're free to use whatever BLAS you want, including a multi-threaded one; you just have to follow the instructions. - Seigler
Partially, my question was whether Linux distros ship with multithreaded R (libraries? as clarified below ...). - Diminuendo
Debian (and Ubuntu) give you OpenBLAS, which is pretty good, as well as ATLAS (which is good but is typically built in a single-threaded configuration). - Lying

You are confused.

The R (and before it, S) internals are single-threaded, and will almost surely remain single-threaded. As I understand it, Duncan Temple Lang's PhD work was about overcoming this, and if he can't do it...

That said, there are pockets of multi-threadedness:

  • First off, whenever you make external calls, and with proper locking, you can go multi-threaded. That is what the BLAS libraries MKL, GotoBLAS/OpenBLAS, ATLAS (if built multithreaded), ... all offer. Revo R "merely" ships with (Intel's) MKL, as Intel happens to be a key Revo investor.

  • If you are careful about what you do, you can use OpenMP (a compiler extension for multi-threading). This started with Luke Tierney's work on pnmath and pnmath0 (which used to be experimental / external packages) and has since been coming into R itself, slowly but surely.

  • Next, in a multicore world, and on the right operating system, you can always fork(). That is what package multicore pioneered and which package parallel now carries on.

  • Last but not least there is the network / RPC route with MPI used by packages like Rmpi, snow, parallel, ... and covered in HPC introductions.
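
As a rough illustration of the routes above, here is a minimal sketch using the parallel package that ships with R. The task `slow_square` and the worker count of 2 are made up for the example. The BLAS line only reports which library this R build is linked against (it should return a path on R 3.4.0 or later, and NA otherwise) and says nothing about how many threads that library uses.

```r
library(parallel)  # bundled with R since 2.14.0

# Which BLAS is this R build linked against? (NA on older R versions.)
print(extSoftVersion()["BLAS"])

# Hypothetical slow task, used only for illustration.
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

inputs <- 1:8

# Fork-based route: child processes are forked from the current session.
# Forking is not available on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
forked <- mclapply(inputs, slow_square, mc.cores = n_cores)

# Socket-based (snow-style) route: separate R worker processes,
# which also works on Windows.
cl <- makeCluster(2L)
clustered <- parLapply(cl, inputs, slow_square)
stopCluster(cl)

# Both routes should agree with the plain serial result.
serial <- lapply(inputs, slow_square)
stopifnot(identical(forked, serial), identical(clustered, serial))
```

Note that both variants run several R processes side by side rather than several threads inside one interpreter, which is consistent with the single-threaded internals described above.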

Lying answered 31/5, 2012 at 14:5 Comment(9)
OK, so at developer.r-project.org/TODO-DTL.html it states: "Luke and I are working on getting concurrency and potentially parallelism in R so that one can (at least appear to) be executing different commands simultaneously. Ideally we will be able to exploit multiple processors in a machine and run certain computations in parallel." So if I'm calling out to some(?) correctly compiled libraries, the processing is parallel, but it isn't when running "built-in" commands? - Diminuendo
Did you also see the 'Last modified' time stamp in the bottom corner? - Lying
Just noticing it ... Somebody with the right permissions might want to move the page to the archive folder :) - Diminuendo
I love that website, but I learned the hard way not to expect any firm timelines or deliveries :) - Lying
What about mran.microsoft.com/open (R Open)? They specifically list being multithreaded as one of the top features. - Honeydew
Still only in the BLAS/LAPACK libraries. I made the same clarifying comment already below (and it got three upvotes). - Lying
There have been a number of "developments" since this question and answer were first posted, though not the type you might have hoped for, at least in terms of Revolution R: "In January 2015 Microsoft rebranded and renewed several Revolution Analytics products and offerings for Hadoop, Teradata Database, SUSE Linux, Red Hat, and Microsoft Windows. Microsoft made several of these R-based products free of charge for developers..." en.m.wikipedia.org/wiki/Revolution_Analytics Oh yeah, it was also renamed: blog.revolutionanalytics.com/2016/01/microsoft-r-open.html - Mcinnis
@DirkEddelbuettel has anything your answer addresses changed in the years since you wrote it? - Helvetii
Not to the core. - Lying

Renjin is a JVM-based implementation of the R interpreter. They claim that:

Unlike GNU R, Renjin is multithreaded and will run happily in a Platform-as-a-Service environment such as Google Appengine, AWS Elastic Beanstalk, Heroku or Microsoft Azure.

Source: http://www.bedatadriven.com/products/renjin.html

Still, the actual R packages we would call from R may not be thread safe.

See the Jep documentation, which explains this issue from the standpoint of calling CPython from Java/Scala.

https://github.com/ninia/jep/wiki/How-Jep-Works#threading-complications

Due to complications and limitations of JNI, a thread that creates a Jep instance must be reused for all method calls to that Jep instance. Jep will enforce this and throw exceptions mentioning invalid thread access. (In the future we hope to simplify or provide utilities for thread management).

More than one Jep instance should not be run on the same thread at the same time. While this is technically allowed, it can potentially mess up the thread state and lead to deadlock in the Python interpreter. This will probably be changed to throw an exception if encountered in the future.

So, there seems to be hope with Renjin, but any binary (C/C++, etc.) packages you use still need to be verified for thread safety.

There are other R implementations as well:

https://dynamicecology.wordpress.com/2014/01/14/r-isnt-just-r-anymore/

Barragan answered 2/12, 2017 at 17:56 Comment(1)
Renjin has fewer JNI-related (multi-threading) complications since it does not use JNI but instead translates all C/C++/Fortran code to JVM bytecode. - Auroraauroral

What about this? Since the modification date of that page is May 2014, I think the packages it mentions are relatively new, or maybe they weren't stable at the time the first answer was written.

Lemures answered 13/11, 2014 at 1:50 Comment(1)
After two and a half years, Dirk's answer still stands. R is not inherently parallel, even if it has libraries and functions that are. So, if you use a loop or an apply function, it is not automatically parallelized. However, you can parallelize much of your code by taking advantage of parallelized functions and libraries (e.g. the package parallel, which has been provided with base R since 2.14.0 (Oct 2011)). - Salt

You can effectively multi-thread R by using KNIME or any other program that utilizes the rserve.exe executable. In KNIME, you can put an R Snippet within a Parallel Chunk node series for operations done row-wise. For column-wise operations, you can split the data set into subsets of columns and execute R Snippets on each set, then merge them back together.
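
Outside KNIME, the same split / apply / merge idea for column-wise work can be sketched in plain R with the parallel package. This is only an illustration of the pattern, not what KNIME does internally; the data frame `df`, the helper `scale_cols` and the two-worker cluster are all made up for the example.

```r
library(parallel)

# Hypothetical data set and column-wise operation, for illustration only.
df <- data.frame(a = 1:5, b = 6:10, c = 11:15, d = 16:20)
scale_cols <- function(chunk) {
  # Divide every column in the chunk by its own maximum.
  as.data.frame(lapply(chunk, function(col) col / max(col)))
}

# Split the column indices into two subsets, one per worker.
col_groups <- splitIndices(ncol(df), 2L)
chunks <- lapply(col_groups, function(idx) df[, idx, drop = FALSE])

# Process each subset of columns in its own R worker process.
cl <- makeCluster(2L)
processed <- parLapply(cl, chunks, scale_cols)
stopCluster(cl)

# Merge the processed subsets back together in the original column order.
result <- do.call(cbind, processed)
```

For row-wise work the same approach applies with row indices instead of column indices and `rbind` for the merge.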

I hope this makes your CPU fan spin faster!

Rebroadcast answered 17/5, 2017 at 15:19 Comment(0)