How do Rpy2, pyrserve and PypeR compare?

I would like to access R from within a Python program. I am aware of Rpy2, pyrserve and PypeR.

What are the advantages or disadvantages of these three options?

Blazer asked 12/4, 2011 at 4:32 Comment(8)
Undercut: Duplicate of #2573632
Toffee: Actually, the question is NOT a duplicate; however, the answer to the Python 3.1.1 question does answer the OP's question.
Uranology: On second thought, this is not a duplicate at all. pyrserve and PypeR are not discussed in the other question, so I'll vote to reopen if it gets closed. Good question!
Blazer: There is no comparison of these packages at that link, only a mention that they exist (and that Rpy2 was incompatible with Python 3.x when the question was asked; it is compatible now).
Blazer: As of November 2012, I think the community has settled on rpy2 (and the R magic in IPython). My personal experience with rpy2 has been very positive.
Vaclav: I'm looking for any commentary on the threading/distributed computing capabilities. I'm gearing up to use R from an IPython cluster that will delegate running algorithm implementations in R and pull the results back into a report generated in Python. At the moment PyRserve looks better suited to that sort of task than rpy2, but I could be wrong.
Heriberto: @Vaclav did you go with PyRserve in the end? I'm gearing up to do something similar: I dispatch Celery tasks to worker servers, which sometimes have R scripts to run, and I'm wondering whether to run those scripts on the Celery workers through RPy2 or dispatch them to PyRserve. In theory, RPy2 seems better for large data and for tasks involving both R and Python, while PyRserve seems better where less data is transferred (since it goes over a socket) and/or there is little interaction between R and Python. Any discoveries you made would be highly appreciated!
Vaclav: @Heriberto unfortunately this work got delayed; a lot of exciting things like these questions are still pending. I'll get in touch if we get back on track with this work in the next few months. We stayed with RPy2, since IPython created entirely new Python processes for these distributed tasks.

I know one of the 3 better than the others, but in the order given in the question:

rpy2:

  • C-level interface between Python and R (R running as an embedded process)
  • R objects exposed to Python without the need to copy the data over
  • Conversely, Python's numpy arrays can be exposed to R without making a copy
  • Low-level interface (close to the R C-API) and high-level interface (for convenience)
  • In-place modification of vectors and arrays is possible
  • R callback functions can be implemented in Python
  • Possible to have anonymous R objects with a Python label
  • Python pickling is possible
  • Full customization of R's behavior with its console (making it possible to implement a full R GUI)
  • Limited support on MS Windows

pyrserve:

  • native Python code (will/should/may work with CPython, Jython, IronPython)
  • uses R's Rserve
  • advantages and inconveniences linked to remote computation and to Rserve
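Because Rserve speaks a documented binary protocol (QAP1) over a socket, a client such as pyrserve needs no C extension, which is what makes the "native Python code" point possible. The sketch below frames an eval request and parses the server's greeting using only `struct`. It is a hedged illustration, not pyrserve's code: the function names are hypothetical, no socket I/O is shown, and the constants (`CMD_eval = 3`, `DT_STRING = 4`, the 16-byte header and the 32-byte greeting layout) come from my reading of the Rserve protocol description and should be checked against the Rserve documentation.

```python
import struct

# Constants as given in the Rserve QAP1 protocol description (assumed here;
# verify against the Rserve documentation before relying on them).
CMD_EVAL = 0x003   # evaluate an expression and return the result
DT_STRING = 4      # parameter type: NUL-terminated string
GREETING_LEN = 32  # size of the ID string Rserve sends on connect


def parse_greeting(data):
    """Split the 32-byte greeting into Rserve protocol version and name."""
    if len(data) != GREETING_LEN or data[:4] != b"Rsrv":
        raise ValueError("not an Rserve greeting")
    return {
        "version": data[4:8].decode("ascii"),
        "protocol": data[8:12].decode("ascii"),
    }


def encode_eval(expr):
    """Frame an R expression as a QAP1 CMD_eval request."""
    # NUL-terminate, then pad with NULs to a multiple of 4 bytes.
    payload = expr.encode("utf-8") + b"\0"
    payload += b"\0" * (-len(payload) % 4)
    if len(payload) >= 1 << 24:
        raise ValueError("expression too long for a 24-bit length field")
    # Parameter header: type in the low byte, 24-bit length above it.
    param = struct.pack("<I", DT_STRING | (len(payload) << 8)) + payload
    # Message header (little-endian 32-bit ints): command, data length
    # (low 32 bits), offset of the data part, data length (high 32 bits).
    header = struct.pack("<IIII", CMD_EVAL, len(param), 0, 0)
    return header + param
```

A client would read the greeting and then send `encode_eval("1+1")` over a TCP connection to Rserve (default port 6311); pyrserve hides all of this behind calls such as `pyRserve.connect()` and `conn.eval("1+1")`. Since the greeting advertises the Rserve protocol version rather than the R version, the framing itself does not depend on which R build the server runs.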

pyper:

  • native Python code (will/should/may work with CPython, Jython, IronPython)
  • use of pipes to have Python communicate with R (with the advantages and inconveniences linked to it)
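The pipe approach can be sketched with nothing but `subprocess`: start the interpreter as a child process, write code to its stdin, and read its stdout until a sentinel line marks the end of that command's output. This is a minimal, hypothetical sketch of the general technique, not PypeR's implementation; the class name, the sentinel scheme, and the R command line (`R --vanilla --slave` plus a `cat(...)`/`flush(stdout())` sentinel) are assumptions. So that it can be tried without R installed, the demo drives a tiny Python read-eval loop instead.

```python
import subprocess
import sys


class PipeInterpreter:
    """Talk to a line-oriented interpreter over stdin/stdout pipes.

    Hypothetical sketch of the pipe technique, not PypeR's actual code.
    `sentinel_cmd` is a template that, once formatted with the sentinel,
    must make the child print that sentinel on a line of its own.
    """

    def __init__(self, argv, sentinel_cmd, sentinel="__PIPE_DONE__"):
        self.sentinel = sentinel
        self.sentinel_cmd = sentinel_cmd.format(s=sentinel)
        # stderr is inherited, so the child's errors go straight to our console.
        self.proc = subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
        )

    def run(self, code):
        """Send `code`, then collect stdout lines up to the sentinel."""
        # A real bridge would read while writing, to avoid a deadlock when
        # both the code and its output exceed the pipe buffers.
        self.proc.stdin.write(code + "\n" + self.sentinel_cmd + "\n")
        self.proc.stdin.flush()
        lines = []
        while True:
            line = self.proc.stdout.readline()
            if not line:
                raise RuntimeError("interpreter exited before the sentinel")
            line = line.rstrip("\n")
            if line == self.sentinel:
                return "\n".join(lines)
            lines.append(line)

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()


# Assumed R setup: --slave suppresses echoing the input (newer R versions
# also accept --no-echo). Non-interactive R stops on an uncaught error, so
# a real bridge would wrap user code in try().
R_ARGV = ["R", "--vanilla", "--slave"]
R_SENTINEL = 'cat("{s}\\n"); flush(stdout())'

# Stand-in child for trying the sketch without R: a tiny Python loop that
# executes one line at a time and flushes its output after each line.
PY_CHILD = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    exec(line)\n"
    "    sys.stdout.flush()\n"
)
PY_SENTINEL = 'print("{s}")'

if __name__ == "__main__":
    py = PipeInterpreter([sys.executable, "-c", PY_CHILD], PY_SENTINEL)
    py.run("x = 6 * 7")
    print(py.run("print(x)"))  # prints 42
    py.close()
```

With R installed, the same class would be created as `PipeInterpreter(R_ARGV, R_SENTINEL)`. Keeping a separate interpreter process is what gives the pipe approach its independence from the R build, and also its main cost: everything crossing the boundary has to travel as text.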

edit: Windows support for rpy2

Ducat answered 13/4, 2011 at 1:19 Comment(0)

From the paper in the Journal of Statistical Software on PypeR:

RPy presents a simple and efficient way of accessing R from Python. It is robust and very convenient for frequent interaction operations between Python and R. This package allows Python programs to pass Python objects of basic data types to R functions and return the results in Python objects. Such features make it an attractive solution for the cases in which Python and R interact frequently. However, there are still limitations of this package as listed below.
Performance:
RPy may not behave very well for large-size data sets or for computation-intensive duties. A lot of time and memory are inevitably consumed in producing the Python copy of the R data because in every round of a conversation RPy converts the returned value of an R expression into a Python object of basic types or NumPy array. RPy2, a recently developed branch of RPy, uses Python objects to refer to R objects instead of copying them back into Python objects. This strategy avoids frequent data conversions and improves speed. However, memory consumption remains a problem. [...] When we were implementing WebArray (Xia et al. 2005), an online platform for microarray data analysis, a job consumed roughly one quarter more computational time if running R through RPy instead of through R's command-line user interface. Therefore, we decided to run R in Python through pipes in subsequent developments, e.g., WebArrayDB (Xia et al. 2009), which retained the same performance as achieved when running R independently. We do not know the exact reason for such a difference in performance, but we noticed that RPy directly uses the shared library of R to run R scripts. In contrast, running R through pipes means running the R interpreter directly.
Memory:
R has been denounced for its uneconomical use of memory. The memory used by large-size R objects is rarely released after these objects are deleted. Sometimes the only way to release memory from R is to quit R. RPy module wraps R in a Python object. However, the R library will stay in memory even if the Python object is deleted. In other words, memory used by R cannot be released until the host Python script is terminated.
Portability:
As a module with extensions written in C, the RPy source package has to be compiled with a specific R version on POSIX (Portable Operating System Interface for Unix) systems, and the R must be compiled with the shared library enabled. Also, the binary distributions for Windows are bound to specific combinations of different versions of Python/R, so it is quite frequent that a user has difficulty in finding a distribution that fits the user's software environment.

Bus answered 12/4, 2011 at 16:31 Comment(3)
Ducat: Not sure that this is an unbiased comparison. Maybe I'll bias it the other way around to balance it then ;-). Besides a possible confusion between rpy and rpy2 (the name changes throughout the text), the implementation specificity (C vs pure Python) appears to me the only valid point. The alleged issue with memory usage results from failing to call garbage collection explicitly (something known to happen with Python). There are plenty of examples where pyper can be shown both to consume more memory (by design) and to be orders of magnitude slower.
Blazer: @Igautier Do you have any links to examples where Rpy2 is significantly faster or more memory efficient than pyper?
Ducat: @Blazer Any example where numpy computation and R computation use the same vector/array/matrix will do; in fact, almost any example where an object is accessed and modified by both R and Python.

From a developer's perspective: we used to use rpy/rpy2 to provide statistical and drawing functions to our Python-based application. It caused huge problems in delivering our application, because rpy/rpy2 needs to be compiled for specific combinations of Python and R, which makes it infeasible for us to provide binary distributions that work out of the box unless we bundle R as well. Because rpy/rpy2 are not particularly easy to install, we ended up replacing the relevant parts with native Python modules such as matplotlib. Had we needed to keep using R, we would have switched to pyrserve, because we could start an R server locally and connect to it without worrying about the version of R.

Lager answered 11/3, 2015 at 15:12 Comment(1)
Lighterage: Thanks for the info on this; even though for us it's in a server application, this is a real concern.

In PypeR, I can't pass a large matrix from Python to the R instance with assign(). However, I don't have this issue with rpy2. This is just my experience.
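One plausible contributor to trouble with large matrices over pipes (the answer itself does not diagnose it) is that a pipe carries text, so a numeric matrix has to be rendered as R source code that the R side then parses. The hypothetical helper below shows one such encoding using only the standard library; PypeR's own conversion may well differ.

```python
import math


def r_number(x):
    """Render one number as an R numeric literal."""
    x = float(x)
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "Inf" if x > 0 else "-Inf"
    return repr(x)  # shortest text that round-trips the double


def to_r_matrix(rows):
    """Encode a non-empty rectangular list of rows as an R matrix() call."""
    if not rows or not rows[0]:
        raise ValueError("need at least one row and one column")
    ncol = len(rows[0])
    if any(len(row) != ncol for row in rows):
        raise ValueError("all rows must have the same length")
    values = ", ".join(r_number(v) for row in rows for v in row)
    # byrow=TRUE makes R fill the matrix row by row, matching `rows`.
    return f"matrix(c({values}), nrow={len(rows)}, ncol={ncol}, byrow=TRUE)"
```

For example, `to_r_matrix([[1, 2], [3, 4]])` gives `matrix(c(1.0, 2.0, 3.0, 4.0), nrow=2, ncol=2, byrow=TRUE)`. For arbitrary doubles, each value usually needs more characters of text than the 8 bytes it occupies in memory, and R must parse every one of them; an in-process bridge such as rpy2 passes the data in memory instead.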

Proletariat answered 29/11, 2012 at 22:57 Comment(2)
Houphouetboigny: I am having problems with pyper too when passing large matrices from Python to R. The data seems to "go through", but the R computations I do crash randomly. I wonder if there is a bug or a limitation in what pyper can handle.
Pence: Exact same problem here, both on a pimped-out MacBook Pro and on an insane bioinformatics server, despite the authors claiming in their paper that PypeR is robust. I have a pip-installable pre-alpha alternative called widediaper, available at github.com/endrebak/widediaper, but it is still very alpha. Of course, it is so simple that it just might work; it sure does for me.
