...regarding execution time and/or memory.
If this is not true, prove it with a code snippet. Note that speedup by vectorization does not count. The speedup must come from apply (tapply, sapply, ...) itself.
The apply functions in R don't provide improved performance over other looping functions (e.g. for). One exception to this is lapply, which can be a little faster because it does more work in C code than in R (see this question for an example of this).
But in general, the rule is that you should use an apply function for clarity, not for performance.
I would add to this that apply functions have no side effects, which is an important distinction when it comes to functional programming with R. This can be overridden by using assign or <<-, but that can be very dangerous. Side effects also make a program harder to understand, since a variable's state depends on its history.
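To make the side-effect point concrete, here is a minimal sketch (the counter variable is only for illustration, not part of the original answer) showing how <<- inside an *apply call reaches back out and mutates state in the enclosing environment:
counter <- 0
# Each call mutates `counter` in the enclosing (here: global) environment via <<-,
# so this lapply is no longer side-effect free.
invisible(lapply(1:5, function(i) counter <<- counter + i))
counter
# [1] 15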
Edit:
Just to emphasize this with a trivial example that recursively calculates the Fibonacci sequence. This could be run multiple times to get an accurate measure, but the point is that none of the methods has significantly different performance:
fibo <- function(n) {
  if (n < 2) n
  else fibo(n - 1) + fibo(n - 2)
}
system.time(for(i in 0:26) fibo(i))
# user system elapsed
# 7.48 0.00 7.52
system.time(sapply(0:26, fibo))
# user system elapsed
# 7.50 0.00 7.54
system.time(lapply(0:26, fibo))
# user system elapsed
# 7.48 0.04 7.54
library(plyr)
system.time(ldply(0:26, fibo))
# user system elapsed
# 7.52 0.00 7.58
Edit 2:
Regarding the usage of parallel packages for R (e.g. rpvm, rmpi, snow), these generally do provide apply family functions (even the foreach package is essentially equivalent, despite the name). Here's a simple example of the sapply function in snow:
library(snow)
cl <- makeSOCKcluster(c("localhost", "localhost"))  # two workers on the local machine
parSapply(cl, 1:20, get("+"), 3)                    # adds 3 to each element, in parallel
This example uses a socket cluster, for which no additional software needs to be installed; otherwise you will need something like PVM or MPI (see Tierney's clustering page). snow has the following apply functions:
parLapply(cl, x, fun, ...)
parSapply(cl, X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
parApply(cl, X, MARGIN, FUN, ...)
parRapply(cl, x, fun, ...)
parCapply(cl, x, fun, ...)
It makes sense that apply functions should be used for parallel execution since they have no side effects. When you change a variable value within a for loop, it is globally set. On the other hand, all apply functions can safely be used in parallel because changes are local to the function call (unless you try to use assign or <<-, in which case you can introduce side effects). Needless to say, it's critical to be careful about local vs. global variables, especially when dealing with parallel execution.
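For comparison, here is a minimal sketch of the same addition example using the parallel package that ships with base R (not part of the original snow example; the two-worker cluster size is arbitrary):
library(parallel)
cl <- makeCluster(2)                     # two local worker processes (PSOCK cluster)
parSapply(cl, 1:20, function(x) x + 3)   # add 3 to each element across the workers
stopCluster(cl)                          # release the workers when finished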
Edit:
Here's a trivial example to demonstrate the difference between for and *apply so far as side effects are concerned:
df <- 1:10
# *apply example
lapply(2:3, function(i) df <- df * i)
df
# [1] 1 2 3 4 5 6 7 8 9 10
# for loop example
for(i in 2:3) df <- df * i
df
# [1] 6 12 18 24 30 36 42 48 54 60
Note how the df in the parent environment is altered by for but not by *apply.
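If you actually want the cumulative result that the for loop produces, a side-effect-free way to express it is to fold the multiplications together, e.g. with Reduce (just a sketch of the functional alternative, not part of the original comparison):
# Fold the successive multiplications into one result instead of
# repeatedly overwriting df in the parent environment
Reduce(function(acc, i) acc * i, 2:3, 1:10)
# [1]  6 12 18 24 30 36 42 48 54 60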
The snowfall package is also worth checking out, along with the examples in its vignette. snowfall builds on top of the snow package and abstracts the details of parallelization even further, making it dead simple to execute parallelized apply functions. – Artiste
foreach has since become available and seems to be much inquired about on SO. – Aden
The parallel package, now shipped with base R, is basically a re-factored version of snow, so snow-like semantics are available by default. – Artiste
When I run lapply(2:3, function(i) df <- df * i) I get a different output than the one in the post (i.e., a list of vectors). Could you double check? – Adjectival
The linked question is cited as showing that lapply is "a little faster" than a for loop, but there I am not seeing anything suggesting so. You only mention that lapply is faster than sapply, which is a well-known fact for other reasons (sapply tries to simplify the output and hence has to do a lot of data size checking and potential conversions). Nothing related to for. Am I missing something? – Denary
Sometimes the speedup can be substantial, like when you have to nest for loops to get the average based on a grouping of more than one factor. Here you have two approaches that give you the exact same result:
set.seed(1) # for reproducibility of the results
# The data
X <- rnorm(100000)
Y <- as.factor(sample(letters[1:5],100000,replace=T))
Z <- as.factor(sample(letters[1:10],100000,replace=T))
# the function forloop that averages X over every combination of Y and Z
forloop <- function(x, y, z){
  # These are for optimization, so the functions levels() and length()
  # don't have to be called more than once.
  ylev <- levels(y)
  zlev <- levels(z)
  n <- length(ylev)
  p <- length(zlev)
  out <- matrix(NA, ncol = p, nrow = n)
  for(i in 1:n){
    for(j in 1:p){
      out[i, j] <- mean(x[y == ylev[i] & z == zlev[j]])
    }
  }
  rownames(out) <- ylev
  colnames(out) <- zlev
  return(out)
}
# Used on the generated data
forloop(X,Y,Z)
# The same using tapply
tapply(X,list(Y,Z),mean)
Both give exactly the same result: a 5 x 10 matrix of averages with named rows and columns. But:
> system.time(forloop(X,Y,Z))
user system elapsed
0.94 0.02 0.95
> system.time(tapply(X,list(Y,Z),mean))
user system elapsed
0.06 0.00 0.06
There you go. What did I win? ;-)
*apply is faster. But I think that the more important point is the side effects (I updated my answer with an example). – Derickderide
data.table is even faster and, I think, "easier": library(data.table); dt <- data.table(X, Y, Z, key = c("Y,Z")); system.time(dt[, list(X_mean = mean(X)), by = c("Y,Z")]) – Seamy
tapply is a specialized function for a specific task; that's why it's faster than a for loop. It can't do what a for loop can do (while regular apply can). You're comparing apples with oranges. – Limitary
tapply is suited for a task that might otherwise be done with nested for loops, and hence performs it a lot faster than a naive approach that does not use the tools made for the task. – Flourish
tapply is basically split followed by an lapply, and lapply is not any faster than a for loop, so assuming your for loop code is optimal (I didn't check), any speedup you get from tapply comes from split. All this answer shows, then, is that split has a fast C implementation, which is not something I would categorize as an "apply family speedup". This is why it is apples vs. oranges: at best you're comparing split vs. a for loop. – Limitary
If everyone wrote the optimal for loop, I would agree. But people don't. They either use two nested for loops, or use tapply (or more modern approaches) if they're more familiar with R. So I check performance at the user level, not internally. In any case, I thought the "tongue in cheek" was obvious from the last line of my very old answer. – Flourish
...for notwithstanding) – Lenin
...and as I just wrote elsewhere, vapply is your friend! It's like sapply, but you also specify the return value type, which makes it much faster.
foo <- function(x) x+1
y <- numeric(1e6)
system.time({z <- numeric(1e6); for(i in y) z[i] <- foo(i)})
# user system elapsed
# 3.54 0.00 3.53
system.time(z <- lapply(y, foo))
# user system elapsed
# 2.89 0.00 2.91
system.time(z <- vapply(y, foo, numeric(1)))
# user system elapsed
# 1.35 0.00 1.36
Jan. 1, 2020 update:
system.time({z1 <- numeric(1e6); for(i in seq_along(y)) z1[i] <- foo(y[i])})
# user system elapsed
# 0.52 0.00 0.53
system.time(z <- lapply(y, foo))
# user system elapsed
# 0.72 0.00 0.72
system.time(z3 <- vapply(y, foo, numeric(1)))
# user system elapsed
# 0.7 0.0 0.7
identical(z1, z3)
# [1] TRUE
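Part of what vapply buys you, besides speed, is the type check itself: the FUN.VALUE template is compared against every result, so a mismatch fails loudly instead of being silently simplified. A small illustrative sketch (not from the original benchmarks):
# Returning a character where numeric(1) was promised is an error,
# not a silent coercion as sapply might perform
vapply(1:3, function(i) if (i == 2) "oops" else i * 1.0, numeric(1))
# Error: values must be type 'double', but FUN(X[[2]]) result is type 'character'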
for loops are faster on my Windows 10, 2-core computer. I did this with 5e6 elements: a loop was 2.9 seconds vs. 3.1 seconds for vapply. – Finsteraarhorn
I've written elsewhere that an example like Shane's doesn't really stress the difference in performance among the various kinds of looping syntax because the time is all spent within the function rather than actually stressing the loop. Furthermore, the code unfairly compares a for loop with no memory with apply family functions that return a value. Here's a slightly different example that emphasizes the point.
foo <- function(x) {
  x <- x + 1
}
y <- numeric(1e6)
system.time({z <- numeric(1e6); for(i in y) z[i] <- foo(i)})
# user system elapsed
# 4.967 0.049 7.293
system.time(z <- sapply(y, foo))
# user system elapsed
# 5.256 0.134 7.965
system.time(z <- lapply(y, foo))
# user system elapsed
# 2.179 0.126 3.301
If you plan to save the result then apply family functions can be much more than syntactic sugar.
(the simple unlist of z is only 0.2s so the lapply is much faster. Initializing the z in the for loop is quite fast because I'm giving the average of the last 5 of 6 runs so moving that outside the system.time would hardly affect things)
One more thing to note, though, is that there is another reason to use apply family functions independent of their performance, clarity, or lack of side effects. A for loop typically promotes putting as much as possible within the loop. This is because each loop requires setup of variables to store information (among other possible operations). Apply statements tend to be biased the other way. Often you want to perform multiple operations on your data, several of which can be vectorized but some of which cannot. In R, unlike other languages, it is best to separate those operations out and run the ones that are not vectorized in an apply statement (or a vectorized version of the function) and the ones that are vectorized as true vector operations. This often speeds up performance tremendously.
Taking Joris Meys' example, where he replaces a traditional for loop with a handy R function, we can use it to show the efficiency of writing code in a more R-friendly manner for a similar speedup without the specialized function.
set.seed(1) # for reproducibility of the results
# The data - copied from Joris Meys answer
X <- rnorm(100000)
Y <- as.factor(sample(letters[1:5],100000,replace=T))
Z <- as.factor(sample(letters[1:10],100000,replace=T))
# an R way to generate tapply functionality that is fast and
# shows more general principles about fast R coding
YZ <- interaction(Y, Z)
XS <- split(X, YZ)
m <- vapply(XS, mean, numeric(1))
m <- matrix(m, nrow = length(levels(Y)))
rownames(m) <- levels(Y)
colnames(m) <- levels(Z)
m
This winds up being much faster than the for loop and just a little slower than the built-in, optimized tapply function. It's not because vapply is so much faster than for, but because it is only performing one operation in each iteration of the loop. In this code everything else is vectorized. In Joris Meys' traditional for loop many (7?) operations occur in each iteration, and there's quite a bit of setup just for it to execute. Note also how much more compact this is than the for version.
2.798 0.003 2.803; 4.908 0.020 4.934; 1.498 0.025 1.528, and vapply is even better: 1.19 0.00 1.19 – Stater
sapply is 50% slower than for, and lapply twice as fast. – Lenin
Set y to 1:1e6, not numeric(1e6) (a vector of zeroes). Assigning foo(0) to z[0] over and over does not illustrate typical for loop usage well. The message is otherwise spot on. – Denary
The for loop comes out as the fastest, sapply 40% slower, and lapply 20% slower. – Nim
I used for_loop = {z <- integer(n); for(i in 1:n) z[i] = foo(y[i])} (I think flodel has a good point above about running z[0] <- foo(0) n times in the for loop != z <- sapply(y, foo)), and I wrapped the lapply in unlist() so that the result is the same. Doing that, and then using microbenchmark, I see the for loop as more than 2x faster than lapply, with sapply a little bit slower than lapply. I added vapply too; it's about 30% slower than the loop. – Nim
lapply is slowest and about half the speed of for. Unfortunately, it's not because for is so much faster (about 0.5) but because lapply got slower (about 0.9). I don't think that's an improvement overall. My 3.5.3 results from Apr. 24 were on average the best. – Lenin
When applying functions over subsets of a vector, tapply can be quite a bit faster than a for loop. Example:
df <- data.frame(id = rep(letters[1:10], 100000),
                 value = rnorm(1000000))
f1 <- function(x)
  tapply(x$value, x$id, sum)
f2 <- function(x){
  res <- 0
  for(i in seq_along(l <- unique(x$id)))
    res[i] <- sum(x$value[x$id == l[i]])
  names(res) <- l
  res
}
library(microbenchmark)
> microbenchmark(f1(df), f2(df), times=100)
Unit: milliseconds
expr min lq median uq max neval
f1(df) 28.02612 28.28589 28.46822 29.20458 32.54656 100
f2(df) 38.02241 41.42277 41.80008 42.05954 45.94273 100
apply, however, in most situations doesn't provide any speed increase, and in some cases can even be a lot slower:
mat <- matrix(rnorm(1000000), nrow=1000)
f3 <- function(x)
  apply(x, 2, sum)
f4 <- function(x){
  res <- 0
  for(i in 1:ncol(x))
    res[i] <- sum(x[, i])
  res
}
> microbenchmark(f3(mat), f4(mat), times=100)
Unit: milliseconds
expr min lq median uq max neval
f3(mat) 14.87594 15.44183 15.87897 17.93040 19.14975 100
f4(mat) 12.01614 12.19718 12.40003 15.00919 40.59100 100
But for these situations we've got colSums and rowSums:
f5 <- function(x)
  colSums(x)
> microbenchmark(f5(mat), times=100)
Unit: milliseconds
expr min lq median uq max neval
f5(mat) 1.362388 1.405203 1.413702 1.434388 1.992909 100
I used microbenchmark because it is much more precise than system.time. If you try to compare system.time(f3(mat)) and system.time(f4(mat)) you'll get different results almost every time. Sometimes only a proper benchmark test is able to show the fastest function. – Crossman
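For example, putting all three versions into one microbenchmark call (reusing f3, f4, and f5 from above) gives a stable ranking that repeated one-off system.time runs won't:
# One call, one summary table: the relative ordering is clear despite run-to-run noise
microbenchmark(f3(mat), f4(mat), f5(mat), times = 100)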
Parallel packages provide their own versions of the apply family of functions. Therefore, structuring programs so they use apply allows them to be parallelized at a very small marginal cost. – Artiste