Calculation time: != versus ! ==

I was wondering how much faster a != 0 is than ! a == 0 and used the R package microbenchmark. Here's the code (reduce 3e6 and 100 if your PC is slow):

library("microbenchmark")
a <- sample(0:1, size=3e6, replace=TRUE)
speed <- microbenchmark(a != 0, ! a == 0, times=100)
boxplot(speed, notch=TRUE, unit="ms", log=FALSE)

Every time, I get a plot like the one below. As expected, the first version is faster (median 26 milliseconds) than the second (33 ms).

But where do these few very high values (outliers) come from? Is that some memory management effect? If I set times to 10, there are no outliers...

Edit: sessionInfo(): R version 3.1.2 (2014-10-31) Platform: x86_64-w64-mingw32/x64 (64-bit)

[Boxplot: computation time of != and ! ==]

Pram answered 8/12, 2014 at 11:45 Comment(1)
I don't think this is going to be easy to track down; I've seen similar results even with times=10 or so. Keep in mind that microbenchmark is not bulletproof. There are a couple of blogs somewhere pointing out semi-bugs in how it collects timing info. It may also be simply that some other "thing" happens every now and then during the normal course of R operations - a gc call, or waiting for RAM reallocation at the system level, etc. Perhaps try running a loop around system.time to see what the distribution of results is? — Pectize
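
A minimal sketch of that system.time loop (assuming the same vector a as in the question):

a <- sample(0:1, size=3e6, replace=TRUE)
elapsed <- replicate(100, system.time(a != 0)["elapsed"])  # elapsed wall-clock seconds
summary(elapsed)          # compare the median to the max to gauge the outliers
hist(elapsed, breaks=30)  # the long right tail corresponds to the outliers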

You say that you don't have outliers when times=10, but run microbenchmark with times=10 several times and you are likely to see the odd outlier. Here is a comparison of one run of times=100 with ten runs of times=10, which shows that outliers occur in both situations.

Depending on the size of the objects involved in the expression, I imagine outliers could arise when your machine is struggling with memory limitations, but they might also occur due to CPU strain, e.g. from non-R processes.

a <- sample(0:1, size=3e6, replace=TRUE)

# Method 1: a single run with times=100
speed1 <- microbenchmark(a != 0, ! a == 0, times=100)
speed1 <- as.data.frame(speed1)

# Method 2: ten separate runs with times=10 each
speed2 <- replicate(10, microbenchmark(a != 0, ! a == 0, times=10), simplify=FALSE)
speed2 <- do.call(rbind, lapply(speed2, as.data.frame))

# Combine both sets of timings and label which method produced them
times <- cbind(rbind(speed1, speed2), method=rep(1:2, each=200))
boxplot(time ~ expr + method, data=times,
        names=c('!=; 1x100', '!==; 1x100', '!=; 10x10', '!==; 10x10'))

[Boxplot: != and ! == timings, one run of times=100 vs. ten runs of times=10]

Storz answered 8/12, 2014 at 12:34 Comment(1)
At this level of resolution, I think the outliers are almost always the result of garbage collection. — Altamira
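
One way to test that garbage-collection hypothesis (a sketch, reusing a and microbenchmark from the code above): base R's gcinfo() prints a message every time the collector runs, so you can see whether collections coincide with the slow iterations:

gcinfo(TRUE)   # print a message whenever the garbage collector runs
speed <- microbenchmark(a != 0, ! a == 0, times=100)
gcinfo(FALSE)  # switch the messages off again
# If gc messages appear during the run, the iterations that hit a collection
# are the likely source of the outliers in the boxplot.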

I think the comparison is slightly unfair. Of course you get outliers: computation time depends on several factors (garbage collection, cached results, etc.), so it is not really a surprise. You are using the same vector a in all the benchmarks, so caching certainly plays a role.

I adjusted the process a bit by randomizing the a variable prior to the computation, and I got relatively comparable results:

library("microbenchmark")

do.not <- function() {
   a <- sample(0:1, size=3e6, replace=TRUE)
   a != 0
}

do <- function() {
   a <- sample(0:1, size=3e6, replace=TRUE)
   a == 0
}

randomize <- function() {
   a <- sample(0:1, size=3e6, replace=TRUE)
}

speed <- microbenchmark(randomize(), do.not(), do(), times=100)
boxplot(speed, notch=TRUE, unit="ms", log=FALSE)

[Boxplot: randomize(), do.not() and do() timings]

I also added the sample function itself as a benchmark, to show how 'volatile' it is.

Personally, I am not surprised by the outliers. Even if you run the same benchmark with size=10, you still get outliers. They are not a consequence of the computation, but of the overall condition of the PC (other scripts running, memory load, etc.).
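
One way to quantify the tail rather than eyeball the boxplot (a sketch; the time column of a microbenchmark result is in nanoseconds):

quantile(speed$time, probs=c(0.5, 0.9, 0.99, 1))  # median vs. extreme quantiles
# A large gap between the 99% quantile and the maximum points to a few
# isolated outliers rather than a generally noisy run.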

Thanks

Nub answered 8/12, 2014 at 13:33 Comment(1)
You compare != and ==. The different timing is a result of the additional call to ! in !(a == 0). — Microscope
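
A corrected sketch of the do() helper along the lines of that comment (not part of the original answer):

do <- function() {
   a <- sample(0:1, size=3e6, replace=TRUE)
   !(a == 0)  # keep the extra ! call so the benchmark matches the question
}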
