What's the biggest R-gotcha you've run across?
P

29

58

Is there a certain R-gotcha that had you really surprised one day? I think we'd all gain from sharing these.

Here's mine: in list indexing, my.list[[1]] is not my.list[1]. Learned this in the early days of R.
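A quick illustration (with a made-up list):

> my.list <- list(a = 1:3, b = "x")
> my.list[1]    # single bracket: a list of length one containing the first element
$a
[1] 1 2 3

> my.list[[1]]  # double brackets: the element itself
[1] 1 2 3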

Pict answered 8/10, 2009 at 0:48 Comment(3)
There are a lot more gotchas, big and small, in 'The R Inferno' burns-stat.com/pages/Tutor/R_inferno.pdfHornsby
Whitespace matters in if-else statements. Error: unexpected 'else' in "else" will pop up when you put a newline after the closing curly brace of the if block: if (cond) { ... } \n else { ... }.Alevin
The choose function. choose(n, k) isn't the number of k-element subsets of an n-element set. For example, choose(-4,2) == 10.Walkout
T
43

[Hadley pointed this out in a comment.]

When using a sequence as an index for iteration, it's better to use the seq_along() function rather than something like 1:length(x).

Here I create a vector and both approaches return the same thing:

> x <- 1:10
> 1:length(x)
 [1]  1  2  3  4  5  6  7  8  9 10
> seq_along(x)
 [1]  1  2  3  4  5  6  7  8  9 10

Now make the vector NULL:

> x <- NULL
> seq_along(x) # returns an empty integer; good behavior
integer(0)
> 1:length(x) # counts down and returns c(1, 0); this is bad
[1] 1 0

This can cause some confusion in a loop:

> for(i in 1:length(x)) print(i)
[1] 1
[1] 0
> for(i in seq_along(x)) print(i)
>
Thunderstone answered 23/6, 2010 at 14:22 Comment(0)
O
36

The automatic creation of factors when you load data. You unthinkingly treat a column in a data frame as characters, and this works well until you do something like trying to change a value to one that isn't a level. This will generate a warning but leave your data frame with NA's in it ...

When something goes unexpectedly wrong in your R script, check that factors aren't to blame.
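A minimal illustration, assuming the old stringsAsFactors = TRUE default (the default before R 4.0):

> df <- data.frame(x = c("a", "b"))  # x silently becomes a factor with levels "a" and "b"
> df$x[1] <- "c"                     # "c" is not a level: warning "invalid factor level, NA generated"
> df
     x
1 <NA>
2    b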

Offutt answered 8/10, 2009 at 2:54 Comment(3)
Right -- but you can use options("stringsAsFactors"=FALSE) in your startup file(s) to change this.Entrenchment
@Dirk all well until you send a piece of code to someone with a different .Rprofile (happened to me this week ;)Langton
This actually happens not only when you read a file, but also when you use the data.frame constructor. Has bit me from behind many times as well.Revile
E
32

Forgetting the drop=FALSE argument when subsetting matrices down to a single dimension, which silently drops the object's class as well:

R> X <- matrix(1:4,2)
R> X
     [,1] [,2]
[1,]    1    3
[2,]    2    4
R> class(X)
[1] "matrix"
R> X[,1]
[1] 1 2
R> class(X[,1])
[1] "integer"
R> X[,1, drop=FALSE]
     [,1]
[1,]    1
[2,]    2
R> class(X[,1, drop=FALSE])
[1] "matrix"
R> 
Entrenchment answered 8/10, 2009 at 2:2 Comment(0)
P
32

Removing rows from a data frame and then adding new ones can produce non-uniquely named rows, which then errors out:

> a<-data.frame(c(1,2,3,4),c(4,3,2,1))
> a<-a[-3,]
> a
  c.1..2..3..4. c.4..3..2..1.
1             1             4
2             2             3
4             4             1
> a[4,1]<-1
> a
Error in data.frame(c.1..2..3..4. = c("1", "2", "4", "1"), c.4..3..2..1. = c(" 4",  : 
  duplicate row.names: 4

So what is going on here is:

  1. A four row data.frame is created, so the rownames are c(1,2,3,4)

  2. The third row is deleted, so the rownames are c(1,2,4)

  3. A fourth row is added, and R automatically sets the row name equal to the index i.e. 4, so the row names are c(1,2,4,4). This is illegal because row names should be unique. I don't see why this type of behavior should be allowed by R. It seems to me that R should provide a unique row name.

Pluviometer answered 8/10, 2009 at 3:15 Comment(12)
Interesting. I've been using R and its S predecessors since 1988 and I'd never seen that before!Ger
Wow. That is very strange. Can you explain it?Thunderstone
So what is going on here is: 1. A four row data.frame is created, so the rownames are c(1,2,3,4) 2. The third row is deleted, so the rownames are c(1,2,4) 3. A fourth row is added, and R automatically sets the row name equal to the index i.e. 4, so the row names are c(1,2,4,4). This is illegal because row names should be unique. I don't see why this type of behavior should be allowed by R. It seems to me that R should provide a unique row name.Pluviometer
Very interesting. Two thoughts: (1) it might be clearer in the long run to edit your answer and add your explanation there and (2) have you considered emailing this into the r-devel mail list?Thunderstone
note that this is an error of print.data.frame. The code will run fine otherwise (with warnings.)Unreasonable
I suppose that I could ask r-devel, but it might get shot down with prejudice by some of the stronger personalities there. From a performance perspective, checking for uniqueness is O(n), so that might be the reason. If someone else thinks that it should go to the developer list, I'll send it.Pluviometer
O(n) worst case scenario. O(n) isn't that bad... I would send it to r-devel.Pict
@Eduardo: When you do a traceback after the error, it is thrown by the data.frame functionMargrettmarguerie
This is great and yet another reason to avoid data frames, if possible. I wonder what is gained with row names, given all of the issues that crop up.Gonnella
@IanFellows just curious -- did you end up sending it to r-devel? I wouldn't blame you if you didn't.Waters
Seems fixed in R 3.3.3. Final row is now 4.1 1 NAImpeccant
@NickKennedy, my faith in humanity is restored! ~8 years after I reported it to r-devel.Pluviometer
P
25

First, let me say that I understand the fundamental problems of representing numbers in a binary system. Nevertheless, one problem that I think could easily be improved is the display of numbers whose decimal component falls below R's default printing precision.

> x <- 10.2 * 100
> x
[1] 1020
> as.integer(x)
[1] 1019

I don't mind if the result is represented as an integer when it really can be represented as an integer. For example, if the value really was 1020 then printing that for x would be fine. But something as simple as 1020.0 in this case when printing x would have made it more obvious that the value was not an integer and not representable as one. R should default to some kind of indication when there is an extremely small decimal component that isn't presented.
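One way to see what is actually stored (an illustrative check, not a fix):

x <- 10.2 * 100
x == 1020            # FALSE: the stored value is slightly below 1020
sprintf("%.13f", x)  # reveals the tiny decimal component hidden by the default print
floor(x)             # 1019, which is why as.integer(x) gave 1019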

Photoelectron answered 3/8, 2010 at 17:4 Comment(5)
I sympathize, but this is really hard to get right. Beyond printing everything to full precision all the time, can you give an example of a language that does this better? (That's a real, not a rhetorical, question ...)Fractocumulus
I don't think that any language I use handles this great but I think R's current method is about the worst because it displays floating point as if it's integer when it's not. Simply displaying some kind of floating point would at least be better but then that might create a need for more decimal places. It would actually have been helpful to have the above as 1.0199e3. Or, in the alternate situation with something like 81.00000001 presenting 8.10e1 as a surprising outcome could hint that there are lots more decimal places. There are lots of better ways. There are fewer worse.Photoelectron
Oh, and most interpreted programming languages will just try to print the whole number if possible. They'll at least display a floating point as floating point.Photoelectron
I think this is where R's heritage as an interactive data analysis platform bites us. It makes some sense to try to limit the number of digits displayed, but it's tricky to see how it could be done. One perhaps-overly-clever alternative (which will never be implemented) would be to always print floating point values with at least the decimal point, i.e. 1.000000000001 would print as 1.; the other alternative would be to print an explict L after integers, but that would be ugly.Fractocumulus
I think I'd prefer to have the number presented as an int, the way it is now, if it really is representable as such, even though the current class is actually fp. But, if it's very very close to an int, but not quite, put up the ".". R already works out rounding of the numbers in presentation anyway. It already is capable of knowing that the number was not an exact int and presenting it as if it were. It's not much extra to ask for an indication of whether it really is exact or not.Photoelectron
H
20

It can be annoying to have to allow for combinations of NA, NaN and Inf. They behave differently, and tests for one won't necessarily work for the others:

> x <- c(NA,NaN,Inf)
> is.na(x)
[1]  TRUE  TRUE FALSE
> is.nan(x)
[1] FALSE  TRUE FALSE
> is.infinite(x)
[1] FALSE FALSE  TRUE

However the safest way to test any of these trouble-makers is:

> is.finite(x)
[1] FALSE FALSE FALSE
Hit answered 14/8, 2010 at 13:36 Comment(1)
interesting... I always thought of NA as "I don't know (yet)". but my interpretation does not fit with is.infinite(NA) and is.finite(NA) returning FALSE: I had expected NA.Puree
T
18

Always test what happens when you have an NA!

One thing that I always need to pay careful attention to (after many painful experiences) is NA values. R functions are easy to use, but no manner of programming will overcome issues with your data.

For instance, almost any arithmetic or aggregate operation involving an NA returns NA. This is "surprising" on the face of it:

> x <- c(1,1,2,NA)
> 1 + NA
[1] NA
> sum(x)
[1] NA
> mean(x)
[1] NA

This gets extrapolated out into other higher-level functions.

In other words, missing values frequently have as much importance as measured values by default. Many functions have na.rm=TRUE/FALSE defaults; it's worth spending some time deciding how to interpret these default settings.
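For example, whether or not you set na.rm changes the answer entirely:

> sum(x, na.rm=TRUE)
[1] 4
> mean(x, na.rm=TRUE)
[1] 1.333333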

Edit 1: Marek makes a great point. NA values can also cause confusing behavior in logical conditions and indexes. For instance:

> TRUE && NA
[1] NA
> FALSE && NA
[1] FALSE
> TRUE || NA
[1] TRUE
> FALSE || NA
[1] NA

This is also true when you're trying to create a conditional expression (for an if statement):

> any(c(TRUE, NA))
[1] TRUE
> any(c(FALSE, NA))
[1] NA
> all(c(TRUE, NA))
[1] NA

When these NA values end up as your vector indexes, many unexpected things can follow. This is all good behavior for R, because it means that you have to be careful with missing values. But it can cause major headaches at the beginning.

Thunderstone answered 8/10, 2009 at 3:36 Comment(1)
It pains in subscripting, eg. (1:3)[c(TRUE,FALSE,NA)] gives 1,NA. It is easy to trap in this when you create logical vector on NA-contained vector (1:3)[c(1,2,NA)<2].Sallysallyann
E
13

Forgetting that strptime() and friends return POSIXlt objects (class POSIXlt/POSIXt), whose length() is always nine -- converting to POSIXct helps:

R> length(strptime("2009-10-07 20:21:22", "%Y-%m-%d %H:%M:%S"))
[1] 9
R> length(as.POSIXct(strptime("2009-10-07 20:21:22", "%Y-%m-%d %H:%M:%S")))
[1] 1
R> 
Entrenchment answered 8/10, 2009 at 2:27 Comment(1)
...it now returns a length of 1 in R 2.14.0 (and probably some earlier versions too)...Semblance
T
13

The round function rounds a .5 to the nearest even number (round-half-to-even):

> round(3.5)
[1] 4  

> round(4.5)
[1] 4
Trireme answered 26/5, 2011 at 16:58 Comment(2)
I'm pretty sure that is floating-point gotcha, not a R gotcha.Sallysallyann
Wikipedia has some useful information on this type of rounding, which is attributed to IEEE 754 floating point specs.Gonnella
S
12

Math on integers is subtly different from doubles (and sometimes complex is weird too)

UPDATE: some of these were fixed in R 2.15 (see the inline comments):

1^NA      # 1
1L^NA     # NA
(1+0i)^NA # NA 

0L %/% 0L # 0L  (NA from R 2.15)
0 %/% 0   # NaN
4L %/% 0L # 0L  (NA from R 2.15)
4 %/% 0   # Inf
Semblance answered 10/8, 2011 at 19:3 Comment(0)
S
11

I'm surprised that no one has mentioned this:

T and F can be overridden; TRUE and FALSE can't.

Example:

x <- sample(c(0,1,NA), 100, T)
T <- 0:10

mean(x, na.rm=T)
# Warning in if (na.rm) x <- x[!is.na(x)] :
#   the condition has length > 1 and only the first element will be used
# Calls: mean -> mean.default
# [1] NA

plot(rnorm(7), axes=T)
# Warning in if (axes) { :
#   the condition has length > 1 and only the first element will be used
# Calls: plot -> plot.default
# Warning in if (frame.plot) localBox(...) :
#   the condition has length > 1 and only the first element will be used
# Calls: plot -> plot.default

[edit] Ctrl+F tricked me. Shane mentioned this in his comment.

Sallysallyann answered 27/5, 2011 at 9:30 Comment(1)
And the obvious fun corollary: put T <- FALSE ; F <- TRUE inside someone's ~/.RprofileRosanarosane
S
8

Reading in data can be more problematic than you may think. Today I found that read.csv() automatically skips blank lines in the .csv file. This makes sense for most applications, but if you're automatically extracting data from (for example) row 27 of several thousand files, and some of the preceding rows may or may not be blank, things can go horribly wrong if you're not careful.

I now use

data1 <- read.table(file_name, blank.lines.skip = F, sep = ",")

When you're importing data, check that you're doing what you actually think you're doing again and again and again...

Sheri answered 2/11, 2011 at 10:49 Comment(0)
E
8

The tricky behaviour of the all.equal() function.

One of my recurring errors is comparing sets of floating point numbers. I have a CSV like:

... mu,  tau, ...
... 0.5, 1.7, ...

Reading the file and trying to subset the data sometimes works, sometimes fails - of course, due to falling into the pits of the floating point trap again and again. At first, the data contains only integer values, then later on it always transforms into real values, you know the story. Comparing should be done with the all.equal() function instead of the == operator, but of course, the code I first wrote used the latter approach.
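A minimal illustration of why == is the wrong tool here:

> (0.1 + 0.2) == 0.3
[1] FALSE
> isTRUE(all.equal(0.1 + 0.2, 0.3))
[1] TRUE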

Yeah, cool, but all.equal() returns TRUE for equal numbers, and a character string describing the difference when they are not:

> all.equal(1,1)
[1] TRUE
> all.equal(1:10, 1:5)
[1] "Numeric: lengths (10, 5) differ"
> all.equal(1:10, c(1:5,1:5))
[1] "Mean relative difference: 0.625"

The solution is to wrap the call in isTRUE():

if (!isTRUE(all.equal(x, y, tolerance=doubleErrorRate))) {
    ...
}

How many times did I have to read the all.equal() documentation...

Exocarp answered 10/2, 2012 at 9:54 Comment(0)
E
7

This one hurt so much that I spent hours adding comments to a bug-report. I didn't get my wish, but at least the next version of R will generate an error.

R> nchar(factor(letters))
 [1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

Update: As of R 3.2.0 (probably earlier), this example now generates an error message. As mentioned in the comments below, a factor is NOT a vector and nchar() requires a vector.

R> nchar(factor(letters))
Error in nchar(factor(letters)) : 'nchar()' requires a character vector
R> is.vector(factor(letters))
[1] FALSE
Erigeron answered 21/2, 2012 at 18:45 Comment(4)
Interesting. Can you explain why this happens? It seems to me that it's counting the characters in the numeric representation, i.e. 9 values of 1 (1 digit), 17 with 2 characters (for 2 digits).Gonnella
What I learned is that a factor is not a vector, and 'nchar' only works on vectors.Erigeron
This comment comes super after-hours, but the above comment is not right. factor(letters) may not be a vector, but it can be treated as such, you can see it as a vector of factors. The first comment is close to what is happening here -- internally, factors are integers: typeof(factor(letters)). So, this output is the same as nchar(1:length(letters)). 1 when you have one digit, 2 for two digits.Cassandracassandre
Not sure what you mean: nchar() does not treat factor(letters) as a vector, nor is the output the same as nchar(1:length(letters)). A long time ago, yes, but not anymore.Erigeron
W
6
  1. accidentally listing the source code of a function by forgetting to include the empty parentheses: e.g. "ls" versus "ls()"

  2. true and false don't cut it as pre-defined constants, as they do in Matlab, C++, Java, and Python; you must use TRUE and FALSE

  3. invisible return values: e.g. ".packages()" returns nothing visibly, while "(.packages())" returns a character vector of package base names (see the example below)
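A small sketch of the invisible-return gotcha (f is just a made-up function):

f <- function() invisible(42)
f()        # prints nothing
(f())      # wrapping in parentheses forces printing: [1] 42
x <- f()   # the value is still returned and can be assigned
x          # [1] 42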

Willawillabella answered 8/10, 2009 at 1:54 Comment(2)
R does auto-complete though, so na.rm=T works as well as na.rm=TRUE. I always prefer the latter though for readability.Pict
Well, these aren't strictly equivalent. You can overwrite T, but TRUE is set. Try the following to confirm: { T <- FALSE; T }. Very dangerous! So, Stuart is right: you need be careful with your true/false values.Thunderstone
B
5

For instance, the number 3.14 is a numerical constant, but the expressions +3.14 and -3.14 are calls to the functions + and -:

> class(quote(3.14))
[1] "numeric"
> class(quote(+3.14))
[1] "call"
> class(quote(-3.14))
[1] "call"

See Section 13.2 in John Chambers' book Software for Data Analysis: Programming with R.

Beerbohm answered 6/11, 2009 at 12:4 Comment(1)
this is interesting, but has it ever made trouble for you in a real-life example?Fractocumulus
S
5

Partial matching in the $ operator: this applies to lists, but also to data.frames

df1 <- data.frame(foo=1:10, foobar=10:1)
df2 <- data.frame(foobar=10:1)

df1$foo # Correctly gets the foo column
df2$foo # Expect NULL, but this returns the foobar column!!!

# So, should use double bracket instead:
df1[["foo"]]
df2[["foo"]]

The [[ operator also has an exact flag, but it is thankfully TRUE by default.

Partial matching also affects attr:

x1 <- structure(1, foo=1:10, foobar=10:1)
x2 <- structure(2, foobar=10:1)

attr(x1, "foo") # Correctly gets the foo attribute
attr(x2, "foo") # Expect NULL, but this returns the foobar attribute!!!

# So, should use exact=TRUE
attr(x1, "foo", exact=TRUE)
attr(x2, "foo", exact=TRUE)
Semblance answered 10/8, 2011 at 18:52 Comment(0)
M
5

Automatic repeating of vectors ("recycling") used as indices:

R> all.numbers <- c(1:5)
R> all.numbers
[1] 1 2 3 4 5
R> good.idxs <- c(T,F,T)
R> #note unfortunate length mismatch
R> good.numbers <- all.numbers[good.idxs]
R> good.numbers
[1] 1 3 4
R> #wtf? 
R> #why would you repeat the vector used as an index 
R> #without even a warning?
Metzgar answered 30/12, 2011 at 23:4 Comment(3)
Nice find. I don't think it's ever come up in my work, and I work with a lot of logical indices... I don't plan to program around this issue, but it would be interesting to know where others might see this arise.Gonnella
This kind of vector recycling is quite common in R. I feel it is a mixed blessing and a potential source of unanticipated behavior.Diadem
It's called recycling, or vector recycling, as @PaulHiemstra noted, and it's intentional, even if the results are sometimes wack.Rosanarosane
E
5

Zero-length vectors have some quirks:

R> kk=vector(mode="numeric",length=0)
R> kk
numeric(0)
R> sum(kk)
[1] 0
R> var(kk)
[1] NA
Erigeron answered 20/2, 2012 at 16:53 Comment(1)
Note that prod(numeric(0))==1 too. I'm sure this has been discussed before on the r mailing lists, but it's a good point.Fractocumulus
S
4

Working with lists, there are a couple of unintuitive things:

Of course, the difference between [ and [[ takes some getting used to. For lists, the [ returns a list of (potentially 1) elements whereas the [[ returns the element inside the list.

List creation:

# When you're used to this:
x <- numeric(5) # A vector of length 5 with zeroes
# ... this might surprise you
x <- list(5)    # A list with a SINGLE element: the value 5
# This is what you have to do instead:
x <- vector('list', 5) # A vector of length 5 with NULLS

So, how to insert NULL into a list?

x <- list("foo", 1:3, letters, LETTERS) # A sample list
x[[2]] <- 1:5        # Put 1:5 in the second element
# The obvious way doesn't work: 
x[[2]] <- NULL       # This DELETES the second element!
# This doesn't work either: 
x[2] <- NULL       # This DELETES the second element!

# The solution is NOT very intuitive:
x[2] <- list(NULL) # Put NULL in the second element

# Btw, now that we think we know how to delete an element:
x <- 1:10
x[[2]] <- NULL  # Nope, gives an ERROR!
x <- x[-2]    # This is the only way for atomic vectors (works for lists too)

Finally some advanced stuff like indexing through a nested list:

x <- list(a=1:3, b=list(c=42, d=13, e="HELLO"), f='bar')
x[[c(2,3)]] # HELLO (first selects second element and then it's third element)
x[c(2,3)]   # The second and third elements (b and f)
Semblance answered 11/8, 2011 at 16:27 Comment(4)
x[[2]] <- 1:5 put 1:5 in second element. And to extend your answer x[1:2] <- 1:2 put 1 in first element and 2 in second, x[1,2] works for nested list (second element of first element)Sallysallyann
@Sallysallyann - Well, I didn't find x[[2]] <- 1:5 and x[1:2] <- 1:2 that surprising. x[1,2] should be x[[c(1,2)]] and I updated the answer. Thanks!Semblance
x[2] <- 1:5 gives me warning and put 1 in second element of x. And I was wrong in my comment: I have on my mind difference between x[c(1,2)] (return 1st and 2nd element) and x[[c(1,2)]] (return 2nd element of 1st element).Sallysallyann
Oops, I see. So we were both partially wrong ;-) I've updated the answer again.Semblance
A
4

One of the big confusions in R is that [i, drop = TRUE] does drop factor levels, but [i, j, drop = TRUE] does not!

> df = data.frame(a = c("europe", "asia", "oceania"), b = c(1, 2, 3))
> df$a[1:2, drop = TRUE]
[1] europe asia  
Levels: asia europe          <---- drops factor levels, works fine
> df[1:2,, drop = TRUE]$a
[1] europe asia  
Levels: asia europe oceania  <---- does not drop factor levels!

For more info see: drop = TRUE doesn't drop factor levels in data.frame while in vector it does

Alevin answered 2/1, 2013 at 15:41 Comment(0)
I
3

Coming from compiled languages and Matlab, I've occasionally been confused by a fundamental aspect of functions in functional languages: they have to be defined (assigned) before they're used! It's not enough for them simply to have been parsed by the R interpreter. This mostly rears its head when you use nested functions.

In Matlab you can do:

function f1()
  v1 = 1;
  v2 = f2();
  fprintf('2 == %d\n', v2);

  function r1 = f2()
    r1 = v1 + 1 % nested function scope
  end
end

If you try to do the same thing in R, you have to put the nested function first, or you get an error! Merely writing the function definition is not enough; it is not in scope until it has been assigned to a variable. On the other hand, the function body can refer to a variable that has not been defined yet.

f1 <- function() {
  f2 <- function() {
    v1 + 1
  }

  v1 <- 1

  v2 = f2()

  print(sprintf("2 == %d", v2))
}
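For contrast, a sketch of the failing order (assuming no other f2 exists on the search path):

f1 <- function() {
  v1 <- 1
  v2 <- f2()   # Error: could not find function "f2" -- it has not been assigned yet
  f2 <- function() {
    v1 + 1
  }
}
f1()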
Imam answered 18/10, 2009 at 14:9 Comment(1)
"The function can refer to a variable that has not been defined yet": To an object that has not been defined yet, including a function! If you change v1+1 to f3() in your example, and then define an f3 function before f2 gets called, it still works fine.Czech
I
3

Mine from today: qnorm() takes probabilities and pnorm() takes quantiles.
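A quick illustration with the standard normal:

> qnorm(0.975)    # probability in, quantile out
[1] 1.959964
> pnorm(1.959964) # quantile in, probability out
[1] 0.975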

Igenia answered 3/8, 2010 at 15:50 Comment(2)
because the names denote what they return, not what they take. Would you prefer sin() take ratios?Photoelectron
Take it easy on me, it was my first submission :)Igenia
N
3

For me it is the counterintuitive fact that when you export a data.frame to a text file using write.csv, you need to add an additional argument when importing it afterwards to get back exactly the same data.frame, like this:

write.csv(m, file = 'm.csv')
read.csv('m.csv', row.names = 1) # Note the row.names argument

I also posted this as a question on SO, and @BenBolker suggested adding it as an answer to this Q.

Nick answered 20/9, 2012 at 13:3 Comment(0)
D
1

The apply family of functions does not only work on matrices, but scales up to multi-dimensional arrays. In my research I often have a dataset of, for example, atmospheric temperature. This is stored in a multi-dimensional array with dimensions x, y, level, time, from now on called multi_dim_array. A mockup example would be:

multi_dim_array = array(runif(96 * 48 * 6 * 100, -50, 50), 
                        dim = c(96, 48, 6, 100))
> str(multi_dim_array)
#     x     y     lev  time    
 num [1:96, 1:48, 1:6, 1:100] 42.4 16 32.3 49.5 24.9 ...

Using apply one can easily get the:

# temporal mean value
> str(apply(multi_dim_array, 4, mean))
 num [1:100] -0.0113 -0.0329 -0.3424 -0.3595 -0.0801 ...
# temporal mean value per gridcell (x,y location)
> str(apply(multi_dim_array, c(1,2), mean))
 num [1:96, 1:48] -1.506 0.4553 -1.7951 0.0703 0.2915 ...
# temporal mean value per gridcell and level (x,y location, level)
> str(apply(multi_dim_array, c(1,2,3), mean))
 num [1:96, 1:48, 1:6] -3.839 -3.672 0.131 -1.024 -2.143 ...
# Spatial mean per level
> str(apply(multi_dim_array, c(3,4), mean))
 num [1:6, 1:100] -0.4436 -0.3026 -0.3158 0.0902 0.2438 ...

This makes the MARGIN argument to apply seem much less counterintuitive. At first I thought, why not use "row" and "col" instead of 1 and 2? But the fact that it also works on arrays with more dimensions makes it clear why specifying the margin like this is preferred.

Diadem answered 6/11, 2012 at 8:7 Comment(0)
A
0

which.min and which.max behave opposite to expectations when applied to a comparison, and can even give incorrect answers. For example, say you want to find which element in a sorted vector is the largest number that is less than a threshold (i.e. in a sequence from 100 to 200, which is the largest number less than 110):

set.seed(420)
x = seq(100, 200)
which(x < 110)
 [1]  1  2  3  4  5  6  7  8  9 10
which.max(x < 110)
[1] 1
which.min(x < 110)
[1] 11
x[11]
[1] 110
max(which(x < 110))
[1] 10
x[10]
[1] 109
Apostil answered 22/11, 2020 at 22:15 Comment(0)
A
-1

The dirtiest gotcha, and one that can be really hard to find: splitting multi-line expressions like this one:

K <- hyperpar$intcept.sigma2
        + cov.NN.additive(x1$env, x2 = NULL, sigma2_int = hyperpar$env.sigma2_int, sigma2_slope = hyperpar$env.sigma2_slope)
        + hyperpar$env.sigma2 * K.cache$k.env

R will only assign the first line to K; the other two lines are evaluated and thrown away, with no warning, nothing! This is pretty nasty treachery on an unsuspecting user. It must actually be written like this:

K <- hyperpar$intcept.sigma2 +
        cov.NN.additive(x1$env, x2 = NULL, sigma2_int = hyperpar$env.sigma2_int, sigma2_slope = hyperpar$env.sigma2_slope) +
        hyperpar$env.sigma2 * K.cache$k.env

which is not quite the natural way of writing it.
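A minimal reproduction of the same trap:

x <- 1
    + 2   # parsed as a separate expression (unary plus); its value is simply discarded when sourced
x         # still 1, not 3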

Alevin answered 8/2, 2020 at 21:16 Comment(0)
A
-1

This one!

all(c(1,2,3,4) == NULL)
[1] TRUE

I had this check in my code; I really need both tables to have the same column names:

stopifnot(all(names(x$x$env) == names(x$obsx$env)))

But the check passed (evaluated to TRUE) when x$x$env didn't even exist!
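The reason, for what it's worth: comparing a vector against NULL yields a zero-length logical vector, and all() of an empty vector is TRUE by definition:

> c(1,2,3,4) == NULL
logical(0)
> all(logical(0))
[1] TRUE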

Alevin answered 20/2, 2020 at 10:43 Comment(0)
A
-1

You can use options(warn = 2), which, according to the manual:

If warn is two or larger all warnings are turned into errors.

Indeed, the warnings are turned into errors, but, gotcha! The code still continues running after such errors!!!

source("script.R")
# ...
# Loading required package: bayesmeta
# Failed with error:  ‘(converted from warning) there is no package called ‘bayesmeta’’
# computing posterior (co)variances ... 
# (script continues running)
...

PS: but some other errors converted from warning do stop the script... so I don't know, I am confused. This one did stop the script:

Error in optimise(psiline, c(0, 2), adiff, a, as.matrix(K), y, d0, mn,  :
  (converted from warning) NA/Inf replaced by maximum positive value
Alevin answered 28/2, 2020 at 12:6 Comment(0)
