How does subsetting with NA work?

Can someone please explain in layman's terms how indexing (subsetting) with NA works? Even though there are some answers on Google, I would like to understand it better in simple terms.

When indexing a vector (of length > 1) using a single NA, why does it yield five missing values?

> x <- 1:5
> x[NA]
[1] NA NA NA NA NA
Causeway answered 28/8, 2018 at 14:43 Comment(4)
Because there are some things in R that have "always been", even if they don't necessarily make sense in current perspectives given modern expectations of vectors, indexing, etc. (To me, this behavior is not intuitive, I have just accepted it as "the way it is with R".)Hitandrun
Great question. I have never been confronted with this. I think if you subset with NA there is always something broken :-) that needs to be caught.Apollo
Not necessarily "broken", @AndreElrico! I have plenty of data where NA is perfectly meaningful and intentionally retained. It's the equivalent of SQL's null. There are times where I knowingly have an NA in an index vector and subset based on it, knowing (relying on, even) that it will give me an NA in that location. (In that case, though, it's typically NA_integer_ and not the elusive logical that defaults with [NA].)Hitandrun
Related dupe-oids: Why does x[NA] yield an NA vector the same length as x?; Indexing with NA; NA in subsetter - Inconsistent behaviorSatirist

From help("["):

When extracting, a numerical, logical or character NA index picks an unknown element and so returns NA in the corresponding element of a logical, integer, numeric, complex or character result, and NULL for a list.

What does "corresponding element" mean? This can be understood if you know about recycling of vector elements. x[NA] (a logical NA by default) in your example is effectively interpreted as x[c(NA, NA, NA, NA, NA)], since logical indices are recycled to the length of x. So each element of x has a corresponding NA during subsetting, and thus (per the quote above) NA is returned for each element of x. In layman's terms: for each element of x, we don't know whether we want it. Thus an unknown value is returned for each element.
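
The recycling described above can be checked directly. This is only a small sketch using the x from the question; the variable name recycled is made up for illustration.

```r
x <- 1:5

# A bare NA is of type logical, so as an index it is recycled to length(x).
class(NA)

# Writing the recycled index out by hand gives the same result as x[NA].
recycled <- x[rep(NA, length(x))]
identical(x[NA], recycled)

# Mixing NA with known logical values shows the per-element effect:
# TRUE keeps the element, FALSE drops it, NA yields an NA in that slot.
mixed <- x[c(TRUE, NA, FALSE, NA, TRUE)]
mixed
```

The last call keeps elements 1 and 5, drops element 3, and returns NA for the two positions where the index is unknown.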

As @r2evans points out: x[NA_integer_] returns only one NA because integer indices are not recycled. In layman's language: We want one value but don't know which one. Thus, one unknown value is returned.
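
For comparison, here is a short sketch of the integer case, again with the x from the question (one_unknown and by_position are illustrative names only). Integer indices are used as given, so there is one result per index.

```r
x <- 1:5

# An integer NA asks for a single element at an unknown position.
one_unknown <- x[NA_integer_]
length(one_unknown)

# With several integer indices, only the NA position becomes NA.
by_position <- x[c(1L, NA, 3L)]
by_position
```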

Prepossessing answered 28/8, 2018 at 14:51 Comment(6)
Ahhhh ... so it might make more sense to view the differences between x[NA] and x[NA_integer_], where the first is actually class logical, where x[T] and x[F] behavior (returning 5 elements) actually makes sense. Nice, Roland.Hitandrun
@Hitandrun highly valuable comment.Apollo
This is what made sense to me. Let me know if I am correct. Logical vectors get recycled, so NA gets recycled 5 times. With each NA there is an associated value in x. So 5 NAs are returned, one for each of the 5 elements of x.Causeway
@Hitandrun x[F] doesn't return 5 elements but integer(0) (in this case)Beria
Another puzzle : names(x) <- NA; identical(names(x)[1],NA_character_); x[NA_character_]Beria
@Moody_Mudskipper, you're right ... I hastily said x[F] recycled when it doesn't. (hang-head-low). Uh, yeah, R's dealing with vectors and logicals and stuff can sometimes seem inconsistent or counter-intuitive ... but I'm guessing it makes sense to somebody and/or from a certain perspective. In retrospect, it makes perfect sense to me that x[F] returns integer(0): if x[c(T,F,F,...)] returns just the first element, then I would expect x[c(F,F,F,...)] to return no elements ... so even if it is recycling, it makes sense.Hitandrun
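
The two points raised in the comments above (x[F] and the character NA puzzle) can be tried out as follows. This is a sketch only: the names all_kept, none_kept and by_name are made up here, and TRUE/FALSE are spelled out instead of T/F.

```r
x <- 1:5

# A length-one logical index is recycled like any other logical index.
all_kept  <- x[TRUE]    # every element is selected
none_kept <- x[FALSE]   # no element is selected, giving integer(0)

# The character puzzle: set every name to NA, then index by an NA name.
names(x) <- NA
identical(names(x)[1], NA_character_)

# An NA character index does not match any name, not even a missing one,
# so a single unknown element comes back.
by_name <- x[NA_character_]
length(by_name)
```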
