`levels<-`( What sorcery is this?

In an answer to another question, @Marek posted the following solution: https://mcmap.net/q/182592/-idiom-for-ifelse-style-recoding-for-multiple-categories

dat <- structure(list(product = c(11L, 11L, 9L, 9L, 6L, 1L, 11L, 5L, 
                                  7L, 11L, 5L, 11L, 4L, 3L, 10L, 7L, 10L, 5L, 9L, 8L)), .Names = "product", row.names = c(NA, -20L), class = "data.frame")

`levels<-`(
  factor(dat$product),
  list(Tylenol=1:3, Advil=4:6, Bayer=7:9, Generic=10:12)
  )

Which produces as output:

 [1] Generic Generic Bayer   Bayer   Advil   Tylenol Generic Advil   Bayer   Generic Advil   Generic Advil   Tylenol
[15] Generic Bayer   Generic Advil   Bayer   Bayer  

This is just the printout of a vector; so to store it, you can do the even more confusing:

res <- `levels<-`(
  factor(dat$product),
  list(Tylenol=1:3, Advil=4:6, Bayer=7:9, Generic=10:12)
  )

Clearly this is some kind of call to the levels function, but I have no idea what's being done here. What is the term for this kind of sorcery, and how do I increase my magical ability in this domain?

Signe answered 4/5, 2012 at 13:3 Comment(9)
There is also names<- and [<-. - Flagelliform
Also, I wondered about this on the other question but didn't ask: is there any reason for the structure(...) construct instead of just data.frame(product = c(11L, 11L, ..., 8L))? (If there's some magic happening there, I'd like to wield it too!) - Flagelliform
It's a call to the "levels<-" function: function (x, value) .Primitive("levels<-"), sort of like X %in% Y is an abbreviation for "%in%"(X, Y). - Furculum
@dbaupp I just used dput to output an object I'd created by subsetting my actual data, and dput returns structure calls by default. - Signe
@gsk3, Cool, didn't know about dput, thanks! - Flagelliform
@dbaupp Very handy for reproducible examples: #5963769 - Signe
I have no idea why someone voted to close this as not constructive? The Q has a very clear answer: what is the meaning of the syntax used in the example and how does this work in R? - Maganmagana
Still shocked at how good all the answers are. Thanks everyone. Learned a lot. - Signe
See also https://mcmap.net/q/182593/-what-are-replacement-functions-in-r - Signe

The answers here are good, but they are missing an important point. Let me try and describe it.

R is a functional language and does not like to mutate its objects. But it does allow assignment statements, using replacement functions:

levels(x) <- y

is equivalent to

x <- `levels<-`(x, y)

The trick is, this rewriting is done by <-; it is not done by levels<-. levels<- is just a regular function that takes an input and gives an output; it does not mutate anything.
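A quick way to check that claim is to call the function directly and look at the input afterwards. This is only a minimal sketch; x and y are illustrative names, not taken from the question.

```r
x <- factor(c("a", "b", "a"))

# Direct call: returns a new factor and leaves x alone
y <- `levels<-`(x, c("A", "B"))

levels(x)  # "a" "b"  (unchanged)
levels(y)  # "A" "B"
```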

One consequence of that is that, according to the above rule, <- must be recursive:

levels(x)[1] <- "a"

is

levels(x) <- `[<-`(levels(x), 1, "a")

is

x <- `levels<-`(x, `[<-`(levels(x), 1, "a"))
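The expansion can be checked by running both forms side by side. The sketch below uses two made-up factors, x1 and x2, and compares the results.

```r
x1 <- factor(c("u", "v", "u"))
x2 <- x1

# Nested assignment form
levels(x1)[1] <- "a"

# Hand-written expansion using the replacement functions directly
x2 <- `levels<-`(x2, `[<-`(levels(x2), 1, "a"))

identical(x1, x2)  # TRUE
levels(x2)         # "a" "v"
```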

It's kind of beautiful that this pure-functional transformation (up until the very end, where the assignment happens) is equivalent to what an assignment would be in an imperative language. This construct in functional languages is called a lens. Lenses can be awkward to use in some programming languages, but in R they just work.

But then, once you have defined replacement functions like levels<-, you get another, unexpected windfall: you don't just have the ability to make assignments, you have a handy function that takes in a factor, and gives out another factor with different levels. There's really nothing "assignment" about it!

So, the code you're describing is just making use of this other interpretation of levels<-. I admit that the name levels<- is a little confusing because it suggests an assignment, but this is not what is going on. The code is simply setting up a sort of pipeline:

  • Start with dat$product

  • Convert it to a factor

  • Change the levels

  • Store that in res

Personally, I think that line of code is beautiful ;)
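The four steps listed above can be spelled out one at a time. This is a sketch on a small stand-in for dat (four assumed values using the same coding as the question); step1 to step3 are illustrative names.

```r
# Small stand-in for the question's dat (assumed values, same coding scheme)
dat <- data.frame(product = c(11L, 9L, 6L, 1L))

step1 <- dat$product                      # start with dat$product
step2 <- factor(step1)                    # convert it to a factor
step3 <- `levels<-`(step2,                # change the levels
                    list(Tylenol = 1:3, Advil = 4:6,
                         Bayer = 7:9, Generic = 10:12))
res <- step3                              # store that in res

as.character(res)  # "Generic" "Bayer" "Advil" "Tylenol"
levels(step2)      # "1" "6" "9" "11": the intermediate factor is untouched
```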

Mikey answered 8/5, 2012 at 2:40 Comment(1)
For completeness, this is also described (somewhat formally) in the R language definition: cran.r-project.org/doc/manuals/R-lang.html#Subset-assignment - Madsen

No sorcery, that's just how (sub)assignment functions are defined. levels<- is a little different because it is a primitive to (sub)assign the attributes of a factor, not the elements themselves. There are plenty of examples of this type of function:

`<-`              # assignment
`[<-`             # sub-assignment
`[<-.data.frame`  # sub-assignment data.frame method
`dimnames<-`      # change dimnames attribute
`attributes<-`    # change any attributes
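To make the naming convention concrete, here is a minimal sketch of a user-defined replacement function. The name second<- is hypothetical and chosen only for illustration; the one firm requirement is that the last argument is called value.

```r
# Hypothetical replacement function: sets the second element of a vector.
# The final argument must be named `value` for the f(x) <- y syntax to work.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- c(10, 20, 30)
second(v) <- 99         # assignment form: R rewrites it to v <- `second<-`(v, value = 99)
w <- `second<-`(v, 0)   # direct call: returns a new vector, v is not touched

v  # 10 99 30
w  # 10  0 30
```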

Other binary operators can be called like that too:

`+`(1,2)  # 3
`-`(1,2)  # -1
`*`(1,2)  # 2
`/`(1,2)  # 0.5

Now that you know that, something like this should really blow your mind:

Data <- data.frame(x=1:10, y=10:1)
names(Data)[1] <- "HI"              # How does that work?!? Magic! ;-)
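
One way to demystify that line is to write out the nested calls by hand and compare the results. This is a sketch; Data2 is an illustrative copy, not part of the original example.

```r
Data  <- data.frame(x = 1:10, y = 10:1)
Data2 <- Data

# The "magic" form
names(Data)[1] <- "HI"

# Roughly what R evaluates: replace inside names(), then put the names back
Data2 <- `names<-`(Data2, `[<-`(names(Data2), 1, "HI"))

identical(Data, Data2)  # TRUE
names(Data2)            # "HI" "y"
```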
Interwork answered 4/5, 2012 at 13:10 Comment(3)
Can you explain a little more about when it makes sense to call functions that way, rather than the usual way? I am working through @Marek's example in the linked question, but it would help to have a more explicit explanation. - Dannadannel
@DrewSteen: for code clarity/readability reasons, I would say it never makes sense because `levels<-`(foo,bar) is the same as levels(foo) <- bar. Using @Marek's example: `levels<-`(as.factor(foo),bar) is the same as foo <- as.factor(foo); levels(foo) <- bar. - Interwork
Nice list. Don't you think levels<- is really just shorthand for attr<-(x, "levels") <- value, or at least it probably was until it was turned into a primitive and handed over to C-code. - Math

The reason for that "magic" is that the "assignment" form must have a real variable to work on. And the factor(dat$product) wasn't assigned to anything.

# This works since it's done in several steps
x <- factor(dat$product)
levels(x) <- list(Tylenol=1:3, Advil=4:6, Bayer=7:9, Generic=10:12)
x

# This doesn't work although it's the "same" thing:
levels(factor(dat$product)) <- list(Tylenol=1:3, Advil=4:6, Bayer=7:9, Generic=10:12)
# Error: could not find function "factor<-"

# and this is the magic work-around that does work
`levels<-`(
  factor(dat$product),
  list(Tylenol=1:3, Advil=4:6, Bayer=7:9, Generic=10:12)
  )
Softener answered 4/5, 2012 at 13:23 Comment(2)
+1 I think it would be cleaner to convert to factor first, then replace the levels via a within() and transform() call where the thusly modified object is returned and assigned. - Maganmagana
@GavinSimpson - I agree, I only explain the magic, I don't defend it ;-) - Softener

For user code, I do wonder why such language manipulations are used. You ask what magic this is, and others have pointed out that you are calling the replacement function named levels<-. For most people this is magic, and the intended use really is levels(foo) <- bar.

The use case you show is different because product doesn't exist in the global environment; it only ever exists in the local environment of the call to levels<-. Thus the change you want to make does not persist: there was no reassignment of dat.

In these circumstances, within() is the ideal function to use. You would naturally wish to write

levels(product) <- bar

in R, but of course product doesn't exist as an object. within() gets around this because it sets up the environment you wish to run your R code against and evaluates your expression within that environment. Assigning the object returned by within() thus gives you the properly modified data frame.

Here is an example (you don't need to create new datX - I just do that so the intermediary steps remain at the end)

## one or t'other
#dat2 <- transform(dat, product = factor(product))
dat2 <- within(dat, product <- factor(product))

## then
dat3 <- within(dat2, 
               levels(product) <- list(Tylenol=1:3, Advil=4:6, 
                                       Bayer=7:9, Generic=10:12))

Which gives:

> head(dat3)
  product
1 Generic
2 Generic
3   Bayer
4   Bayer
5   Advil
6 Tylenol
> str(dat3)
'data.frame':   20 obs. of  1 variable:
 $ product: Factor w/ 4 levels "Tylenol","Advil",..: 4 4 3 3 2 1 4 2 3 4 ...

I struggle to see how constructs like the one you show are useful in the majority of cases - if you want to change the data, change the data, don't create another copy and change that (which is all the levels<- call is doing after all).
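
As one possible reading of "change the data" that does not go through within(), the column can be converted and relabelled in place with ordinary replacement syntax. This is a sketch on an assumed four-row stand-in for dat, not the question's full data.

```r
dat <- data.frame(product = c(11L, 9L, 6L, 1L))   # assumed stand-in data

# Convert the column, then relabel its levels, both on dat itself
dat$product <- factor(dat$product)
levels(dat$product) <- list(Tylenol = 1:3, Advil = 4:6,
                            Bayer = 7:9, Generic = 10:12)

as.character(dat$product)  # "Generic" "Bayer" "Advil" "Tylenol"
```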

Maganmagana answered 4/5, 2012 at 14:58 Comment(0)
