Why has data.table defined := rather than overloading <-?

data.table has introduced the := operator. Why not overload <-?

Janis answered 11/8, 2011 at 21:29 Comment(5)
Let me guess: In homage to Pascal!!! - Erb
I guess! We couldn't choose any operator, it was just that (fortunately) R allows := to be defined. Otherwise we could have fun and define +=, -=, ~= etc :) - Janis
Could someone please explain what "overloading <-" means? - Orle
@Orle "Overloading" roughly speaking meant replacing <- with another version of <- that works differently. Making <- work differently, somehow. Instead, we used a new operator, :=, for clarity amongst other reasons. Almost everything in R is a function, even <- and [ etc. - Janis
@Orle But probably 'overloading' was technically incorrect, as Owen pointed out in comments. I meant it in a loose sense. - Janis

I don't think there is any technical reason this should be necessary, for the following reason: := is only used inside [...] so it is always quoted. [...] goes through the expression tree to see if := is in it.

That means it's not really acting as an operator and it's not really overloaded; so they could have picked pretty much any operator they wanted. I guess maybe it looked better? Or less confusing because it's clearly not <-?

(Note that if := were used outside of [...] it could not be <-, because you can't actually overload <-. <- doesn't evaluate its left-hand argument, so it doesn't know what the type is.)
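To make the "always quoted" point concrete, here is a minimal base R sketch. The function uses_walrus is a hypothetical name for illustration and is not data.table's code; it only checks the top-level call, whereas a real implementation would have to look deeper into the expression. It captures its argument unevaluated and inspects which function the call refers to:

```r
# Hypothetical helper, not part of data.table: inspect j without evaluating it.
uses_walrus <- function(j) {
  expr <- substitute(j)  # the unevaluated expression passed as j
  # A call such as `newcol := sum(a)` has the symbol `:=` as its first element.
  is.call(expr) && identical(expr[[1L]], as.name(":="))
}

uses_walrus(newcol := sum(a))  # TRUE
uses_walrus(newcol <- sum(a))  # FALSE
```

Neither newcol nor a exists here, and no `:=` function is defined, yet nothing fails: the expression is only parsed and inspected, never evaluated. That is why any operator the parser accepts could have been chosen.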

Aphotic answered 11/8, 2011 at 22:11 Comment(1)
Yes, that's pretty much it. We tried <- first actually but that didn't fly because user code already used <- in j e.g. incrementing a group counter. Then we tried <<- but people already use that in j too, to assign to .GlobalEnv. So then we hit upon :=. - Janis

There are two places that <- could be 'overloaded' :

x[i, j] <- value           # 1
x[i, {colname <- value}]   # 2

The first one copies the whole of x to *tmp*, changes that working copy, and assigns back to x. That's an R thing (src/main/eval.c and subassign.c) discussed recently on r-devel. It sounded like it might be possible to change R to allow packages, or R itself, to avoid that copy to *tmp*, but that isn't currently possible, IIUC.
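As a rough illustration of the first case, the R Language Definition describes subassignment as if it were expanded into a call to `[<-` on a temporary named *tmp*. The sketch below writes that expansion out by hand in base R, using a made-up vector; it shows the delegation to `[<-` only and does not measure whether a copy is made:

```r
x <- c(a = 1, b = 2, c = 3)  # made-up example data
y <- x

# The usual form.
x[2] <- 20

# Roughly what R does for the line above, written out by hand on y.
`*tmp*` <- y
y <- `[<-`(`*tmp*`, 2, value = 20)
rm(`*tmp*`)

identical(x, y)  # TRUE
```

So <- itself is unchanged here; a package can only supply a method for `[<-`, and that method receives the working copy rather than x itself.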

The second one is what Owen's answer refers to, I think. If you accept that it's ok to do assignment by reference within j like that, then which operator? As per the comment to Owen's answer, <- and <<- are already used by users in j, so we hit upon :=.

Even if [<- didn't copy the whole of x, we still like := in j so we can do things like this :

DT[,{newcol1:=sum(a)
     newcol2:=a/newcol1}, by=group]

Where the new columns are added by reference to the table, and the RHS of each := is evaluated within each group. (When := within group is implemented.)


Update Oct 2012

As of 1.8.2 (on CRAN in Jul 2012), := by group was implemented for adding or updating single columns; i.e., single LHS of :=. And now in v1.8.3 (on R-Forge at the time of writing), multiple columns can be added by group; e.g.,

DT[, c("newcol1","newcol2") := .(sum(a),sum(b)), by=group]

or, perhaps more elegantly :

DT[,`:=`(newcol1=sum(a),
         newcol2=sum(b)), by=group]

But the iterative multiple RHS, envisaged for a while, where the 2nd expression could use the result from the first, isn't implemented yet (FR#1492). So this will still give an error "newcol1 not found" and need to be done in two steps :

DT[,`:=`(newcol1=sum(a),
         newcol2=a/newcol1), by=group]
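
A minimal sketch of the two-step version, assuming the data.table package is installed. The sample data is made up; the column names follow the example above:

```r
library(data.table)

# Hypothetical sample data for illustration.
DT <- data.table(group = c("x", "x", "y"), a = c(1, 3, 5))

# Step 1: add the per-group total by reference.
DT[, newcol1 := sum(a), by = group]

# Step 2: newcol1 is now an ordinary column, so a second := can use it.
DT[, newcol2 := a / newcol1]
```

After the first step newcol1 already exists in DT, which is what the single-call form lacks when it evaluates the second expression.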
Janis answered 11/8, 2011 at 23:42 Comment(2)
Just a minor thing, x[i, j] <- value isn't actually overloading <-, rather <- does what it always does by delegating to [<- (based on the expression, not the value type). - Aphotic
@Aphotic Ah yes, good point. Have edited and added quotes around 'overloaded'. - Janis