Assign multiple columns using := in data.table, by group - McMap

About

Assign multiple columns using := in data.table, by group

Asked 27/7, 2012 at 2:13 Answered 1/4, 2018 at 11:37

Solved r dataframe data.table variable-assignment colon-equals

A

2

175

What is the best way to assign to multiple columns using data.table? For example:

f <- function(x) {c("hi", "hello")}
x <- data.table(id = 1:10)

I would like to do something like this (of course this syntax is incorrect):

x[ , (col1, col2) := f(), by = "id"]

And to extend that, I may have many columns with names stored in a variable (say col_names) and I would like to do:

x[ , col_names := another_f(), by = "id", with = FALSE]

What is the correct way to do something like this?

Amoroso answered 27/7, 2012 at 2:13 Comment(3)

This looks like it has been answered: #11309254 – Amoroso 27/7, 2012 at 20:52

Alex, That answer is close but it doesn't seem to work in combination with by as @Christoph_J is correct to say. Link to your question added to FR#2120 "Drop needing with=FALSE for LHS of :=", so it won't get forgotten to revisit. – Syncretism 8/8, 2012 at 15:29

To be clear, f() is a function returning multiple values, one for each of your columns. – Ostler 4/5, 2018 at 6:10

S

196

This now works in v1.8.3 on R-Forge. Thanks for highlighting it!

x <- data.table(a = 1:3, b = 1:6) 
f <- function(x) {list("hi", "hello")} 
x[ , c("col1", "col2") := f(), by = a][]
#    a b col1  col2
# 1: 1 1   hi hello
# 2: 2 2   hi hello
# 3: 3 3   hi hello
# 4: 1 4   hi hello
# 5: 2 5   hi hello
# 6: 3 6   hi hello

x[ , c("mean", "sum") := list(mean(b), sum(b)), by = a][]
#    a b col1  col2 mean sum
# 1: 1 1   hi hello  2.5   5
# 2: 2 2   hi hello  3.5   7
# 3: 3 3   hi hello  4.5   9
# 4: 1 4   hi hello  2.5   5
# 5: 2 5   hi hello  3.5   7
# 6: 3 6   hi hello  4.5   9 

mynames = c("Name1", "Longer%")
x[ , (mynames) := list(mean(b) * 4, sum(b) * 3), by = a]
#     a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

x[ , get("mynames") := list(mean(b) * 4, sum(b) * 3), by = a][]  # same
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

x[ , eval(mynames) := list(mean(b) * 4, sum(b) * 3), by = a][]   # same
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

Older version using the with argument (we discourage this argument when possible):

x[ , mynames := list(mean(b) * 4, sum(b) * 3), by = a, with = FALSE][] # same
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

Syncretism answered 6/10, 2012 at 8:48 Comment(15)

Thanks for this answer and the examples. How should I modify the following line in order to get two columns for each objectName from the dim output, rather than one column with two rows? data.table(objectName=ls())[,c("rows","cols"):=dim(get(objectName)),by=objectName] (I'm using data.table 1.8.11) – Enrol 19/5, 2014 at 2:0

@Enrol dim returns a vector so converting that to type list should rotate it; e.g. [,c("rows","cols"):=as.list(dim(get(objectName))),by=objectNa‌me]. Trouble is that as.list has call overhead and also copies the small vector. If efficiency is a problem as number of groups rises then please let us know. – Syncretism 21/5, 2014 at 11:49

Thanks @Matt_Dowle. I had tried list but not as.list. Speed isn't an issue. Just wanted a quick way to find objects in the environment that had a certain number of columns or rows. This is off topic, but... what do you think about adding NCOL to tables()? – Enrol 23/5, 2014 at 0:34

@Enrol Great idea, done. Just committed to v1.9.3 on R-Forge. – Syncretism 25/5, 2014 at 0:20

Hi Matt. The first example in your second code block (i.e. x[,mynames:=list(mean(b)*4,sum(b)*3),by=a,with=FALSE][]) now throws a warning, so maybe remove it? On a related note, has anyone suggested that, with options(datatable.WhenJisSymbolThenCallingScope=TRUE), an assignment like x[,mynames:=list(mean(b)*4,sum(b)*3),by=a] should in fact work? Seems like that would be consistent with the other changes, though I guess it might break too much existing user code (?). – Plastic 29/11, 2016 at 17:49

@JoshO'Brien Matt removed that option in response to a bug shortly after your comment, fyi github.com/Rdatatable/data.table/issues/1952 – Reichel 10/2, 2017 at 21:2

@MattDowle can you make it such that this syntax would work too? x[, .(Col1, Col2) := 'somevalue'] This is particularly useful because you can then use RStudio's autocomplete for column names as you type them. – Cancan 29/11, 2017 at 2:37

@Cancan Seems like a good idea, should be possible. Please file a feature request on GitHub here - thanks. – Syncretism 29/11, 2017 at 2:44

I don't think by=a is necessary; not sure why it would be. I tried and it worked. – Owades 23/2, 2018 at 20:33

@PanFrancisco Without by=a it will work, but return a different answer. The mean(a) and sum(a) aggregates are being recycled within each group when by=a. Without by=a it just sticks the mean and sum for the entire column into each cell (i.e. different numbers). – Syncretism 24/2, 2018 at 0:44

@MattDowle what if my function already returns named list, is there anyway I can add the columns to the dt without having to name them again? e.g. f <- function(x) {list("c"="hi", "d"="hello")} will print results with named cols with x[ , f(), by = a][] . I don't know how to append the result to the dt. – Infatuate 5/8, 2018 at 19:50

@Jfly That would be a good new question which would likely lead to a feature request filed on GitHub. Perhaps something like x[, {ans=f(); names(ans):=ans}, by=a] could be implemented. That syntax conveys the intent quite nicely to my eye. What you think? – Syncretism 6/8, 2018 at 17:30

Why do you use mynames with and without parenthesis? Is there a difference? – Simeon 25/10, 2018 at 10:28

@HermanToothrot mynames:= assigns to the column called "mynames". (mynames):= assigns to the column name held in the mynames variable in calling scope. – Syncretism 2/11, 2018 at 20:18

@FengJiang data.table #1543 is tracking this issue and links to multiple questions here. – Conley 12/2 at 22:20

B

82

The following shorthand notation might be useful. All credit goes to Andrew Brooks, specifically this article.

dt[,`:=`(avg=mean(mpg), med=median(mpg), min=min(mpg)), by=cyl]

Burglarious answered 1/4, 2018 at 11:37 Comment(1)

This is so much better and more readable than the c() := list() – Cloudberry 8/7, 2022 at 15:50

Recommended topics

#Godot #Unity #Godot 4.X #Mongodb

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

© 2022 - 2024 — McMap. All rights reserved.