Why does data.table get copied when returned from Map?
R


I understood that a data.table is not copied when it is returned from a function. However, in this particular case it does get copied. Can someone explain why?

library(data.table)

dt1 <- data.table(a=1)
dt2 <- data.table(b=1)
dt3 <- data.table(c=1)

address(dt1); address(dt2); address(dt3)
[1] "000000005E20D990"
[1] "00000000052301E8"
[1] "000000001D622210"

l <- list(a=dt1, b=dt2, c=dt3)
address(l$a); address(l$b); address(l$c)
[1] "000000005E20D990"
[1] "00000000052301E8"
[1] "000000001D622210"

f <- function(dt) {setnames(dt, toupper(names(dt)))}
l <- Map(f, l)
address(l$a); address(l$b); address(l$c)
[1] "000000001945C7B0"
[1] "0000000066858738"
[1] "000000001B021038"

dt1
   A
1: 1
dt2
   B
1: 1
dt3
   C
1: 1

So it is the l <- Map(f, l) call that makes the copy: the addresses in l change, even though the original tables were renamed by reference. However, calling f directly does not make a copy.

address(dt1)
[1] "000000005E20D990"
dt4 <- f(dt1)
address(dt4)
[1] "000000005E20D990"

What am I missing?

Update: As everybody has pointed out, Map or mapply is making a copy. lapply works in the above case, but my actual code needs multiple inputs to the function. My understanding was that all the apply functions use the same code, but that does not seem to be the case.

Renatorenaud answered 21/1, 2016 at 12:44 Comment(8)
Map is a wrapper for mapply, and I believe the copy happens in mapply. - Exhibition
I guess @Exhibition is right. l <- lapply(l, f) doesn't copy. I should add that the use of Map here is pretty unusual: since there is just one argument, lapply should be preferred. - Khachaturian
I noted that in the C source of lapply there is the line if (MAYBE_REFERENCED(tmp)) tmp = lazy_duplicate(tmp); while in mapply the line is if (MAYBE_REFERENCED(tmp)) tmp = duplicate(tmp);. Could that be the cause? I'm no expert on R internals, so I can't tell for sure. - Khachaturian
You can easily avoid using Map or mapply if the objects are available in the parent frame. Use lapply(seq_along(l), function(i) ...) and subset the objects you would have passed to mapply with the iterator i, so l[[i]] in your example, and potentially more objects, since mapply loops over several. - Molybdenite
If I switch l <- Map(f, l) to simply Map(f, l), it seems to work fine. You rarely need to use the return value of set* functions. - Glume
You should reword the question, since funcdt <- f(dt1); address(funcdt) shows the same address. In other words, the problem isn't the function, it's the Map. - Nichols
Thanks @Frank. Map(f, l) works. But it still makes a copy of the data; it just does not assign it to l. - Renatorenaud
@Renatorenaud please post an answer to your question so it can be considered resolved. - Molybdenite

As everybody has pointed out, Map or mapply is making a copy.
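
For the multiple-input case, the index-based lapply suggested in the comments can be sketched as follows. This is only a minimal illustration under assumptions: the two-argument function g and the suffixes vector are hypothetical stand-ins for the real code. The set* call is used purely for its side effect, and l is never reassigned, so whatever copy the looping function makes of the return value is simply discarded.

```r
library(data.table)

dt1 <- data.table(a = 1)
dt2 <- data.table(b = 1)
l <- list(a = dt1, b = dt2)

# Hypothetical second input, one value per table
suffixes <- c("_x", "_y")

# Hypothetical two-argument function; setnames() renames columns by reference
g <- function(dt, suffix) {
  setnames(dt, paste0(toupper(names(dt)), suffix))
}

# Loop over indices for the side effect only; the result is not assigned to l
invisible(lapply(seq_along(l), function(i) g(l[[i]], suffixes[i])))

# The list elements are still the original objects, now renamed
address(l$a) == address(dt1)
names(dt1)
names(dt2)
```

Because l keeps pointing at the original tables, address(l$a) and address(dt1) should still agree afterwards, and the new column names are visible through both dt1 and l$a.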

Molybdenite answered 21/11, 2020 at 14:54 Comment(0)
