Transposing a data.table with reshape2:::dcast
Asked Answered
K

1

6

I am transposing a data.table and have chosen to use reshape2:::dcast, however I am plagued by a strange handling of the data.table... here is a toy data set that replicates the behavior:

> library(data.table)
> library(reshape2)
> DT <- structure(list(STORE = c(32123L, 32469L, 33177L, 33484L, 34772L, 34875L), 
                       VOLUME = c(343.87205, 599.78335, 1665.90895, 712.0343, 465.6489, 622.5189)), 
                       .Names = c("STORE", "VOLUME"), 
                       sorted = "STORE", 
                       class = c("data.table", "data.frame"), 
                       row.names = c(NA, -6L))
> (R1 <- dcast(DT, . ~ STORE, value.var = "VOLUME"))
  .    32123    32469    33177    33484    34772    34875
1 . 343.8721 599.7834 1665.909 712.0343 465.6489 622.5189

Perfect, exactly what I want it to look like (dropping the . column is no biggy) -- but it is no longer classed as a data.table:

> class(R1)
[1] "data.frame"

I attempted to explicitly call the method for data.tables in the data.table package, but R now does not like the formula:

> data.table:::dcast.data.table(DT, . ~ STORE, value.var = "VOLUME")
Error in data.table:::dcast.data.table(DT, . ~ STORE, value.var = "VOLUME") : 
  LHS of formula evaluates to 'character(0)', invalid formula.
2: stop("LHS of formula evaluates to 'character(0)', invalid formula.")
1: data.table:::dcast.data.table(DT, . ~ STORE, value.var = "VOLUME")

This can easily be worked around by creating a dummy pivot column to be dropped after the transposition:

> DT[, "dummy" := NA]
> (R2 <- data.table:::dcast.data.table(DT, dummy ~ STORE, value.var = "VOLUME"))
   dummy    32123    32469    33177    33484    34772    34875
1:    NA 343.8721 599.7834 1665.909 712.0343 465.6489 622.5189
> class(R2)
[1] "data.table" "data.frame"

However, the fact that data.table:::dcast.data.table did not have handling for a NULL LHS of the cast formula leads me to believe I am going about this all wrong -- is there a "data.table" way to do this?

Kelleykelli answered 18/5, 2015 at 14:25 Comment(3)
This is strange. @Arun might want to have a look. As a quick workaround, you can call setDT(dcast(DT, . ~ ...))Negatron
Another workaround is to drop the first colum without creating a dummy pivot column: dcast.data.table(DT, .~STORE, value.var = "VOLUME")[,-1,with=FALSE]Kowtko
This works with the devel version on GH. Or you will need to wait for v 1.9.6 release on cran.Hooch
N
7

In data.table, you do not need dcast to accomplish the reshaping you are after. Simply use

 DT[, as.list(setNames(VOLUME, STORE))]

Explanation:

In [.data.table, if the j expression (ie, DT[i, j]) is a list, the list is regarded as columns of the output.
The names of the list are taken as the column names (missing names filled with V*)

eg:

  DT[, j= list(ABC=123, greetings="Hello World", 1, alpha = 0.9995)]
  #    ABC   greetings V3  alpha
  # 1: 123 Hello World  1 0.9995

Thus in the solution:

  • setnNames (the base R function. Not to be confused with setnames from the data.table package) applies one vector as the names of the other.
  • converting the vector to a list tells [.data.table to treat them as columns.

(alternatively, you could just keep it as a named vector, if that's what you need for your purposes)


dcast issue

As for the difference between reshape2::dcast and dcast.data.table, as @David Arenburg points out in the comments above, this has been addressed in the devel version of data.table on github and will be on CRAN with V1.9.6

library (devtools)
install_github("Rdatatable/data.table")
Negatron answered 18/5, 2015 at 14:49 Comment(1)
Great, exactly what I was looking for, thank you. Would you mind elaborating on why this works?Kelleykelli

© 2022 - 2024 — McMap. All rights reserved.