tl;dr
How can I transpose a data frame with column vectors of different classes (one column is character
, the next is numeric
, yet another is logical
etc.) in a way that preserves the class/data type information?
Example data:
mydata <- data.frame(
col0 = c("row1", "row2", "row3"),
col1 = c(1, 2, 3),
col2 = letters[1:3],
col3 = c(TRUE, FALSE, TRUE)
)
Here's also a little xlsx
file that features examples for both data orientations: https://github.com/rappster/stackoverflow/blob/master/excel/row-and-column-based-data.xlsx
Problem
A simple t()
or a bit more involved routines like the one suggested in this post are great, but do not preserve the class information or the original data frame's columns. I'm also aware that class data.frame
was never meant to store mixed-class information in its columns.
However, at least I would like come as close as possible to simply "inverting" the way data.frame
s were meant to be: centering the perspective on rows instead of columns perspective. I.e., all elements in row vectors need to belong to the same class while classes can differ across column vectors.
Context
I often work in projects where people are used to representing time series data in a horizontal ("variables-are-in-rows") instead of a vertical ("variables-are-in-columns") orientation that we are all used to in R (and which, IMHO, also makes much more sense).
What's more, they are using MS Excel extensively. And I need to both read data in "wide-format" and I to update existing Excel files by writing formulas directly from R to the file via XLConnect
and/or openxlsx
(as opposed to being able to do my calculations in R and then simply dumping the final result in an Excel file).
While I constantly try to tell them that using such an orientation means working against established standards across languages/tools (at least true for R and MS Excel), it is not very likely that they are going to switch. So I have to deal with that somehow.
Current approach
So I though about keeping an underlying list
but make it "look and feel" like a data.frame
as much as possible. It works, but is pretty involved. I thought there might be some more clever solution to this.
Function def:
transpose <- function (
x,
col = character(),
rnames_or_col = c("col", "rnames")
) {
rnames_or_col <- match.arg(rnames_or_col, c("col", "rnames"))
## Buffering column names //
cnames <- if (length(col)) {
x[[col]]
} else {
make.names(1:nrow(x))
}
## Removing anchoring column //
if (inherits(x, "data.table")) {
x <- as.data.frame(x, stringsAsFactors = FALSE)
}
## I don't like this part. Any suggestions on how a) build on top of existing
## data.table functionality b) the easiest way to make a data.table behave
## like a data.frame when indexing (remove operation below will yield
## "undesired" results from a data.frame perspective; it's fine in from
## data.table's perspective/paradigm of course)
if (length(col)) {
x <- x[ , -which(names(x) == col)]
}
## Buffer classes //
classes <- lapply(x, class)
## Buffer row names //
rnames <- names(x)
## Listify //
x <- lapply(as.list(x), function(row) {
df <- do.call(data.frame, list(as.list(row), stringsAsFactors = FALSE))
names(df) <- cnames
df
})
names(x) <- rnames
## Actual row names or row names as first column //
if (rnames_or_col == "col") {
x <- lapply(x, function(ii) {
data.frame(variable = row.names(ii), ii,
stringsAsFactors = FALSE, row.names = NULL, check.names = FALSE)
})
}
## Class //
class(x) <- c("df_transposed", class(x))
x
}
Print method:
print.df_transposed <- function(object) {
cat("df_transposed: \n")
out <- do.call(rbind, object)
rownames(out) <- NULL
print(out)
}
Getter and setter methods:
"[<-.df_transposed" <- function(x, i, j, value) {
x[[i]][ , j] <- value
x
}
"[.df_transposed" <- function(x, i, j, drop = FALSE) {
# foo <- function(x, i, j, drop = FALSE) {
has_i <- !missing(i)
has_j <- !missing(j)
cls <- class(x)
scope <- if (has_i) {
i
} else {
1:length(x)
}
out <- lapply(unclass(x)[scope], function(ii) {
nms <- names(ii)
if (has_j) {
tmp <- ii[ , j, drop = drop]
names(tmp) <- nms[j]
## --> necessary due to `check.names` missing for `[.data.frame` :-/
tmp
} else {
ii
}
})
class(out) <- cls
out
}
Class function:
class2 <- function(x) {
sapply(x, function(ii) {
value <- if ("variable" %in% names(ii)) {
unlist(ii[, -1])
} else {
unlist(ii)
}
class(value)
})
}
Application
Example data:
mydata <- data.frame(
col0 = c("row1", "row2", "row3"),
col1 = c(1, 2, 3),
col2 = letters[1:3],
col3 = c(TRUE, FALSE, TRUE)
)
Actual transposing and print methods:
> (df_t <- transpose(mydata, col = "col0"))
df_transposed:
variable row1 row2 row3
1 col1 1 2 3
2 col2 a b c
3 col3 TRUE FALSE TRUE
> (df_t2 <- transpose(mydata, col = "col0", rnames_or_col = "rnames"))
df_transposed:
row1 row2 row3
col1 1 2 3
col2 a b c
col3 TRUE FALSE TRUE
Printing unclassed objects:
> unclass(df_t)
$col1
variable row1 row2 row3
1 col1 1 2 3
$col2
variable row1 row2 row3
1 col2 a b c
$col3
variable row1 row2 row3
1 col3 TRUE FALSE TRUE
> unclass(df_t2)
$col1
row1 row2 row3
col1 1 2 3
$col2
row1 row2 row3
col2 a b c
$col3
row1 row2 row3
col3 TRUE FALSE TRUE
Class query:
> class2(df_t)
col1 col2 col3
"numeric" "character" "logical"
Indexing:
> dat_t[1, ]
df_transposed:
variable row1 row2 row3
1 1 1 2 3
> dat_t[, 1]
df_transposed:
variable
1 1
2 1
3 1
>
> dat_t[1, 2]
df_transposed:
row1
col1 1
> dat_t[2, 3]
df_transposed:
row2
col2 b
>
> dat_t[1:2, ]
df_transposed:
variable row1 row2 row3
1 1 1 2 3
2 1 a b c
> dat_t[, 1:3]
df_transposed:
variable row1 row2
1 1 1 2
2 1 a b
3 1 TRUE FALSE
>
> dat_t[c(1, 3), 2:4]
df_transposed:
row1 row2 row3
col1 1 2 3
col3 1 0 1
?data.frame
it says: "A data frame, a matrix-like structure whose columns may be of differing types (numeric, logical, factor and character and so on).", not mentioning mixing of types within columns. Many R operations depend on each column having predictable uniform type. – Floatfeedt()
, so I must have missed to think that part over. I updated the function accordingly. Would you know where to get a more or less comprehensive list of methods that would need to be implemented to so the class would play nicely with the most common functions for manipulatingdata.frame
s? – Expansionismdata.frame
is not meant for representing the type of data I'm trying to use it for. I'm just wondering what the "best" solution would look like to at least come close to realizing a transpose of a data frame with mixed column classes. – Expansionismmethods(class = "data.frame")
andshowMethods(class = "data.frame")
– Distendreadxl::read_excel
) expect the data to be in standard orientation, even my cumbersome workaround won't get me any points - so I need to figure out something else. Maybe usingmelt
andcast
or something like that. – Expansionismcsv
, but not necessarily when writing toxlsx
files. There are two very cool packages that allow you to actually manipulatexlsx
files "headlessly" from R scripts via calling an (Java-based?) API (in the sense that you are actually are operating in the actual file instead of simply writing the final result to the file via, e.g.,xlsx::write.xlsx
):XLConnect
andopenxlsx
– Expansionism