Transposing a data frame while preserving class/data type information
Asked Answered
E

0

6

tl;dr

How can I transpose a data frame with column vectors of different classes (one column is character, the next is numeric, yet another is logical etc.) in a way that preserves the class/data type information?

Example data:

mydata <- data.frame(
  col0 = c("row1", "row2", "row3"),
  col1 = c(1, 2, 3),
  col2 = letters[1:3],
  col3 = c(TRUE, FALSE, TRUE)
)

Here's also a little xlsx file that features examples for both data orientations: https://github.com/rappster/stackoverflow/blob/master/excel/row-and-column-based-data.xlsx

Problem

A simple t() or a bit more involved routines like the one suggested in this post are great, but do not preserve the class information or the original data frame's columns. I'm also aware that class data.frame was never meant to store mixed-class information in its columns.

However, at least I would like come as close as possible to simply "inverting" the way data.frames were meant to be: centering the perspective on rows instead of columns perspective. I.e., all elements in row vectors need to belong to the same class while classes can differ across column vectors.

Context

I often work in projects where people are used to representing time series data in a horizontal ("variables-are-in-rows") instead of a vertical ("variables-are-in-columns") orientation that we are all used to in R (and which, IMHO, also makes much more sense).

What's more, they are using MS Excel extensively. And I need to both read data in "wide-format" and I to update existing Excel files by writing formulas directly from R to the file via XLConnect and/or openxlsx (as opposed to being able to do my calculations in R and then simply dumping the final result in an Excel file).

While I constantly try to tell them that using such an orientation means working against established standards across languages/tools (at least true for R and MS Excel), it is not very likely that they are going to switch. So I have to deal with that somehow.

Current approach

So I though about keeping an underlying list but make it "look and feel" like a data.frame as much as possible. It works, but is pretty involved. I thought there might be some more clever solution to this.

Function def:

transpose <- function (
  x,
  col = character(),
  rnames_or_col = c("col", "rnames")
) {
  rnames_or_col <- match.arg(rnames_or_col, c("col", "rnames"))

  ## Buffering column names //
  cnames <- if (length(col)) {
    x[[col]]
  } else {
    make.names(1:nrow(x))
  }

  ## Removing anchoring column //
  if (inherits(x, "data.table")) {
    x <- as.data.frame(x, stringsAsFactors = FALSE)
  }
  ## I don't like this part. Any suggestions on how a) build on top of existing
  ## data.table functionality b) the easiest way to make a data.table behave
  ## like a data.frame when indexing (remove operation below will yield
  ## "undesired" results from a data.frame perspective; it's fine in from
  ## data.table's perspective/paradigm of course)
  if (length(col)) {
    x <- x[ , -which(names(x) == col)]
  }

  ## Buffer classes //
  classes <- lapply(x, class)

  ## Buffer row names //
  rnames <- names(x)

  ## Listify //
  x <- lapply(as.list(x), function(row) {
    df <- do.call(data.frame, list(as.list(row), stringsAsFactors = FALSE))
    names(df) <- cnames
    df
  })
  names(x) <- rnames

  ## Actual row names or row names as first column //
  if (rnames_or_col == "col") {
    x <- lapply(x, function(ii) {
      data.frame(variable = row.names(ii), ii,
        stringsAsFactors = FALSE, row.names = NULL, check.names = FALSE)
    })
  }

  ## Class //
  class(x) <- c("df_transposed", class(x))

  x
}

Print method:

print.df_transposed <- function(object) {
  cat("df_transposed: \n")
  out <- do.call(rbind, object)
  rownames(out) <- NULL
  print(out)
}

Getter and setter methods:

"[<-.df_transposed" <- function(x, i, j, value) {
  x[[i]][ , j] <- value
  x
}

"[.df_transposed" <- function(x, i, j, drop = FALSE) {
# foo <- function(x, i, j, drop = FALSE) {
  has_i <- !missing(i)
  has_j <- !missing(j)
  cls <- class(x)
  scope <- if (has_i) {
    i
  } else {
    1:length(x)
  }
  out <- lapply(unclass(x)[scope], function(ii) {
    nms <- names(ii)

    if (has_j) {
      tmp <- ii[ , j, drop = drop]
      names(tmp) <- nms[j]
      ## --> necessary due to `check.names` missing for `[.data.frame` :-/
      tmp
    } else {
      ii
    }
  })
  class(out) <- cls
  out
}

Class function:

class2 <- function(x) {
  sapply(x, function(ii) {
    value <- if ("variable" %in% names(ii)) {
      unlist(ii[, -1])
    } else {
      unlist(ii)
    }
    class(value)
  })
}

Application

Example data:

mydata <- data.frame(
  col0 = c("row1", "row2", "row3"),
  col1 = c(1, 2, 3),
  col2 = letters[1:3],
  col3 = c(TRUE, FALSE, TRUE)
)

Actual transposing and print methods:

> (df_t <- transpose(mydata, col = "col0"))
df_transposed: 
  variable row1  row2 row3
1     col1    1     2    3
2     col2    a     b    c
3     col3 TRUE FALSE TRUE
> (df_t2 <- transpose(mydata, col = "col0", rnames_or_col = "rnames"))
df_transposed: 
     row1  row2 row3
col1    1     2    3
col2    a     b    c
col3 TRUE FALSE TRUE

Printing unclassed objects:

> unclass(df_t)
$col1
  variable row1 row2 row3
1     col1    1    2    3

$col2
  variable row1 row2 row3
1     col2    a    b    c

$col3
  variable row1  row2 row3
1     col3 TRUE FALSE TRUE

> unclass(df_t2)
$col1
     row1 row2 row3
col1    1    2    3

$col2
     row1 row2 row3
col2    a    b    c

$col3
     row1  row2 row3
col3 TRUE FALSE TRUE

Class query:

> class2(df_t)
       col1        col2        col3 
  "numeric" "character"   "logical"     

Indexing:

> dat_t[1, ]
df_transposed: 
  variable row1 row2 row3
1        1    1    2    3
> dat_t[, 1]
df_transposed: 
  variable
1        1
2        1
3        1
> 
> dat_t[1, 2]
df_transposed: 
     row1
col1    1
> dat_t[2, 3]
df_transposed: 
     row2
col2    b
> 
> dat_t[1:2, ]
df_transposed: 
  variable row1 row2 row3
1        1    1    2    3
2        1    a    b    c
> dat_t[, 1:3]
df_transposed: 
  variable row1  row2
1        1    1     2
2        1    a     b
3        1 TRUE FALSE
> 
> dat_t[c(1, 3), 2:4]
df_transposed: 
     row1 row2 row3
col1    1    2    3
col3    1    0    1
Expansionism answered 5/2, 2016 at 12:31 Comment(14)
AFAIK, you can't have more than one type per column. In ?data.frame it says: "A data frame, a matrix-like structure whose columns may be of differing types (numeric, logical, factor and character and so on).", not mentioning mixing of types within columns. Many R operations depend on each column having predictable uniform type.Floatfeed
I think you are going to have to create a new class and all its methods as you are doing. Not clear if there is any benefit in using a list rather than a the data frame prior to transposing as the internal representation.Distend
@G.Grothendieck: that step is indeed completely unnecessary, thanks for pointing me to it! The function grew out of other trial-and-errors with t(), so I must have missed to think that part over. I updated the function accordingly. Would you know where to get a more or less comprehensive list of methods that would need to be implemented to so the class would play nicely with the most common functions for manipulating data.frames?Expansionism
@RomanLuštrik: thanks for taking the time. I'm aware that data.frame is not meant for representing the type of data I'm trying to use it for. I'm just wondering what the "best" solution would look like to at least come close to realizing a transpose of a data frame with mixed column classes.Expansionism
@Rappster, To find the data.frame methods try methods(class = "data.frame") and showMethods(class = "data.frame")Distend
@G.Grothendieck: great, thanks!Expansionism
@Expansionism as a lot of those methods expect the whole column to be of the same type (or coerce it), I really don't see how you'll end up with something working at end, this means rewriting all this methods to work by row instead. That sounds counterproductive at best, if the problem is the 'screen' representation, there's probably another solution to your problem while keeping the datas in a columns based format. I'm curious of a use case for this in fact.Northrup
@Tensibai: I agree, and I don't like that orientation for data storage either. But I have a use case where I need to write actual Excel formulas and read Excel data in a transposed orientation as the clients wants it that way and it's very hard to convince him that this means working against the standard :-/ So I need to figure out a systematic way of dealing with such transposed data orientation and transforming from "long" to "wide" and vice versa.Expansionism
@Tensibai: but as all functions that read tabular data into R (e.g. readxl::read_excel) expect the data to be in standard orientation, even my cumbersome workaround won't get me any points - so I need to figure out something else. Maybe using melt and cast or something like that.Expansionism
@Expansionism I'm not working with excel, so maybe next comment will be absolutely wrong :) Passing to excel, you already coerce to char at a point, so getting the usual format to work in R and casting to wide before exporting to excel should not be a real problem, is it ? (sound an interesting problem to tackle)Northrup
@Tensibai: AFAIK, this would be the case when writing to csv, but not necessarily when writing to xlsx files. There are two very cool packages that allow you to actually manipulate xlsx files "headlessly" from R scripts via calling an (Java-based?) API (in the sense that you are actually are operating in the actual file instead of simply writing the final result to the file via, e.g., xlsx::write.xlsx): XLConnect and openxlsxExpansionism
Could you post a sample xlsx document?Julianejuliann
@BrandonBertelsen: sure, here you go: github.com/rappster/stackoverflow/blob/master/excel/…Expansionism
@BrandonBertelsen: sorry, I had it wrong at the first go, now the data in the example file is correct: the first sheet demonstrates the "typical" vertical orientation while the second sheet features the horizontal orientation that I often have to deal withExpansionism

© 2022 - 2024 — McMap. All rights reserved.