The simplest way to convert a list with various length vectors to a data.frame in R
L

3

8

I have a list of vectors of different lengths, and I want to convert it to a data.frame. I've seen many posts about this on SO (see ref), but none of the solutions is as simple as I expected, given how common this task is in data preprocessing. Thank you.

By simplest I mean something like as.data.frame(aa), if it worked. A single function from base R would be ideal; sapply(aa, "length<-", max(lengths(aa))) actually involves four functions.

An example is shown below.

Input:

aa <- list(A=c(1, 3, 4), B=c(3,5,7,7,8))

Output:

 A  B
 1  3
 3  5
 4  7
NA  7
NA  8

A and B are the colnames of the data.frame.

One answer is sapply(aa, '[', seq(max(sapply(aa, length)))), but it's also complex.
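For reference, a minimal sketch of why that indexing approach pads with NA: in R, indexing a vector past its end returns NA for the missing positions. The names idx, m and df below are only illustrative, and the final as.data.frame() call is an addition, since sapply() by itself returns a matrix here.

```r
aa <- list(A = c(1, 3, 4), B = c(3, 5, 7, 7, 8))

# Positions 1..5, where 5 is the length of the longest vector
idx <- seq_len(max(lengths(aa)))

# Indexing past the end of a vector yields NA for the missing positions
aa$A[idx]

# Apply the same indexing to every vector; the result is a matrix
m <- sapply(aa, `[`, idx)

# Convert the matrix to a data.frame, keeping A and B as column names
df <- as.data.frame(m)
df
```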

ref:

  1. How to convert a list consisting of vector of different lengths to a usable data frame in R?

  2. Combining (cbind) vectors of different length

Lustig answered 9/11, 2015 at 16:10 Comment(10)
You can make it compact with data.frame(lapply(aa, "length<-", max(lengths(aa)))). It is also faster than using sapply(aa, length). - Adjacent
data-science??? - Backscratcher
@akrun, it's a solution, but not as simple as possible in R. - Lustig
@David Arenburg, it's related to data science, since preprocessing unformatted data is always an important part of data science. - Lustig
You can use library(stringi); stri_list2matrix(aa), but the resulting character elements need to be converted to numeric. I am not sure whether simple means compact code for you, though. - Adjacent
@akrun, I think stri_list2matrix is a simple answer, though I think there should be a function for this in base R. In my opinion, simple means easy to use and easy to understand. - Lustig
Well, you can create a function with these tools so that it becomes simple for you. - Adjacent
@akrun, sapply(aa, "length<-", max(lengths(aa))) works as well. It seems that "length<-" here means length(x) <- max(lengths(aa))? - Lustig
Yes, and it is very fast based on some benchmarks done earlier. - Adjacent
@ZhilongJia, I found the comment of @fdetsch here interesting. Maybe something like do.call(qpcR:::cbind.na, aa) could be interesting, but it is not fully base R. - Oology
A
19

We can use

data.frame(lapply(aa, "length<-", max(lengths(aa))))
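To see what the one-liner does, here is a step-by-step sketch using only base R. The intermediate names n, padded_a and df are illustrative and not part of the original answer.

```r
aa <- list(A = c(1, 3, 4), B = c(3, 5, 7, 7, 8))

# Length of the longest vector in the list
n <- max(lengths(aa))

# Assigning a larger length to a vector pads it with NA
padded_a <- aa$A
length(padded_a) <- n
padded_a

# `length<-` is the function behind that assignment, so lapply() can
# apply it to every vector, with n passed as the new length
df <- data.frame(lapply(aa, `length<-`, n))
df
```

Because every element now has the same length, data.frame() accepts the list directly and keeps A and B as column names.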

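Or using tidyverse, which returns the data in long format (columns name and value) rather than the wide layout shown in the question: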

library(dplyr)
library(tibble)
library(tidyr)
enframe(aa) %>%
    unnest(value)
Adjacent answered 10/11, 2015 at 4:25 Comment(2)
We don't know what the OP regards as "simple", but setDT in place of data.frame saves some characters and operations. - Fluky
@Fluky I agree. It seems to me that the OP wants base R options. - Adjacent
M
2

Using tidyverse packages: place the list in a nested data frame, extract the name of each vector in the list, unnest the data frame, give each element of each vector a row index i, then spread the data into wide format.

    aa <- list(A = c(1, 3, 4), B = c(3, 5, 7, 7, 8))
    library(tidyverse)
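    tibble(data = aa) %>%                  # list-column holding the vectors
        group_by(name = names(data)) %>%   # keep each vector's name
        unnest(data) %>%                   # one row per element
        mutate(i = row_number()) %>%       # row index within each vector
        spread(name, data)                 # wide format, one column per name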
    # A tibble: 5 x 3
          i     A     B
    * <int> <dbl> <dbl>
    1     1     1     3
    2     2     3     5
    3     3     4     7
    4     4    NA     7
    5     5    NA     8
Melesa answered 19/9, 2018 at 14:2 Comment(0)
C
1

Make this function:

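listToDF <- function(aa){
  # Pad every vector with NA to the longest length; sapply() returns a
  # matrix, so convert it to get an actual data.frame
  as.data.frame(sapply(aa, "length<-", max(lengths(aa))))
}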

Then use it, simply:

listToDF(aa)
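One thing to watch with a helper built on sapply() is that the vectors are combined into a single matrix, so a list with mixed types is coerced to one common type. Below is a sketch of a variant built on lapply() that keeps each column's type. The name pad_to_df and the mixed example list are hypothetical, not from the answer above.

```r
# Hypothetical helper: pad each vector with NA, keep per-column types
pad_to_df <- function(x) {
  n <- max(lengths(x))
  data.frame(lapply(x, `length<-`, n), stringsAsFactors = FALSE)
}

mixed <- list(A = c(1, 3, 4), B = c("x", "y"))

# sapply() builds one matrix, so the numbers become character strings
m <- sapply(mixed, `length<-`, max(lengths(mixed)))
is.character(m)

# The lapply() variant keeps A numeric and B character
df <- pad_to_df(mixed)
str(df)
```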
Corselet answered 10/11, 2015 at 11:47 Comment(0)
