How to flatten a list to a list without coercion?

Asked 15/11, 2011 at 16:29 Answered 6/7, 2020 at 10:13

Solved list r standard-library flatten type-coercion

I am trying to achieve the functionality similar to unlist, with the exception that types are not coerced to a vector, but the list with preserved types is returned instead. For instance:

flatten(list(NA, list("TRUE", list(FALSE), 0L))

should return

list(NA, "TRUE", FALSE, 0L)

instead of

c(NA, "TRUE", "FALSE", "0")

which would be returned by unlist(list(list(NA, list("TRUE", list(FALSE), 0L)).

As it is seen from the example above, the flattening should be recursive. Is there a function in standard R library which achieves this, or at least some other function which can be used to easily and efficiently implement this?

UPDATE: I don't know if it is clear from the above, but non-lists should not be flattened, i.e. flatten(list(1:3, list(4, 5))) should return list(c(1, 2, 3), 4, 5).

Samal answered 15/11, 2011 at 16:29 Comment(3)

What should flatten( list(1:3, list(1:3, 'foo')) ) return? – Gretagretal 15/11, 2011 at 20:18

list(c(1, 2, 3), c(1, 2, 3), 'foo'). Explanation: 1:3 is not a list, so it should not be flatten. – Samal 16/11, 2011 at 19:25

purrr::flatten looks like current best practice (as per @Aurèle's answer) – Fungicide 13/9, 2017 at 8:13

Interesting non-trivial problem!

MAJOR UPDATE With all that's happened, I've rewrote the answer and removed some dead ends. I also timed the various solutions on different cases.

Here's the first, rather simple but slow, solution:

flatten1 <- function(x) {
  y <- list()
  rapply(x, function(x) y <<- c(y,x))
  y
}

rapply lets you traverse a list and apply a function on each leaf element. Unfortunately, it works exactly as unlist with the returned values. So I ignore the result from rapply and instead I append values to the variable y by doing <<-.

Growing y in this manner is not very efficient (it's quadratic in time). So if there are many thousands of elements this will be very slow.

A more efficient approach is the following, with simplifications from @JoshuaUlrich:

flatten2 <- function(x) {
  len <- sum(rapply(x, function(x) 1L))
  y <- vector('list', len)
  i <- 0L
  rapply(x, function(x) { i <<- i+1L; y[[i]] <<- x })
  y
}

Here I first find out the result length and pre-allocate the vector. Then I fill in the values. As you can will see, this solution is much faster.

Here's a version of @JoshO'Brien great solution based on Reduce, but extended so it handles arbitrary depth:

flatten3 <- function(x) {
  repeat {
    if(!any(vapply(x, is.list, logical(1)))) return(x)
    x <- Reduce(c, x)
  }
}

Now let the battle begin!

# Check correctness on original problem 
x <- list(NA, list("TRUE", list(FALSE), 0L))
dput( flatten1(x) )
#list(NA, "TRUE", FALSE, 0L)
dput( flatten2(x) )
#list(NA, "TRUE", FALSE, 0L)
dput( flatten3(x) )
#list(NA_character_, "TRUE", FALSE, 0L)

# Time on a huge flat list
x <- as.list(1:1e5)
#system.time( flatten1(x) )  # Long time
system.time( flatten2(x) )  # 0.39 secs
system.time( flatten3(x) )  # 0.04 secs

# Time on a huge deep list
x <-'leaf'; for(i in 1:11) { x <- list(left=x, right=x, value=i) }
#system.time( flatten1(x) ) # Long time
system.time( flatten2(x) )  # 0.05 secs
system.time( flatten3(x) )  # 1.28 secs

...So what we observe is that the Reduce solution is faster when the depth is low, and the rapply solution is faster when the depth is large!

As correctness goes, here are some tests:

> dput(flatten1( list(1:3, list(1:3, 'foo')) ))
list(1L, 2L, 3L, 1L, 2L, 3L, "foo")
> dput(flatten2( list(1:3, list(1:3, 'foo')) ))
list(1:3, 1:3, "foo")
> dput(flatten3( list(1:3, list(1:3, 'foo')) ))
list(1L, 2L, 3L, 1:3, "foo")

Unclear what result is desired, but I lean towards the result from flatten2...

Gretagretal answered 15/11, 2011 at 16:49 Comment(8)

I came up with something similar to your update, but perhaps less complicated: y <- vector("list", sum(rapply(x,length))); i <- 1 then rapply(x, function(z) {y[[i]] <<- z; i <<- i+1}). It's about as fast as your updated solution. – Rajput 15/11, 2011 at 17:22

Silly me, yes, that's much easier - I didn't think y[[i]] <<- z would work so I didn't even try it! – Gretagretal 15/11, 2011 at 17:30

@Gretagretal -- I just stole your most recent version of flatten, adding a line that takes care of the corner case you identified. Hope you don't mind, and feel free to edit your own version accordingly. Thanks! – Zulmazulu 15/11, 2011 at 18:24

+1 -- Don't know how I didn't already upvote this post. This should put you up top so that your excellent comparisons get max visibility. Plus, I definitely prefer the output of flatten2. – Zulmazulu 15/11, 2011 at 22:43

Thanks. You can eliminate flatten1. Not only it is the slowest one, but it also doesn't preserve non-lists (i.e. 1:5 flattens while it should not). – Samal 16/11, 2011 at 19:34

@Tommy: can you come up with a solution that preserves names in a named list? – Gorged 25/1, 2013 at 1:46

One problem with most of these methods is that NULL elements silently get dropped. – Mercenary 5/11, 2013 at 3:10

@Tommy, sorry for the previous missing of attribution in my package function rlist::list.flatten for using your implementation of flatten2. You deserve full credit of it at github.com/renkun-ken/rlist/blob/master/R/list.flatten.R and thanks for your idea! – Symbiosis 31/7, 2015 at 4:34

For lists that are only a few nestings deep, you could use Reduce() and c() to do something like the following. Each application of c() removes one level of nesting. (For fully general solution, see EDITs below.)

L <- (list(NA, list("TRUE", list(FALSE), 0L)))
Reduce(c, Reduce(c, L))
[[1]]
[1] NA

[[2]]
[1] "TRUE"

[[3]]
[1] FALSE

[[4]]
[1] 0



# TIMING TEST
x <- as.list(1:4e3)
system.time(flatten(x))   # Using the improved version    
# user  system elapsed 
# 0.14    0.00    0.13 
system.time(Reduce(c, x))
# user  system elapsed 
# 0.04    0.00    0.03

EDIT Just for fun, here's a version of @Tommy's version of @JoshO'Brien's solution that does work for already flat lists. FURTHER EDIT Now @Tommy's solved that problem as well, but in a cleaner way. I'll leave this version in place.

flatten <- function(x) {
    x <- list(x)
    repeat {
        x <- Reduce(c, x)
        if(!any(vapply(x, is.list, logical(1)))) return(x)
    }
}

flatten(list(3, TRUE, 'foo'))
# [[1]]
# [1] 3
# 
# [[2]]
# [1] TRUE
# 
# [[3]]
# [1] "foo"

Familial answered 15/11, 2011 at 17:26 Comment(9)

+1 for nice use of Reduce! ...But it doesn't seem to handle flatten(list(3, TRUE, 'foo')) – Gretagretal 15/11, 2011 at 17:51

I am more concerned about implementing it recursively, in order to wor for non constant depth lists. Is there a function which can be used to detect if a list is flattened? – Samal 15/11, 2011 at 17:54

@leden -- You can test whether a list is flat with !any(sapply(L, class)=="list"), which will evaluate to TRUE for fully flattened lists. – Zulmazulu 15/11, 2011 at 18:0

@leden - I added a variant that does that. – Gretagretal 15/11, 2011 at 18:5

@Gretagretal -- Why'd you have to go ahead and ruin a perfectly elegant solution ;). If the function might be passed already flat lists, you'd need to either: (a) check for that case in advance; or (b) pre-emptively wrap every list passed to it, like this: Reduce(c, list(x)), where in your example, x <- list(3, TRUE, 'foo'). – Zulmazulu 15/11, 2011 at 18:9

@JoshO'Brien - well, I got you to post a better solution, didn't I? ;-). A suggestion: just move the if clause before Reduce and skip the x<-list(x) – Gretagretal 15/11, 2011 at 18:25

@JoshO'Brien wouldn't !any(vapply(L, is.list, logical(1))) be better? – Continual 16/11, 2011 at 3:23

@Continual -- You're referring to the comment four comments up from yours, where I suggested !any(sapply... right? (In the body of the post, I did use the better idiom that you (and Tommy) suggested). – Zulmazulu 16/11, 2011 at 3:30

@ttmaccer -- Great catch. I'd wager good money that the Reduce() formulation will never be faster than your do.call() construct, and it'll sometimes be MUCH slower. Thanks. – Zulmazulu 15/8, 2012 at 17:12

How about this? It builds off Josh O'Brien's solution but does the recursion with a while loop instead using unlist with recursive=FALSE.

flatten4 <- function(x) {
  while(any(vapply(x, is.list, logical(1)))) { 
    # this next line gives behavior like Tommy's answer; 
    # removing it gives behavior like Josh's
    x <- lapply(x, function(x) if(is.list(x)) x else list(x))
    x <- unlist(x, recursive=FALSE) 
  }
  x
}

Keeping the commented line in gives results like this (which Tommy prefers, and so do I, for that matter).

> x <- list(1:3, list(1:3, 'foo'))
> dput(flatten4(x))
list(1:3, 1:3, "foo")

Output from my system, using Tommy's tests:

dput(flatten4(foo))
#list(NA, "TRUE", FALSE, 0L)

# Time on a long 
x <- as.list(1:1e5)
system.time( x2 <- flatten2(x) )  # 0.48 secs
system.time( x3 <- flatten3(x) )  # 0.07 secs
system.time( x4 <- flatten4(x) )  # 0.07 secs
identical(x2, x4) # TRUE
identical(x3, x4) # TRUE

# Time on a huge deep list
x <-'leaf'; for(i in 1:11) { x <- list(left=x, right=x, value=i) }
system.time( x2 <- flatten2(x) )  # 0.05 secs
system.time( x3 <- flatten3(x) )  # 1.45 secs
system.time( x4 <- flatten4(x) )  # 0.03 secs
identical(x2, unname(x4)) # TRUE
identical(unname(x3), unname(x4)) # TRUE

EDIT: As for getting the depth of a list, maybe something like this would work; it gets the index for each element recursively.

depth <- function(x) {
  foo <- function(x, i=NULL) {
    if(is.list(x)) { lapply(seq_along(x), function(xi) foo(x[[xi]], c(i,xi))) }
    else { i }
  }
  flatten4(foo(x))
}

It's not super fast but it seems to work fine.

x <- as.list(1:1e5)
system.time(d <- depth(x)) # 0.327 s

x <-'leaf'; for(i in 1:11) { x <- list(left=x, right=x, value=i) }
system.time(d <- depth(x)) # 0.041s

I'd imagined it being used this way:

> x[[ d[[5]] ]]
[1] "leaf"
> x[[ d[[6]] ]]
[1] 1

But you could also get a count of how many nodes are at each depth too.

> table(sapply(d, length))

   1    2    3    4    5    6    7    8    9   10   11 
   1    2    4    8   16   32   64  128  256  512 3072

Andrewandrewes answered 15/11, 2011 at 20:49 Comment(7)

+1 for continuing to extend this. Now if only we had some way to quickly assess the depth of lists... Any ideas? – Zulmazulu 15/11, 2011 at 23:55

@JoshO'Brien: See edit for depth idea. It works but it's not great. Any suggestions? – Andrewandrewes 16/11, 2011 at 15:42

Hi Aaron. Nice solution, but I agree it's not ideal. It would be nice to find something that always ran faster than the worst case flatten4 timings. My two thoughts are: "I wonder if the phylogenetics folks already have something like this in a package", and "Folks who work with parsers could do this in a snap". – Zulmazulu 16/11, 2011 at 17:1

I played for a few minutes with the string resulting from deparse(L), i.e. "list(NA, list(\"TRUE\", list(FALSE), 0L))", but realized I'm in over my head/don't have the time. My basic idea was to run through it once, counting every occurrence of the substring list( as a +1, and every matching right paren ) as a -1. max(cumsum()) or some equivalent would get you the maximum depth. Seems like a sound approach with a perhaps monstrous regexp needed for the implementation! This might be a good SO question for one of us to ask at some point... – Zulmazulu 16/11, 2011 at 17:6

Thanks. I think this is the best solution so far. – Samal 16/11, 2011 at 19:37

+1. This solves another problem: the other functions unlist vectors as well, this one only lists. (Tommy's solution does not do this when vectors and lists are in depth=2) – Shackleford 12/4, 2014 at 13:58

if you change both instances of is.list in flatten4 to function(x) !is.data.frame(x) & is.list(x), then this will work for data frames as well which none of the answers currently do flatten4(list(1:3, list(1:3, 'foo'), TRUE, 'hi', list(head(mtcars), list(tail(mtcars))))) works like a dream – Contemporaneous 10/11, 2015 at 1:54

Edited to address a flaw pointed out in the comments. Sadly, it just makes it even less efficient. Ah well.

Another approach, although I'm not sure it will be more efficient than anything @Tommy has suggested:

l <- list(NA, list("TRUE", list(FALSE), 0L))

flatten <- function(x){
    obj <- rapply(x,identity,how = "unlist")
    cl <- rapply(x,class,how = "unlist")
    len <- rapply(x,length,how = "unlist")
    cl <- rep(cl,times = len)
    mapply(function(obj,cl){rs <- as(obj,cl); rs}, obj, cl, 
        SIMPLIFY = FALSE, USE.NAMES = FALSE)
}

> flatten(l)
[[1]]
[1] NA

[[2]]
[1] "TRUE"

[[3]]
[1] FALSE

[[4]]
[1] 0

Latimore answered 15/11, 2011 at 17:22 Comment(3)

Yeah, it's a bit (~3x) slower, but +1 for interesting solution! – Gretagretal 15/11, 2011 at 17:42

Hmm. I fails for flatten( list(1:3, list(1:3, 'foo')) ) – Gretagretal 15/11, 2011 at 20:24

@Gretagretal Good catch. I edited to address the problem, although it will make the performance even worse that before, sadly. – Latimore 15/11, 2011 at 22:51

purrr::flatten achieves that. Though it is not recursive (by design).

So applying it twice should work:

library(purrr)
l <- list(NA, list("TRUE", list(FALSE), 0L))
flatten(flatten(l))

Here is an attempt at a recursive version:

flatten_recursive <- function(x) {
  stopifnot(is.list(x))
  if (any(vapply(x, is.list, logical(1)))) Recall(purrr::flatten(x)) else x
}
flatten_recursive(l)

Psychotomimetic answered 5/10, 2016 at 8:55 Comment(0)

hack_list <- function(.list) {
  .list[['_hack']] <- function() NULL
  .list <- unlist(.list)
  .list$`_hack` <- NULL
  .list
}

Leishaleishmania answered 30/1, 2014 at 18:52 Comment(0)

You can also use rrapply in the rrapply-package (extended version of base-rapply) by setting how = "flatten":

library(rrapply)

rrapply(list(NA, list("TRUE", list(FALSE), 0L)), how = "flatten")
#> [[1]]
#> [1] NA
#> 
#> [[2]]
#> [1] "TRUE"
#> 
#> [[3]]
#> [1] FALSE
#> 
#> [[4]]
#> [1] 0

Computation times

Below are some benchmark timings against the flatten2 and flatten3 functions in Tommy's response for two large nested lists:

flatten2 <- function(x) {
  len <- sum(rapply(x, function(x) 1L))
  y <- vector('list', len)
  i <- 0L
  rapply(x, function(x) { i <<- i+1L; y[[i]] <<- x })
  y
}

flatten3 <- function(x) {
  repeat {
    if(!any(vapply(x, is.list, logical(1)))) return(x)
    x <- Reduce(c, x)
  }
}

## large deeply nested list (1E6 elements, 6 layers)
deep_list <- rrapply(replicate(10, 1, simplify = F), classes = c("list", "numeric"), condition = function(x, .xpos) length(.xpos) < 6, f = function(x) replicate(10, 1, simplify = F), how = "recurse")

system.time(flatten2(deep_list))
#>    user  system elapsed 
#>   1.715   0.012   1.727
## system.time(flatten3(deep_list)), not run takes more than 10 minutes
system.time(rrapply(deep_list, how = "flatten"))
#>    user  system elapsed 
#>   0.105   0.016   0.121

## large shallow nested list (1E6 elements, 2 layers)
shallow_list <- lapply(replicate(1000, 1, simplify = F), function(x) replicate(1000, 1, simplify = F))

system.time(flatten2(shallow_list))
#>    user  system elapsed 
#>   1.308   0.040   1.348
system.time(flatten3(shallow_list))
#>    user  system elapsed 
#>   5.246   0.012   5.259
system.time(rrapply(shallow_list, how = "flatten"))
#>    user  system elapsed 
#>    0.09    0.00    0.09

Osteotome answered 6/7, 2020 at 10:13 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags