Transform a dataframe into a tree structure list of lists

I have a data.frame with two columns representing a hierarchical tree, with parents and nodes.

I want to transform it into a structure that I can use as input for the function d3tree, from the d3Network package.

Here's my data frame:

df <- data.frame(c("Canada","Canada","Quebec","Quebec","Ontario","Ontario"),c("Quebec","Ontario","Montreal","Quebec City","Toronto","Ottawa"))
names(df) <- c("parent","child")

And I want to transform it to this structure:

Canada_tree <- list(name = "Canada", children = list(
  list(name = "Quebec",
       children = list(list(name = "Montreal"), list(name = "Quebec City"))),
  list(name = "Ontario",
       children = list(list(name = "Toronto"), list(name = "Ottawa")))))

I have successfully transformed this particular case using the code below:

fill_list <- function(df,node) {
  node <- as.character(node)
  if (is.leaf(df,node)==TRUE){
    return (list(name = node))
  }
  else {
    new_node = df[df[,1] == node,2]

    return (list(name = node, children =  list(fill_list(df,new_node[1]),fill_list(df,new_node[2]))))
  }
}

The problem is that it only works with trees in which every parent node has exactly two children. You can see I hard-coded the two children (new_node[1] and new_node[2]) as inputs for my recursive function.

I'm trying to figure out a way to call the recursive function once for each child of the parent node. Example:

fill_list(df,new_node[1]),...,fill_list(df,new_node[length(new_node)])

I tried these 3 possibilities, but none of them worked:

First: Creating a string with all the function calls and parameters and then evaluating it. It returns this error: could not find function fill_functional(df,new_node[1]). I think that's because my function hadn't been created yet by the time I called it.

fill_functional <- function(df,node) {
  node <- as.character(node)
  if (is.leaf(df,node)==TRUE){
    return (list(name = node))
  }
  else {
    new_node = df[df[,1] == node,2]
    level <- length(new_node)
    xxx <- paste0("(df,new_node[",seq(level),"])")
    lapply(xxx,function(x) eval(call(paste("fill_functional",x,sep=""))))

  }
}
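A likely cause of that error, shown as a minimal sketch: `call()` takes the name of a function as its first argument, so a string such as "fill_functional(df,new_node[1])" is looked up as if the whole string were a function name. In the example below, `sum` stands in for `fill_functional`, and `args_list` is a hypothetical name introduced only for illustration.

```r
# call() treats its first argument as a function *name*, so the whole
# string "sum(1, 2)" is looked up as a function and is not found.
msg <- tryCatch(
  eval(call("sum(1, 2)")),
  error = function(e) conditionMessage(e)
)
print(msg)

# Passing the function and its arguments separately avoids building
# code as strings: one call per element of args_list.
args_list <- list(c(1, 2), c(3, 4, 5))
totals <- lapply(args_list, function(a) do.call(sum, as.list(a)))
print(totals)
```

The same idea applies to the recursive case: `lapply` over the vector of children calls the function once per child without any string evaluation.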

Second: Using a for loop. But I only got the children of my root node.

L <- list()
fill_list <- function(df,node) {
  node <- as.character(node)
  if (is.leaf(df,node)==TRUE){
    return (list(name = node))
  }
  else {
    new_node = df[df[,1] == node,2]

    for (i in 1:length(new_node)){
      L[i] <- (fill_list(df,new_node[i]))
    }

    return (list(name = node, children = L))
  }
}
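A probable reason the loop version keeps only names: `L[i] <- x` fills a single list slot, so when `x` is a list with `name` and `children` only its first element survives (R also warns about the replacement length). The sketch below is one way to repair the loop, assuming a local list and `[[i]]` assignment. `is_leaf_node` is a hypothetical stand-in for the `is.leaf` helper, which is not shown in the question, and `fill_list_loop` is a name introduced here.

```r
# Example data from the question; character columns instead of factors.
df <- data.frame(
  parent = c("Canada", "Canada", "Quebec", "Quebec", "Ontario", "Ontario"),
  child  = c("Quebec", "Ontario", "Montreal", "Quebec City", "Toronto", "Ottawa"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for is.leaf(): a node is a leaf when it never
# appears in the parent column.
is_leaf_node <- function(df, node) !(node %in% df[, 1])

fill_list_loop <- function(df, node) {
  node <- as.character(node)
  if (is_leaf_node(df, node)) {
    return(list(name = node))
  }
  new_node <- as.character(df[df[, 1] == node, 2])
  # A fresh list per call, so recursive calls do not share state.
  children <- vector("list", length(new_node))
  for (i in seq_along(new_node)) {
    # [[i]] stores the whole subtree; [i] would keep only its first element.
    children[[i]] <- fill_list_loop(df, new_node[i])
  }
  list(name = node, children = children)
}

tree <- fill_list_loop(df, "Canada")
str(tree)
```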

Third: Creating a function that populates a list with elements that are functions, and just changing the arguments. But I wasn't able to accomplish anything interesting, and I'm afraid I'll have the same problem as I did on my first try described above.

Primer answered 23/5, 2014 at 22:23 Comment(0)

Here is a recursive definition:

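maketreelist <- function(df, root = df[1, 1]) {
  # Work with plain character strings rather than factors
  if(is.factor(root)) root <- as.character(root)
  r <- list(name = root)
  # All children of the current node (column 1 = parent, column 2 = child)
  children <- df[df[, 1] == root, 2]
  if(is.factor(children)) children <- as.character(children)
  # Recurse once per child; leaves get no children element
  if(length(children) > 0) {
    r$children <- lapply(children, maketreelist, df = df)
  }
  r
}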

canadalist <- maketreelist(df)

That produces what you desire. This function assumes that the first column of the data.frame (or matrix) you pass in contains the parent and the second column contains the child. It also takes a root parameter, which allows you to specify a starting point. It defaults to the first parent in the list.
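As a quick check that the recursion is not tied to two children per parent, here is a small sketch on a made-up tree. The data frame `df3` and the extra "Manitoba" row are assumptions added for illustration; the function body is the one above.

```r
# Hypothetical data: Canada has three children, Quebec has two,
# Ontario and Manitoba are leaves.
df3 <- data.frame(
  parent = c("Canada", "Canada", "Canada", "Quebec", "Quebec"),
  child  = c("Quebec", "Ontario", "Manitoba", "Montreal", "Quebec City"),
  stringsAsFactors = FALSE
)

maketreelist <- function(df, root = df[1, 1]) {
  if (is.factor(root)) root <- as.character(root)
  r <- list(name = root)
  children <- df[df[, 1] == root, 2]
  if (is.factor(children)) children <- as.character(children)
  if (length(children) > 0) {
    # df is passed by name, so each child lands in the root argument
    r$children <- lapply(children, maketreelist, df = df)
  }
  r
}

tree3 <- maketreelist(df3)

# Names of the root's direct children, as a plain character vector
child_names <- unname(vapply(tree3$children, function(x) x$name, character(1)))
print(child_names)
```

Passing a different `root`, for example `maketreelist(df3, "Quebec")`, should return only that subtree.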

But if you are really interested in playing around with trees, the igraph package might be of interest:

library(igraph)
g <- graph.data.frame(df)
plot(g)

[Image: Canada tree plotted with igraph]

Sine answered 23/5, 2014 at 23:10 Comment(3)
In case you're interested, please see this new similar post. - Finochio
What if I am trying to avoid having extra children nodes in my tree? - Allimportant
Solved my own question: with the purrr package, I changed the function to return flatten(r). - Allimportant
