pull all elements with specific name from a nested list

I have some archived Slack data from which I am trying to extract some key message properties. I'd done this by stupidly flattening the entire list, which gave me a data.frame or tibble with lists nested in some cells. As this dataset grows, I want to pick elements out of the list more smartly, so that when the cache becomes big it doesn't take forever to create the data.frame or tibble with the elements I want.

Example below, where I am trying to pull everything named "type" into a vector or flat list that I can turn into a data frame variable. I named the folder and message levels for convenience. Does anyone have model code that can help?

library(tidyverse)
    
l <- list(folder_1 = list(
  `msg_1-1` = list(type = "message",
               subtype = "channel_join",
               ts = "1585771048.000200",
               user = "UFUNNF8MA",
               text = "<@UFUNNF8MA> has joined the channel"),
  `msg_1-2` = list(type = "message",
                   subtype = "channel_purpose",
                   ts = "1585771049.000300",
                   user = "UNFUNQ8MA",
                   text = "<@UNFUNQ8MA> set the channel purpose: Talk about xyz")),
  folder_2 = list(
    `msg_2-1` = list(type = "message",
                  subtype = "channel_join",
                  ts = "1585771120.000200",
                  user = "UQKUNF8MA",
                  text = "<@UQKUNF8MA> has joined the channel")) 
)

# gets a specific element
print(l[[1]][[1]][["type"]])

# tried to get all elements named "type", but am not at the right list level to do so
print(purrr::map(l, "type"))
Streeto answered 28/10, 2020 at 18:16 Comment(10)
Try with lapply(l,function(x) x[[1]][["type"]]) – Varese
Thanks! That returns the type for msg_1-1 and msg_2-1, but not for msg_1-2. Is there an apply() family or purrr call that will get all three (1-1, 1-2, 2-1)? – Streeto
Here are two possible approaches: A) if every message is at the same depth, just purrr::flatten(l) %>% purrr::keep(~identical(.x$type, "message")); B) if messages are at varying depths, check out #48083297 – Gause
Also try unlist(l)[grepl('.type',names(unlist(l)),fixed=T)] – Varese
Another alternative is purrr::map(l, ~ purrr::map(.x, "type")) – Phyllode
@Varese your unlist(l)... suggestion produces something close to what I was trying to get. The named vector is nice for debugging too, thanks! – Streeto
@M.Wood Fantastic! Would you be OK with me posting it as an answer? – Varese
@AbdessabourMtk That's EXACTLY what I was looking for... I did not think to nest a map() call as the mapper function inside map()! – Streeto
@Varese Sure! Looks like there are two options here; both should be provided for others, depending on their base R vs. tidyverse preference. Will do that soon. Thanks again! – Streeto
@Gause thanks for that suggestion. Your solution is a nice filter for the bottom-level list when an element in it meets a criterion, but I was looking for something that returns the values of one of those bottom-level elements. I think combining it with something like AbdessabourMtk's or Duck's solution should help me gracefully pull all the elements I'd like. Thanks for your help! – Streeto

Related to the solutions provided by @Duck and @Abdessabour Mtk yesterday: purrr has a function, map_depth(), that lets you extract a named element if you know its name and how deep it sits in the hierarchy. It is REALLY useful when crawling these big nested lists, and it is simpler than the nested map() calls above.

purrr::map_depth(l, 2, "type")
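To make the "depth" idea concrete, a rough base-R analogue can be written recursively: at depth 0 it extracts the named element, otherwise it lapply()s one level further down. This is a sketch, not purrr's implementation; `pluck_at_depth()` is a hypothetical helper name, and it assumes every level above the target depth is a list.

```r
# Hypothetical helper (not part of purrr): a rough base-R analogue of
# purrr::map_depth(x, depth, name) when `name` is a single element name.
pluck_at_depth <- function(x, depth, name) {
  if (depth == 0) {
    # At the target depth: extract the element (NULL if it is absent)
    return(x[[name]])
  }
  # Otherwise descend one level, preserving names and list structure
  lapply(x, pluck_at_depth, depth = depth - 1, name = name)
}

# Usage with a small made-up list shaped like the question's `l`
msgs <- list(
  folder_1 = list(`msg_1-1` = list(type = "message", user = "U1"),
                  `msg_1-2` = list(type = "message", user = "U2")),
  folder_2 = list(`msg_2-1` = list(type = "message", user = "U3"))
)
unlist(pluck_at_depth(msgs, 2, "user"))
```

Like map_depth(), the result keeps the folder/message nesting, so unlist() produces the same dotted names (e.g. `folder_1.msg_1-1`) shown in the other answers.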
Streeto answered 29/10, 2020 at 19:31 Comment(0)

Depending on the desired output, I would probably use a simple recursive function here.

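get_elements <- function(x, element) {
  # Only lists can hold named elements; for atomic leaves this returns NULL
  if (is.list(x)) {
    # If the element exists at this level, return it without descending further
    if (element %in% names(x)) x[[element]]
    # Otherwise recurse into every child, preserving the list structure and names
    else lapply(x, get_elements, element = element)
  }
}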

This allows:

get_elements(l, "type")
#> $folder_1
#> $folder_1$`msg_1-1`
#> [1] "message"
#> 
#> $folder_1$`msg_1-2`
#> [1] "message"
#> 
#> 
#> $folder_2
#> $folder_2$`msg_2-1`
#> [1] "message"

Or if you want to get all "users":

get_elements(l, "user")
#> $folder_1
#> $folder_1$`msg_1-1`
#> [1] "UFUNNF8MA"
#> 
#> $folder_1$`msg_1-2`
#> [1] "UNFUNQ8MA"
#> 
#> 
#> $folder_2
#> $folder_2$`msg_2-1`
#> [1] "UQKUNF8MA"

You could obviously unlist the result if you prefer it flattened into a vector.

unlist(get_elements(l, "type"))
#> folder_1.msg_1-1 folder_1.msg_1-2 folder_2.msg_2-1 
#>        "message"        "message"        "message" 
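Since the question's end goal is a data frame of several message properties, here is a base-R sketch for the case where every message sits exactly two levels down (folder, then message), as in the example: flatten one level with unlist(recursive = FALSE), then pull each field with vapply(). `messages_to_df()` is a hypothetical name, and the sketch assumes each requested field is a single string (missing fields become NA).

```r
# Hypothetical helper: one row per message, one column per requested field.
# Assumes messages are exactly two levels deep and each field is length one.
messages_to_df <- function(x, fields) {
  # Drop the folder level; the names become "folder.message"
  msgs <- unlist(x, recursive = FALSE)
  cols <- lapply(setNames(fields, fields), function(f) {
    vapply(msgs, function(m) {
      val <- m[[f]]
      # A missing field gives NULL from `[[`, which we record as NA
      if (is.null(val)) NA_character_ else as.character(val)
    }, character(1), USE.NAMES = FALSE)
  })
  data.frame(id = names(msgs), cols, stringsAsFactors = FALSE)
}
```

For the question's list this would be called as `messages_to_df(l, c("type", "subtype", "ts", "user"))`. Because vapply() checks that each field yields exactly one string, a multi-valued field fails loudly instead of silently producing a list column.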
Fuegian answered 28/10, 2020 at 18:35 Comment(1)
you need to change the lapply function argument from get_type to get_elements – Phyllode

As the OP confirmed, this solves the issue:

#Code
unlist(l)[grepl('.type', names(unlist(l)), fixed = TRUE)]

Output:

folder_1.msg_1-1.type folder_1.msg_1-2.type folder_2.msg_2-1.type 
            "message"             "message"             "message" 
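One caveat with the fixed-string match: '.type' can match anywhere in a name, so a field such as `typeface` (hypothetical, not in the question's data) would also be picked up. A sketch that anchors the match to the last name component, and calls unlist() only once, could look like this. `extract_named()` is a hypothetical helper; it assumes the field name contains no regex metacharacters and that each matched value has length one (unlist() appends numbers to the names of longer vectors, which the anchored pattern would then miss).

```r
# Hypothetical helper: keep only leaves whose *last* name component is `name`.
extract_named <- function(x, name) {
  u <- unlist(x)  # flatten once; all values are coerced to one atomic type
  # Match `name` at the very end, preceded by a dot or the start of the string
  u[grepl(paste0("(^|\\.)", name, "$"), names(u))]
}
```

For the question's data, `extract_named(l, "type")` keeps the three `.type` entries and, like the original, ignores the `.subtype` ones.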

Another option (many thanks and credit to @Abdessabour Mtk):

#Code1
purrr::map(l, ~ purrr::map(.x, "type"))
Varese answered 28/10, 2020 at 18:45 Comment(1)
@AbdessabourMtk, if I were to try to access a level below type (were it to exist), how would I do that? Thanks! – Streeto
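This comment went unanswered in the thread. One possible approach, assuming a field such as `type` held a nested list (it does not in the question's data, so the list below is made up for illustration): base R's `[[` indexes recursively when given a character vector, so `msg[[c("type", "sub")]]` means `msg[["type"]][["sub"]]`. purrr's string shorthand accepts the same kind of path, e.g. `purrr::map_depth(nested, 2, c("type", "sub"))`. One difference: base `[[` throws an error if a level along the path is missing, whereas the purrr shorthand returns NULL.

```r
# Made-up data: here `type` is itself a list with a deeper `sub` field
nested <- list(folder_1 = list(
  `msg_1-1` = list(type = list(kind = "message", sub = "channel_join")),
  `msg_1-2` = list(type = list(kind = "message", sub = "channel_purpose"))
))

# A character vector passed to `[[` indexes recursively: first type, then sub
subs <- lapply(nested, function(folder) lapply(folder, `[[`, c("type", "sub")))
unlist(subs)
```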

Another option is to use rrapply() from the rrapply package:

library(rrapply)

## return unlisted vector
rrapply(l, condition = function(x, .xname) .xname == "type", how = "unlist")
#> folder_1.msg_1-1.type folder_1.msg_1-2.type folder_2.msg_2-1.type 
#>             "message"             "message"             "message"

## return melted data.frame
rrapply(l, condition = function(x, .xname) .xname == "type", how = "melt")
#>         L1      L2   L3   value
#> 1 folder_1 msg_1-1 type message
#> 2 folder_1 msg_1-2 type message
#> 3 folder_2 msg_2-1 type message
Pervasive answered 29/10, 2020 at 7:57 Comment(0)

Alright, I wanted a base R solution, and wasn't satisfied with @Allan Cameron's answer because I wanted all matches grouped together in a single final list at the same 'root' level. I didn't want to use unlist to do so, as the matched objects may be complex tables, and I don't want to lose their structure. I thought append might do the trick... and after playing a bit with it, I think I got something that seems to work (at least in my case and the OP's):

I kept Allan's function name:

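get_elements <- function(x, element) {
    newlist <- list()
    for (elt in names(x)) {
        # A match: x[elt] is a one-element named list, so the value keeps its structure
        if (elt == element) newlist <- append(newlist, x[elt])
        # Otherwise recurse into sub-lists and append whatever matches they return
        else if (is.list(x[[elt]])) newlist <- append(newlist, get_elements(x[[elt]], element))
    }
    return(newlist)
}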

It's less elegant than lapply (to my taste), but I'm not sure I could do what I want with any *apply function... I still feel something even simpler and nicer could be done (maybe with do.call?), but I can't find it...

Results with the OP's list:

> get_elements(l,"user")
$user
[1] "UFUNNF8MA"

$user
[1] "UNFUNQ8MA"

$user
[1] "UQKUNF8MA"

> get_elements(l,"type")
$type
[1] "message"

$type
[1] "message"

$type
[1] "message"
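On the question of whether an *apply/do.call version is possible: one sketch is to lapply() over positions rather than names, then splice the per-child results together with do.call(c, ...), which concatenates lists one level deep without flattening the matched objects. `get_elements_flat()` is a hypothetical name. Iterating by position also lets it descend through unnamed levels, which the loop over names(x) above skips (for a fully unnamed list, names(x) is NULL, so the loop body never runs).

```r
# Hypothetical variant: same output shape as the loop version, built with
# lapply() + do.call(c, ...); it also descends into unnamed sub-lists.
get_elements_flat <- function(x, element) {
  if (!is.list(x)) return(list())              # atomic leaves hold no matches
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))  # unnamed level: no name can match
  hits <- lapply(seq_along(x), function(i) {
    if (identical(nms[i], element)) x[i]       # x[i] keeps the name, e.g. list(user = ...)
    else get_elements_flat(x[[i]], element)    # recurse into the child
  })
  if (length(hits) == 0) return(list())        # empty input list
  do.call(c, hits)                             # concatenate the per-child lists
}
```

For the question's list, `get_elements_flat(l, "user")` should give the same three `$user` entries printed above; a matched data frame is returned whole rather than unlisted.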
Clubbable answered 7/11 at 22:13 Comment(0)
