How can I find the path of an element in a nested list without manually digging through the list in View()?
Here is an example that I can already deal with:
l1 <- list(x = list(a = "no_match", b = "test_noname", c ="test_noname"),
y = list(a = "test_name"))
After looking for an off-the-shelf solution in other packages, I found this approach (strongly inspired by rlist::list.search):
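list_search <- function(l, f) {
  # Flatten to a named atomic vector; names encode the named path, e.g. "x.b"
  ulist <- unlist(l, recursive = TRUE, use.names = TRUE)
  # Apply the predicate to every leaf value
  match <- f(ulist)
  # Keep only the matching leaves, together with their names
  ulist[match]
}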
list_search(l1, f = \(x) x == "test_noname")
x.b x.c
"test_noname" "test_noname"
This works pretty well, because it's easy to see that the name “x.b” translates to this access path:
l1[["x"]][["b"]]
[1] "test_noname"
# Or
purrr::pluck(l1, "x", "b")
[1] "test_noname"
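A minimal base-R sketch of that translation, assuming every level along the path is named and no name itself contains a dot (name_to_path() is a hypothetical helper, not from any package). Base R's [[ accepts a character vector and indexes recursively, so l1[[c("x", "b")]] does the same as purrr::pluck(l1, "x", "b"):

```r
l1 <- list(x = list(a = "no_match", b = "test_noname", c = "test_noname"),
           y = list(a = "test_name"))

# Hypothetical helper: split an unlist() name such as "x.b" into its path.
# Only valid when every level is named and no name contains a dot.
name_to_path <- function(nm) strsplit(nm, ".", fixed = TRUE)[[1]]

path <- name_to_path("x.b")  # c("x", "b")

# [[ with a character vector indexes recursively: same as l1[["x"]][["b"]]
l1[[path]]                   # "test_noname"
```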
And I can get all elements on the same level by leaving out the last index:
l1[["x"]]
$a
[1] "no_match"
$b
[1] "test_noname"
$c
[1] "test_noname"
This is usually my goal: I know the value or name of one of the elements I want to get to, and other similar elements are placed on the same sub-level (or sub-sub-sub-sub-sub-sub-sub-level).
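Under the same assumption (all levels named, no dots in names), dropping the last component of a path gives the level that holds the match and its siblings, i.e. the l1[["x"]] call above. A short base-R sketch:

```r
l1 <- list(x = list(a = "no_match", b = "test_noname", c = "test_noname"),
           y = list(a = "test_name"))

path <- strsplit("x.b", ".", fixed = TRUE)[[1]]  # c("x", "b")

# head(path, -1) drops the last component, leaving c("x")
siblings <- l1[[head(path, -1)]]                 # same as l1[["x"]]
names(siblings)                                  # c("a", "b", "c")
```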
However, many JSON files on the internet are not really meant for easy consumption and parse into much more complicated lists that look more like this:
l2 <- list(x = list("no_match", list("test_noname1", "test_noname2")), y = list(a = "test_name"))
str(l2)
List of 2
$ x:List of 2
..$ : chr "no_match"
..$ :List of 2
.. ..$ : chr "test_noname1"
.. ..$ : chr "test_noname2"
$ y:List of 1
..$ a: chr "test_name"
list_search(l2, f = \(x) x == "test_noname1")
x2
"test_noname1"
From the resulting names, I would guess that the element “x2” can be accessed like this:
l2[["x2"]]
NULL
# or maybe
l2[["x"]][[2]]
[[1]]
[1] "test_noname1"
[[2]]
[1] "test_noname2"
But to avoid also picking up “test_noname2”, I actually need something like this:
l2[["x"]][[2]][[1]]
[1] "test_noname1"
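A quick base-R check of where “x2” comes from: unlist() cannot build a path through unnamed elements, so for l2 it simply numbers the flattened leaves below x as x1, x2, x3. That number counts leaves, it is not a position at any single level, which is why it cannot be turned back into l2[["x"]][[2]][[1]]:

```r
l2 <- list(x = list("no_match", list("test_noname1", "test_noname2")),
           y = list(a = "test_name"))

flat <- unlist(l2)
names(flat)     # c("x1", "x2", "x3", "y.a")
flat[["x2"]]    # "test_noname1", the 2nd leaf under x
flat[["x3"]]    # "test_noname2", although it sits at l2[["x"]][[2]][[2]]
```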
Background
I often need to find the path of a known value when getting data through web scraping. For example, I might have a username or URL that I know is somewhere in the data, but it's tedious to actually find it. Once one value is identified, it becomes easy to generalise to its siblings, which are unknown so far. In the toy example, this would look like this:
l2[["x"]][[2]]
[[1]]
[1] "test_noname1"
[[2]]
[1] "test_noname2"
In reality, though, the lists I'm working with are nested much deeper.
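A minimal base-R sketch of that generalisation step, assuming the path is already known and stored as a list that mixes names and integer positions (pluck_path() is a hypothetical helper; purrr::pluck() does the same job when the path elements are passed as its arguments). A plain vector would not work here, because c("x", 2, 1) coerces the positions to the names "2" and "1":

```r
l2 <- list(x = list("no_match", list("test_noname1", "test_noname2")),
           y = list(a = "test_name"))

# Path to "test_noname1", written out by hand: names for named levels,
# integer positions for unnamed ones (same as l2[["x"]][[2]][[1]]).
path <- list("x", 2L, 1L)

# Hypothetical helper: apply [[ one step at a time, starting from l
pluck_path <- function(l, path) Reduce(function(acc, key) acc[[key]], path, l)

pluck_path(l2, path)            # "test_noname1"
# Dropping the last step returns the parent, i.e. the value and its siblings
pluck_path(l2, head(path, -1))  # list("test_noname1", "test_noname2")
```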
So the issue is essentially that unlist (or rapply, for that matter) does not give unnamed elements names that are easy to generalise. Ideally, there would be an automated way to translate these into a pluck call.
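One way this could be automated, sketched under a few assumptions: find_paths() and path_to_pluck() are hypothetical helpers, not functions from rlist or purrr. find_paths() walks the list recursively and records a name for each named element and an integer position for each unnamed one, so the paths survive the unnamed levels that unlist() flattens away. Atomic vectors are treated as leaves, so if the JSON parser simplified an array into a vector, the path points at the whole vector rather than at a single value.

```r
# Hypothetical helper: collect the path to every leaf where `f` is TRUE.
# A path is a list mixing names (named elements) and integer positions.
find_paths <- function(l, f, path = list()) {
  if (!is.list(l)) {
    # Leaf reached: keep its path if any of its values match
    return(if (isTRUE(any(f(l)))) list(path) else list())
  }
  nms <- names(l)
  out <- list()
  for (i in seq_along(l)) {
    # Use the name when there is a non-empty one, otherwise the position
    key <- if (!is.null(nms) && !is.na(nms[i]) && nzchar(nms[i])) nms[i] else i
    out <- c(out, find_paths(l[[i]], f, c(path, list(key))))
  }
  out
}

# Hypothetical helper: render a path as purrr::pluck() code to paste into a script
path_to_pluck <- function(path, obj = "l2") {
  keys <- vapply(path, deparse, character(1))
  paste0("purrr::pluck(", paste(c(obj, keys), collapse = ", "), ")")
}

l2 <- list(x = list("no_match", list("test_noname1", "test_noname2")),
           y = list(a = "test_name"))
paths <- find_paths(l2, \(x) x == "test_noname1")
writeLines(path_to_pluck(paths[[1]]))  # purrr::pluck(l2, "x", 2L, 1L)
```

Run on l1 with \(x) x == "test_noname", the same function should return the two paths list("x", "b") and list("x", "c"). Dropping the last step of a returned path with head(path, -1) then gives the parent level to generalise from. Growing `out` with c() keeps the sketch short but is not tuned for very large lists.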
"test_noname1"
exists, nested somewhere within a list (derived from JSON or not, but JSON is the main culprit for deeply nested lists), how do I find its path, i.e., l2[["x"]][[2]][[1]] or l2/"x"/2/1? – Defant
"test_noname2", just "test_noname1", but then the last section ("and then generalise it") just re-adds the content you were previously trying to filter out? – Energumen