Access lapply index names inside FUN
S

12

218

Is there a way to get the list index name in my lapply() function?

n = names(mylist)
lapply(mylist, function(list.elem) { cat("What is the name of this list element?\n") })

I asked before whether it's possible to preserve the index names in the list returned by lapply(), but I still don't know if there is an easy way to fetch each element name inside the custom function. I would like to avoid calling lapply on the names themselves; I'd rather get the name in the function parameters.

Scofield answered 30/3, 2012 at 20:40 Comment(1)
There's one more trick, with attributes. See here: #4165460 which is kind of similar to what DWin has, but different. :)Expiratory
G
207

Unfortunately, lapply only gives you the elements of the vector you pass it. The usual work-around is to pass it the names or indices of the vector instead of the vector itself.

But note that you can always pass in extra arguments to the function, so the following works:

x <- list(a=11,b=12,c=13) # Changed to list to address concerns in comments
lapply(seq_along(x), function(y, n, i) { paste(n[[i]], y[[i]]) }, y=x, n=names(x))

Here I use lapply over the indices of x, but also pass in x and the names of x. As you can see, the order of the function arguments can be anything - lapply will pass in the "element" (here the index) to the first argument not specified among the extra ones. In this case, I specify y and n, so there's only i left...

Which produces the following:

[[1]]
[1] "a 11"

[[2]]
[1] "b 12"

[[3]]
[1] "c 13"
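
To make the argument matching described above concrete, here is a minimal sketch. The helper name show_args is made up for illustration. The extra arguments y and n are matched by name, so the value lapply iterates over ends up in the only remaining formal, i:

```r
x <- list(a = 11, b = 12, c = 13)

# Hypothetical helper: y and n are supplied by name in the lapply() call,
# so each index from seq_along(x) is matched to the remaining formal i.
show_args <- function(y, n, i) paste(n[[i]], y[[i]])

res <- lapply(seq_along(x), show_args, y = x, n = names(x))
str(res)
```

Reordering the formals of show_args would not change the result, because only i is matched by position.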

UPDATE Simpler example, same result:

lapply(seq_along(x), function(i) paste(names(x)[[i]], x[[i]]))

Here the function uses "global" variable x and extracts the names in each call.
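
Because the loop runs over indices, the result above is unnamed. If the names should be kept, one possible sketch (not part of the original answer) is to copy them back with setNames():

```r
x <- list(a = 11, b = 12, c = 13)

# Iterate over the indices, then put the original names back on the result.
res <- setNames(
  lapply(seq_along(x), function(i) paste(names(x)[[i]], x[[i]])),
  names(x)
)
str(res)
```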

Guernica answered 30/3, 2012 at 20:47 Comment(13)
How is the 'i' parameter initialized in the custom function?Scofield
Got it, so lapply() really applies to the elements returned by seq_along. I got confused because the custom function parameters were reordered. Usually the iterated list element is the first parameter.Scofield
Updated answer and changed first function to use y instead of x so that it is (hopefully) clearer that the function can call its arguments anything. Also changed vector values to 11,12,13.Guernica
@RobertKubrick - Yeah, I probably tried to show too many things at once... You can name the arguments anything and have them in any order.Guernica
@DWin - I think it is correct (and applies to lists as well) ;-) ...But please prove me wrong!Guernica
I thought I had proven you wrong but on reflection I see that all of my "proofs" refer out to the variable in the calling environment. I still think using "x" in your demonstration code is pulling in information from the global environment, though.Jemappes
@DWin - Well, the function part is function(y, n, i) { paste(n[[i]], y[[i]]) } and does not refer to x.Guernica
Why do you pass x as argument to lapply() ? If you can just refer to x inside of the anonymous function?Latterll
Is it possible, in a one-line procedure, to preserve the names of the original list in the result (so the output shows "a" instead of [[1]])?Declivous
@Tommy, I used this to construct an lapply statement where I am trying to rename one column in all the dataframes in the list. The statement is - all_df = lapply(seq_along(all_df), function(i){nm = substr(names(all_df)[[i]], 1, nchar(names(all_df)[[i]])-3); colnames(all_df[[i]])[length(all_df[[i]])] = paste0(nm, "_rank"); all_df[[i]]}) names(all_df) = c("acc_df", "ag_df", "ia_df", "mom_df", "noa_df", "pp_df", "roa_df"). It works fine but the names of the dataframes in the resulting list are not preserved? Suggestions?Sweitzer
I have a list of list of data.frame. This solution avoided nested loops: x <- unlist(unlist(r,F),F); Reduce(rbind,lapply(seq_along(x), function(i) cbind(caso=names(x)[i], x[[i]]))). Tks, @GuernicaExcurvate
Following @VadymB., this is more parsimonious: lapply(names(x), function(i) paste(i,x[[i]])). If you want the result to be named, then sapply(names(x), function(i) paste(i,x[[i]]),simplify = F,USE.NAMES = T).Tremann
Even simpler, just iterate over the names and use list name access: lapply(names(x), function(nm) paste(nm, x[[nm]])) instead of element index.Bluenose
M
63

This basically uses the same workaround as Tommy, but with Map(), there's no need to access global variables which store the names of list components.

> x <- list(a=11, b=12, c=13)
> Map(function(x, i) paste(i, x), x, names(x))
$a
[1] "a 11"

$b
[1] "b 12"

$c
[1] "c 13"

Or, if you prefer mapply()

> mapply(function(x, i) paste(i, x), x, names(x))
     a      b      c 
"a 11" "b 12" "c 13"
Mas answered 12/12, 2013 at 14:48 Comment(2)
This is definitely the best solution of the bunch.Laurielaurier
When using mapply(), note the SIMPLIFY option, which defaults to TRUE. In my case, that turned the whole result into a large matrix when I only wanted a simple list apply. Setting it to F (inside the mapply()) made it run as intended.Quietism
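
The SIMPLIFY behaviour mentioned in the comment above can be seen in a small sketch. It reuses the list from this answer and only relies on base R:

```r
x <- list(a = 11, b = 12, c = 13)

# Default: mapply() simplifies the result to a named character vector.
as_vec <- mapply(function(x, i) paste(i, x), x, names(x))

# SIMPLIFY = FALSE keeps a named list, which is what Map() returns.
as_list <- mapply(function(x, i) paste(i, x), x, names(x), SIMPLIFY = FALSE)

str(as_vec)
str(as_list)
```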
A
44

UPDATE for R version 3.2

Disclaimer: this is a hacky trick, and may stop working in future releases.

You can get the index using this:

> lapply(list(a=10,b=20), function(x){parent.frame()$i[]})
$a
[1] 1

$b
[1] 2

Note: the [] is required for this to work, as it tricks R into thinking that the symbol i (residing in the evaluation frame of lapply) may have more references, thus activating the lazy duplication of it. Without it, R will not keep separated copies of i:

> lapply(list(a=10,b=20), function(x){parent.frame()$i})
$a
[1] 2

$b
[1] 2

Other exotic tricks can be used, like function(x){parent.frame()$i+0} or function(x){--parent.frame()$i}.

Performance Impact

Will the forced duplication cause performance loss? Yes! Here are the benchmarks:

> x <- as.list(seq_len(1e6))

> system.time( y <- lapply(x, function(x){parent.frame()$i[]}) )
user system elapsed
2.38 0.00 2.37
> system.time( y <- lapply(x, function(x){parent.frame()$i[]}) )
user system elapsed
2.45 0.00 2.45
> system.time( y <- lapply(x, function(x){parent.frame()$i[]}) )
user system elapsed
2.41 0.00 2.41
> y[[2]]
[1] 2

> system.time( y <- lapply(x, function(x){parent.frame()$i}) )
user system elapsed
1.92 0.00 1.93
> system.time( y <- lapply(x, function(x){parent.frame()$i}) )
user system elapsed
2.07 0.00 2.09
> system.time( y <- lapply(x, function(x){parent.frame()$i}) )
user system elapsed
1.89 0.00 1.89
> y[[2]]
[1] 1000000

Conclusion

This answer just shows that you should NOT use this... With another solution, like Tommy's above, your code will be more readable and more compatible with future releases; with this trick you also risk losing the optimizations the core team has worked hard to develop!


Old versions' tricks, no longer working:

> lapply(list(a=10,b=10,c=10), function(x)substitute(x)[[3]])

Result:

$a
[1] 1

$b
[1] 2

$c
[1] 3

Explanation: lapply creates calls of the form FUN(X[[1L]], ...), FUN(X[[2L]], ...) etc. So the argument it passes is X[[i]] where i is the current index in the loop. If we get this before it's evaluated (i.e., if we use substitute), we get the unevaluated expression X[[i]]. This is a call to [[ function, with arguments X (a symbol) and i (an integer). So substitute(x)[[3]] returns precisely this integer.

Having the index, you can access the names trivially, if you save it first like this:

L <- list(a=10,b=10,c=10)
n <- names(L)
lapply(L, function(x)n[substitute(x)[[3]]])

Result:

$a
[1] "a"

$b
[1] "b"

$c
[1] "c"

Or using this second trick: :-)

lapply(list(a=10,b=10,c=10), function(x)names(eval(sys.call(1)[[2]]))[substitute(x)[[3]]])

(result is the same).

Explanation 2: sys.call(1) returns lapply(...), so that sys.call(1)[[2]] is the expression used as list argument to lapply. Passing this to eval creates a legitimate object that names can access. Tricky, but it works.

Bonus: a second way to get the names:

lapply(list(a=10,b=10,c=10), function(x)eval.parent(quote(names(X)))[substitute(x)[[3]]])

Note that X is a valid object in the parent frame of FUN, and references the list argument of lapply, so we can get to it with eval.parent.

Almsman answered 29/8, 2013 at 14:39 Comment(3)
The code lapply(list(a=10,b=10,c=10), function(x)substitute(x)[[3]]) is returning all to be 3. Would you explain how this 3 was chosen ? and reason for the discrepancy ? Is it equal to length of list, in this case, 3. Sorry if this is a basic question but would like to know how to apply this in a general case.Facilitation
@Anusha, indeed, that form is not working anymore... But the lapply(list(a=10,b=10,c=10), function(x)eval.parent(quote(names(X)))[substitute(x)[[3]]]) works... I'll check what's going on.Almsman
@Ferdinand.kraft, lapply(list(a=10,b=10,c=10), function(x)eval.parent(quote(names(X)))[substitute(x)[[3]]]) is no longer working, and gives an error, Error in eval.parent(quote(names(X)))[substitute(x)[[3]]] : invalid subscript type 'symbol' is there an easy way to fix this ?Jarrodjarrow
S
22

I've had the same problem a lot of times... I've started using another way... Instead of using lapply, I've started using mapply

n = names(mylist)
mapply(function(list.elem, names) { }, list.elem = mylist, names = n)
Serosa answered 23/4, 2015 at 20:46 Comment(1)
I also prefer this, but this answer is a duplicate of a previous one.Improvident
M
15

You could try using imap() from purrr package.

From the documentation:

imap(x, ...) is short hand for map2(x, names(x), ...) if x has names, or map2(x, seq_along(x), ...) if it does not.

So you can use it this way:

library(purrr)
myList <- list(a=11,b=12,c=13) 
imap(myList, function(x, y) paste(x, y))

Which will give you the following result:

$a
[1] "11 a"

$b
[1] "12 b"

$c
[1] "13 c"
Melodics answered 7/11, 2017 at 19:50 Comment(0)
M
13

Just loop in the names.

sapply(names(mylist), function(n) { 
    doSomething(mylist[[n]])
    cat(n, '\n')
})
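
A variant of the same idea passes the list in as an argument instead of looking it up from the enclosing environment. This is only a sketch; the helper name describe and its body are made up for illustration:

```r
mylist <- list(a = 1:4, b = 2:9, c = 10:20)

# Hypothetical helper: the list is an explicit argument (lst), so the
# function does not depend on a variable called mylist existing outside it.
describe <- function(nm, lst) paste0(nm, ": ", length(lst[[nm]]))

res <- sapply(names(mylist), describe, lst = mylist)
print(res)
```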
Merwin answered 15/1, 2016 at 18:28 Comment(2)
This is certainly the simplest solution.Colpotomy
@flies: yes, except it's bad practice to hard-code variable mylist inside the function. Better still to do function(mylist, nm) ...Nightwalker
J
5

Tommy's answer applies to named vectors but I got the idea you were interested in lists. And it seems as though he were doing an end-around because he was referencing "x" from the calling environment. This function uses only the parameters that were passed to the function and so makes no assumptions about the name of objects that were passed:

x <- list(a=11,b=12,c=13)
lapply(x, function(z) { attributes(deparse(substitute(z)))$names  } )
#--------
$a
NULL

$b
NULL

$c
NULL
#--------
 names( lapply(x, function(z) { attributes(deparse(substitute(z)))$names  } ))
#[1] "a" "b" "c"
 what_is_my_name <- function(ZZZ) return(deparse(substitute(ZZZ)))
 what_is_my_name(X)
#[1] "X"
what_is_my_name(ZZZ=this)
#[1] "this"
 exists("this")
#[1] FALSE
Jemappes answered 30/3, 2012 at 21:39 Comment(3)
Your function only returns NULL?! So lapply(x, function(x) NULL) gives the same answer...Guernica
Note that lapply always adds the names from x to the result afterwards.Guernica
Yes. Agree that is the lesson of this exercise.Jemappes
B
4

My answer goes in the same direction as Tommy's and caracal's, but avoids having to save the list as an additional object.

lapply(seq(3), function(i, y=list(a=14,b=15,c=16)) { paste(names(y)[[i]], y[[i]]) })

Result:

[[1]]
[1] "a 14"

[[2]]
[1] "b 15"

[[3]]
[1] "c 16"

This gives the list as a named argument to FUN (instead of to lapply). lapply only has to iterate over the indices of the list (be careful to change this first argument to lapply when changing the length of the list).

Note: Giving the list directly to lapply as an additional argument also works:

lapply(seq(3), function(i, y) { paste(names(y)[[i]], y[[i]]) }, y=list(a=14,b=15,c=16))
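
One way to avoid the hard-coded seq(3) while still not saving the list as a separate object is to bind it once inside an immediately called anonymous function. This is a sketch of that idea, not something from the answer itself:

```r
# The list is bound to y once; seq_along(y) then follows its length,
# so nothing has to be updated by hand when elements are added.
res <- (function(y) {
  lapply(seq_along(y), function(i) paste(names(y)[[i]], y[[i]]))
})(list(a = 14, b = 15, c = 16))

str(res)
```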
Bate answered 4/11, 2014 at 9:41 Comment(0)
E
3

Both @caracal's and @Tommy's are good solutions, and this is an example involving lists and data.frames.
r is a list of lists and data.frames (dput(r[[1]]) at the end).

names(r)
[1] "todos"  "random"
r[[1]][1]
$F0
$F0$rst1
   algo  rst  prec  rorac prPo pos
1  Mean 56.4 0.450 25.872 91.2 239
6  gbm1 41.8 0.438 22.595 77.4 239
4  GAM2 37.2 0.512 43.256 50.0 172
7  gbm2 36.8 0.422 18.039 85.4 239
11 ran2 35.0 0.442 23.810 61.5 239
2  nai1 29.8 0.544 52.281 33.1 172
5  GAM3 28.8 0.403 12.743 94.6 239
3  GAM1 21.8 0.405 13.374 68.2 239
10 ran1 19.4 0.406 13.566 59.8 239
9  svm2 14.0 0.385  7.692 76.2 239
8  svm1  0.8 0.359  0.471 71.1 239

$F0$rst5
   algo  rst  prec  rorac prPo pos
1  Mean 52.4 0.441 23.604 92.9 239
7  gbm2 46.4 0.440 23.200 83.7 239
6  gbm1 31.2 0.416 16.421 79.5 239
5  GAM3 28.8 0.403 12.743 94.6 239
4  GAM2 28.2 0.481 34.815 47.1 172
11 ran2 26.6 0.422 18.095 61.5 239
2  nai1 23.6 0.519 45.385 30.2 172
3  GAM1 20.6 0.398 11.381 75.7 239
9  svm2 14.4 0.386  8.182 73.6 239
10 ran1 14.0 0.390  9.091 64.4 239
8  svm1  6.2 0.370  3.584 72.4 239

The objective is to unlist all the lists, adding the sequence of list names as a column to identify the case.

r=unlist(unlist(r,F),F)
names(r)
[1] "todos.F0.rst1"  "todos.F0.rst5"  "todos.T0.rst1"  "todos.T0.rst5"  "random.F0.rst1" "random.F0.rst5"
[7] "random.T0.rst1" "random.T0.rst5"

This unlists the lists but not the data.frames.

ra=Reduce(rbind,Map(function(x,y) cbind(case=x,y),names(r),r))

Map adds the sequence of names as a column. Reduce joins all the data.frames.

head(ra)
            case algo  rst  prec  rorac prPo pos
1  todos.F0.rst1 Mean 56.4 0.450 25.872 91.2 239
6  todos.F0.rst1 gbm1 41.8 0.438 22.595 77.4 239
4  todos.F0.rst1 GAM2 37.2 0.512 43.256 50.0 172
7  todos.F0.rst1 gbm2 36.8 0.422 18.039 85.4 239
11 todos.F0.rst1 ran2 35.0 0.442 23.810 61.5 239
2  todos.F0.rst1 nai1 29.8 0.544 52.281 33.1 172
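
The same pattern on a much smaller, made-up input may be easier to follow. The data below are toy stand-ins, and do.call(rbind, ...) is swapped in for Reduce(rbind, ...) as an alternative that binds everything in one call:

```r
# Toy stand-ins for the flattened list of data frames (made-up values).
r <- list(
  first  = data.frame(algo = c("Mean", "gbm1"), rst = c(56.4, 41.8)),
  second = data.frame(algo = c("Mean", "gbm2"), rst = c(52.4, 46.4))
)

# Map() pairs each name with its data frame; cbind() adds the name as a column.
tagged <- Map(function(nm, df) cbind(case = nm, df), names(r), r)

# Bind all pieces at once, then drop the generated row names.
ra <- do.call(rbind, tagged)
rownames(ra) <- NULL
print(ra)
```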

P.S. r[[1]]:

    structure(list(F0 = structure(list(rst1 = structure(list(algo = c("Mean", 
    "gbm1", "GAM2", "gbm2", "ran2", "nai1", "GAM3", "GAM1", "ran1", 
    "svm2", "svm1"), rst = c(56.4, 41.8, 37.2, 36.8, 35, 29.8, 28.8, 
    21.8, 19.4, 14, 0.8), prec = c(0.45, 0.438, 0.512, 0.422, 0.442, 
    0.544, 0.403, 0.405, 0.406, 0.385, 0.359), rorac = c(25.872, 
    22.595, 43.256, 18.039, 23.81, 52.281, 12.743, 13.374, 13.566, 
    7.692, 0.471), prPo = c(91.2, 77.4, 50, 85.4, 61.5, 33.1, 94.6, 
    68.2, 59.8, 76.2, 71.1), pos = c(239L, 239L, 172L, 239L, 239L, 
    172L, 239L, 239L, 239L, 239L, 239L)), .Names = c("algo", "rst", 
    "prec", "rorac", "prPo", "pos"), row.names = c(1L, 6L, 4L, 7L, 
    11L, 2L, 5L, 3L, 10L, 9L, 8L), class = "data.frame"), rst5 = structure(list(
        algo = c("Mean", "gbm2", "gbm1", "GAM3", "GAM2", "ran2", 
        "nai1", "GAM1", "svm2", "ran1", "svm1"), rst = c(52.4, 46.4, 
        31.2, 28.8, 28.2, 26.6, 23.6, 20.6, 14.4, 14, 6.2), prec = c(0.441, 
        0.44, 0.416, 0.403, 0.481, 0.422, 0.519, 0.398, 0.386, 0.39, 
        0.37), rorac = c(23.604, 23.2, 16.421, 12.743, 34.815, 18.095, 
        45.385, 11.381, 8.182, 9.091, 3.584), prPo = c(92.9, 83.7, 
        79.5, 94.6, 47.1, 61.5, 30.2, 75.7, 73.6, 64.4, 72.4), pos = c(239L, 
        239L, 239L, 239L, 172L, 239L, 172L, 239L, 239L, 239L, 239L
        )), .Names = c("algo", "rst", "prec", "rorac", "prPo", "pos"
    ), row.names = c(1L, 7L, 6L, 5L, 4L, 11L, 2L, 3L, 9L, 10L, 8L
    ), class = "data.frame")), .Names = c("rst1", "rst5")), T0 = structure(list(
        rst1 = structure(list(algo = c("Mean", "ran1", "GAM1", "GAM2", 
        "gbm1", "svm1", "nai1", "gbm2", "svm2", "ran2"), rst = c(22.6, 
        19.4, 13.6, 10.2, 9.6, 8, 5.6, 3.4, -0.4, -0.6), prec = c(0.478, 
        0.452, 0.5, 0.421, 0.423, 0.833, 0.429, 0.373, 0.355, 0.356
        ), rorac = c(33.731, 26.575, 40, 17.895, 18.462, 133.333, 
        20, 4.533, -0.526, -0.368), prPo = c(34.4, 52.1, 24.3, 40.7, 
        37.1, 3.1, 14.4, 53.6, 54.3, 116.4), pos = c(195L, 140L, 
        140L, 140L, 140L, 195L, 195L, 140L, 140L, 140L)), .Names = c("algo", 
        "rst", "prec", "rorac", "prPo", "pos"), row.names = c(1L, 
        9L, 3L, 4L, 5L, 7L, 2L, 6L, 8L, 10L), class = "data.frame"), 
        rst5 = structure(list(algo = c("gbm1", "ran1", "Mean", "GAM1", 
        "GAM2", "svm1", "nai1", "svm2", "gbm2", "ran2"), rst = c(17.6, 
        16.4, 15, 12.8, 9, 6.2, 5.8, -2.6, -3, -9.2), prec = c(0.466, 
        0.434, 0.435, 0.5, 0.41, 0.8, 0.44, 0.346, 0.345, 0.337), 
            rorac = c(30.345, 21.579, 21.739, 40, 14.754, 124, 23.2, 
            -3.21, -3.448, -5.542), prPo = c(41.4, 54.3, 35.4, 22.9, 
            43.6, 2.6, 12.8, 57.9, 62.1, 118.6), pos = c(140L, 140L, 
            195L, 140L, 140L, 195L, 195L, 140L, 140L, 140L)), .Names = c("algo", 
        "rst", "prec", "rorac", "prPo", "pos"), row.names = c(5L, 
        9L, 1L, 3L, 4L, 7L, 2L, 8L, 6L, 10L), class = "data.frame")), .Names = c("rst1", 
    "rst5"))), .Names = c("F0", "T0"))
Excurvate answered 1/5, 2018 at 21:27 Comment(0)
S
1

Let's say we want to calculate length of each element.

mylist <- list(a=1:4,b=2:9,c=10:20)
mylist

$a
[1] 1 2 3 4

$b
[1] 2 3 4 5 6 7 8 9

$c
 [1] 10 11 12 13 14 15 16 17 18 19 20

If the aim is to just label the resulting elements, then lapply(mylist,length) or below works.

sapply(mylist,length,USE.NAMES=T)

 a  b  c 
 4  8 11 

If the aim is to use the label inside the function, then mapply() is useful by looping over two objects; the list elements and list names.

fun <- function(x,y) paste0(length(x),"_",y)
mapply(fun,mylist,names(mylist))

     a      b      c 
 "4_a"  "8_b" "11_c" 
Squirt answered 24/7, 2019 at 18:17 Comment(0)
M
1

@ferdinand-kraft gave us a great trick and then told us we shouldn't use it because it's undocumented and because of the performance overhead.

I can't argue much with the first point but I'd like to note that the overhead should rarely be a concern.

Let's define functions so we don't have to write the complex expression parent.frame()$i[] but only .i(). We will also create .n() to access the name, which should work for both base and purrr functionals (and probably most others as well).

.i <- function() parent.frame(2)$i[]
# looks for X OR .x to handle base and purrr functionals
.n <- function() {
  env <- parent.frame(2)
  names(c(env$X,env$.x))[env$i[]]
}

sapply(cars, function(x) paste(.n(), .i()))
#>     speed      dist 
#> "speed 1"  "dist 2"

Now let's benchmark a simple function that pastes the items of a vector to their index, using different approaches (this operation can of course be vectorized using paste(vec, seq_along(vec)), but that's not the point here).

We define a benchmarking function and a plotting function and plot the results below:

library(purrr)
library(ggplot2)
benchmark_fun <- function(n){
  vec <- sample(letters,n, replace = TRUE)
  mb <- microbenchmark::microbenchmark(unit="ms",
                                      lapply(vec, function(x)  paste(x, .i())),
                                      map(vec, function(x) paste(x, .i())),
                                      lapply(seq_along(vec), function(x)  paste(vec[[x]], x)),
                                      mapply(function(x,y) paste(x, y), vec, seq_along(vec), SIMPLIFY = FALSE),
                                      imap(vec, function(x,y)  paste(x, y)))
  cbind(summary(mb)[c("expr","mean")], n = n)
}

benchmark_plot <- function(data, title){
  ggplot(data, aes(n, mean, col = expr)) + 
    geom_line() +
    ylab("mean time in ms") +
    ggtitle(title) +
    theme(legend.position = "bottom",legend.direction = "vertical")
}

plot_data <- map_dfr(2^(0:15), benchmark_fun)
benchmark_plot(plot_data[plot_data$n <= 100,], "simplest call for low n")

benchmark_plot(plot_data,"simplest call for higher n")

Created on 2019-11-15 by the reprex package (v0.3.0)

The drop at the start of the first chart is a fluke; please ignore it.

We see that the chosen answer is indeed faster, and for a decent number of iterations our .i() solutions are indeed slower. The overhead compared to the chosen answer is about 3 times the overhead of using purrr::imap() and amounts to about 25 ms for 30k iterations, so I lose about 1 ms per 1000 iterations, or 1 sec per million. That's a small cost for convenience in my opinion.

Meakem answered 15/11, 2019 at 10:43 Comment(0)
C
-2

Just write your own custom lapply function

lapply2 <- function(X, FUN){
  if( length(formals(FUN)) == 1 ){
    # No index passed - use normal lapply
    R = lapply(X, FUN)
  }else{
    # Index passed
    R = lapply(seq_along(X), FUN=function(i){
      FUN(X[[i]], i)
    })
  }

  # Set names
  names(R) = names(X)
  return(R)
}

Then use like this:

lapply2(letters, function(x, i) paste(x, i))
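
The check on length(formals(FUN)) is fragile: formals() returns NULL for primitive functions, and functions with default arguments or ... are counted in ways the wrapper does not expect. A possible alternative, sketched here under the assumption that FUN always accepts (element, index), skips the inspection entirely. The name lapply_i is hypothetical:

```r
# Hypothetical variant: FUN always receives (element, index, ...),
# so there is no need to inspect formals().
lapply_i <- function(X, FUN, ...) {
  FUN <- match.fun(FUN)
  out <- lapply(seq_along(X), function(i) FUN(X[[i]], i, ...))
  names(out) <- names(X)
  out
}

res <- lapply_i(list(a = 11, b = 12), function(x, i) paste(x, i))
str(res)
```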
Censure answered 17/8, 2016 at 15:56 Comment(1)
this is not robust at all, use with cautionMeakem
