Efficient way to find manager's manager's id

I have a database of employees, with their manager's id, in long format (one row per employee per month). I would like to add a column that contains their manager's manager's id (or the id of their skip level manager).

Here is a toy dataset:

id <- c(seq.int(1,11), seq.int(2,12))
mgr_id <- as.integer(c(NA, 1, 1, 2, 2, 2, 2, 3, 3, 5, 5,   #period 1
                      NA, 2, 5, 2, 5, 5, 3, 3, 5, 10, 10)) #period 2
period <- c(rep(1, 11), rep(2, 11))
left_company <- c(1, rep(0, 21))
joined_company <- c(rep(0, 21), 1) 

df <- data.frame(id, mgr_id, period, left_company, joined_company)

And here is a function I wrote that returns the expected results.

# finds the employee's manager in the correct period, and returns that manager's id

    get_mgr_mgr_id <- function(manager_id, period){
      mgr_mgr_id <- df$mgr_id[df$id == manager_id & df$period == period] 
      return(mgr_mgr_id[1])
    }

When I use the function with mapply, all is well. Note that employee 1 left the company, and they were replaced by employee 5, who was replaced by employee 10, who was replaced by employee 12, a new hire.

df$mgr_mgr_id <- mapply(get_mgr_mgr_id, df$mgr_id, df$period)

df
   id mgr_id period left joined mgr_mgr_id
1   1     NA      1    1      0         NA
2   2      1      1    0      0         NA
3   3      1      1    0      0         NA
4   4      2      1    0      0          1
5   5      2      1    0      0          1
6   6      2      1    0      0          1
7   7      2      1    0      0          1
8   8      3      1    0      0          1
9   9      3      1    0      0          1
10 10      5      1    0      0          2
11 11      5      1    0      0          2
12  2     NA      2    0      0         NA
13  3      2      2    0      0         NA
14  4      5      2    0      0          2
15  5      2      2    0      0         NA
16  6      5      2    0      0          2
17  7      5      2    0      0          2
18  8      3      2    0      0          2
19  9      3      2    0      0          2
20 10      5      2    0      0          2
21 11     10      2    0      0          5
22 12     10      2    0      1          5

My question: is there is a more efficient way to get this result? Currently it takes quite a long time to run even on 10,000 rows, and my dataset has closer to a million.

I'm also open to suggestions on a more general question title (possibly a version of this SQL question: Most efficient way to find something recursively in a table?)

library(data.table) setDT(df) df[, m2id := df[.(id = mgr_id, period = period), on=c("id", "period"), mgr_id]] id mgr_id period left_company joined_company m2id 1: 1 NA 1 1 0 NA 2: 2 1 1 0 0 NA 3: 3 1 1 0 0 NA 4: 4 2 1 0 0 1 5: 5 2 1 0 0 1 6: 6 2 1 0 0 1 7: 7 2 1 0 0 1 8: 8 3 1 0 0 1 9: 9 3 1 0 0 1 10: 10 5 1 0 0 2 11: 11 5 1 0 0 2 12: 2 NA 2 0 0 NA 13: 3 2 2 0 0 NA 14: 4 5 2 0 0 2 15: 5 2 2 0 0 NA 16: 6 5 2 0 0 2 17: 7 5 2 0 0 2 18: 8 3 2 0 0 2 19: 9 3 2 0 0 2 20: 10 5 2 0 0 2 21: 11 10 2 0 0 5 22: 12 10 2 0 1 5 id mgr_id period left_company joined_company m2id

Recommended topics

Hot tags