R equivalent of Stata's egen group() function
Consider the following dataset:

df = data.frame(id = c(1,1,1,2,2,2,3,3,3), 
                time = c(1,2,3,1,2,3,1,2,3), 
                x = c(8,8,9,7,7,7,7,7,8), 
                id_x = c(1,1,2,3,3,3,4,4,5))

The id_x column above is the desired result. I want to compute id_x, which identifies each unique combination of the variables id and x (preferably using dplyr).

In Stata, I can do the following:

clear

input id time x
1 1 8
1 2 8
1 3 9
2 1 7
2 2 7
2 3 7
3 1 7
3 2 7
3 3 8
end

egen id_x = group(id, x)

list, separator(0)

     +----------------------+
     | id   time   x   id_x |
     |----------------------|
  1. |  1      1   8      1 |
  2. |  1      2   8      1 |
  3. |  1      3   9      2 |
  4. |  2      1   7      3 |
  5. |  2      2   7      3 |
  6. |  2      3   7      3 |
  7. |  3      1   7      4 |
  8. |  3      2   7      4 |
  9. |  3      3   8      5 |
     +----------------------+
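For comparison, here is a minimal base R sketch of what egen group() computes (this uses base R's interaction(), which is not part of the original question): every distinct (id, x) combination gets a consecutive integer, numbered in sorted order of id and then x. It assumes both columns sort the way you expect once converted to factors (numeric values sort numerically).

```r
df <- data.frame(id   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 time = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                 x    = c(8, 8, 9, 7, 7, 7, 7, 7, 8))

# interaction() builds a factor whose levels are the (id, x) combinations.
# lex.order = TRUE orders levels by id first, then x;
# drop = TRUE keeps only combinations that actually occur,
# so the integer codes run 1, 2, ..., k with no gaps.
df$id_x <- as.integer(interaction(df$id, df$x, drop = TRUE, lex.order = TRUE))

df
```

The resulting id_x column should match the Stata listing: 1, 1, 2, 3, 3, 3, 4, 4, 5.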
Correction answered 21/6, 2019 at 20:57 Comment(0)
We can use dplyr::group_indices:

library(dplyr)

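# Variant 1: pass the grouping variables to group_indices() directly
#df1 %>% mutate(id_xx = group_indices(., id, x))
# Variant 2: group first, then call group_indices() with no arguments inside mutate()
df1 %>% group_by(id, x) %>% mutate(id_xx = group_indices())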
#> # A tibble: 9 x 5
#> # Groups:   id, x [5]
#>      id  time     x  id_x id_xx
#>   <dbl> <dbl> <dbl> <dbl> <int>
#> 1     1     1     8     1     1
#> 2     1     2     8     1     1
#> 3     1     3     9     2     2
#> 4     2     1     7     3     3
#> 5     2     2     7     3     3
#> 6     2     3     7     3     3
#> 7     3     1     7     4     4
#> 8     3     2     7     4     4
#> 9     3     3     8     5     5

Data:

df1 <- data.frame(id   = c(1,1,1,2,2,2,3,3,3),
                  time = c(1,2,3,1,2,3,1,2,3),
                  x    = c(8,8,9,7,7,7,7,7,8),
                  id_x = c(1,1,2,3,3,3,4,4,5))
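In newer dplyr versions, passing grouping variables through the `...` argument of group_indices() may produce a deprecation warning. A minimal sketch, assuming your installed dplyr still accepts an already-grouped data frame in group_indices() (check its documentation), groups first and then takes the row-to-group mapping outside mutate():

```r
library(dplyr)

df1 <- data.frame(id   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                  time = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                  x    = c(8, 8, 9, 7, 7, 7, 7, 7, 8),
                  id_x = c(1, 1, 2, 3, 3, 3, 4, 4, 5))

# Group by the combination of columns, then ask dplyr which group each row
# belongs to; the result is an integer vector with one entry per row.
df1$id_xx <- group_indices(group_by(df1, id, x))

df1
```

Because df1 itself is never grouped here, the result stays a plain data.frame with the new id_xx column.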
Ovule answered 21/6, 2019 at 21:40
I tried to use group_indices(id, x)... still in the Stata mindset. Thanks! – Correction
For me, only the first solution seems to work: df1 %>% mutate(id_xx = group_indices(., id, x)) – Correction
@SAFEX Maybe different package versions. – Ovule
While M--'s answer was correct at the time of writing, dplyr has since deprecated group_indices() in this usage; the equivalent code is now:

library(dplyr)
df1 %>% group_by(id, x) %>% mutate(id_xx = cur_group_id())
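One reason cur_group_id() reproduces egen group(): group_by() orders its groups by the sorted values of the grouping columns, so the ids follow sorted (id, x) order even when the rows themselves are not sorted. A small check of that behaviour, using a hypothetical shuffled data frame df2 (illustrative only, not from the original answer):

```r
library(dplyr)

# Same (id, x) combinations as df1, but with the rows deliberately out of order
df2 <- data.frame(id = c(3, 1, 2, 1, 3, 2, 1, 2, 3),
                  x  = c(8, 9, 7, 8, 7, 7, 8, 7, 7))

# Groups are numbered in sorted (id, x) order, not in order of first appearance
df2 <- df2 %>%
  group_by(id, x) %>%
  mutate(id_x = cur_group_id()) %>%
  ungroup()

df2
```

Here (1, 8) gets 1 and (3, 8) gets 5 regardless of where those rows appear, which is the same numbering egen group() would assign.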
Tripartite answered 31/8, 2020 at 20:1