I have a practical question when I have two (or more) data frames and want to assign unique IDs for each matching observation within each and across both datasets e.g.:
#1. Create dataframe df1:
a1 <- c(1, 1, 1, 1, 2, 2, 2, 2, 1, 1)
b1 <- c(1, 5, 3, 2, 3, 4, 5, 1, 5, 2)
c1 <- c("white", "red", "black", "white", "red",
"white", "black", "silver", "red", "green")
df1 <- data.frame(a1, b1, c1)
df1
a1 b1 c1
1 1 1 white
2 1 5 red
3 1 3 black
4 1 2 white
5 2 3 red
6 2 4 white
7 2 5 black
8 2 1 silver
9 1 5 red
10 1 2 green
#2. Create dataframe df2:
a2 <- c(2, 2, 1, 1, 2, 2, 2, 2, 2, 2)
b2 <- c(3, 1, 3, 2, 1, 3, 4, 5, 3, 5)
c2 <- c("black", "blue", "black", "white", "silver",
"green", "green", "red", "blue", "white")
df2 <- data.frame(a2, b2, c2)
df2
a2 b2 c2
1 2 3 black
2 2 1 blue
3 1 3 black
4 1 2 white
5 2 1 silver
6 2 3 green
7 2 4 green
8 2 5 red
9 2 3 blue
10 2 5 white
#3. Assign unique IDs to each observation in df1:
library(data.table)
df1.2 <- data.table(df1, key="a1,b1,c1")
df1.2[, id:=.GRP, by=key(df1.2)]
df1.2 <- as.data.frame(df1.2)
df1.2
a1 b1 c1 id
1 1 1 white 1
2 1 2 green 2
3 1 2 white 3
4 1 3 black 4
5 1 5 red 5
6 1 5 red 5
7 2 1 silver 6
8 2 3 red 7
9 2 4 white 8
10 2 5 black 9
#4. The problematic part!! Assign identical unique IDs to matching observations of df2 as compared to df1.2
#and assign other unique IDs to all other non-matching obs of df2.
#Name the resulting dataframe as df2.2
#My expected result will ideally look as follows:
df2.2
a2 b2 c2 id
1 2 3 black 10
2 2 1 blue 11
3 1 3 black 4
4 1 2 white 3
5 2 1 silver 6
6 2 3 green 12
7 2 4 green 13
8 2 5 red 14
9 2 3 blue 15
10 2 5 white 16
Any help on how to get to df2.2 will be very much appreciated. Thanks.