Is there an R function to sequentially assign a code to each value in a dataframe, in the order it appears within the dataset?
R

3

5

I have a table with a long list of aliased values like this:

> head(transmission9, 50)
# A tibble: 50 x 2
   In_Node  End_Node
   <chr>    <chr>   
 1 c4ca4238 2838023a
 2 c4ca4238 d82c8d16
 3 c4ca4238 a684ecee
 4 c4ca4238 fc490ca4
 5 28dd2c79 c4ca4238
 6 f899139d 3def184a

I would like to have R go through both columns and assign a number sequentially to each value, in the order that an aliased value appears in the dataset. I would like R to read across rows first, then down columns. For example, for the dataset above:

   In_Node  End_Node
   <chr>    <chr>   
 1  1       2
 2  1       3
 3  1       4
 4  1       5
 5  6       1
 6  7       8

Is this possible? Ideally, I'd also love to be able to generate a "key" which would match each sequential code to each aliased value, like so:

Code Value
1    c4ca4238
2    2838023a
3    d82c8d16
4    a684ecee
5    fc490ca4

Thank you in advance for the help!

Randle answered 15/7, 2021 at 15:41 Comment(2)
unique(x), where x is a character vector, will give you the unique elements ordered as they appear in x. Purtenance
I suspect you'll find a more elegant solution than this, but I'd approach the problem of ordering the aliases with sapply(): testm <- matrix(c(1, 2, 3, 4, 4, 2, 1, 3), ncol = 2); unique(sapply(t(testm), function(x) x)) Purtenance
4

A dplyr version

  • Let's first re-create the sample data
library(tidyverse)

transmission9 <- read.table(header = TRUE, text = "   In_Node  End_Node
 1 c4ca4238 2838023a
 2 c4ca4238 d82c8d16
 3 c4ca4238 a684ecee
 4 c4ca4238 fc490ca4
 5 28dd2c79 c4ca4238
 6 f899139d 3def184a")

Then simply do this:

transmission9 %>% 
  mutate(across(everything(), ~ match(., unique(c(t(cur_data()))))))
#>   In_Node End_Node
#> 1       1        2
#> 2       1        3
#> 3       1        4
#> 4       1        5
#> 5       6        1
#> 6       7        8

Use the .names argument if you want to create new columns instead of overwriting the originals:

transmission9 %>% 
  mutate(across(everything(), ~ match(., unique(c(t(cur_data())))),
                .names = '{.col}_code'))

   In_Node End_Node In_Node_code End_Node_code
1 c4ca4238 2838023a            1             2
2 c4ca4238 d82c8d16            1             3
3 c4ca4238 a684ecee            1             4
4 c4ca4238 fc490ca4            1             5
5 28dd2c79 c4ca4238            6             1
6 f899139d 3def184a            7             8
Halstead answered 15/7, 2021 at 16:2 Comment(2)
That’s extremely neat (especially since it avoids the IMHO quite messy structured assignment via [<-, and the unlist).Agape
thank you so much, this was so clear and understandable!Randle
6

You could do:

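df1 <- df
# Factor levels are the unique values in row-first order; the integer codes are their positions
df1[] <- as.numeric(factor(unlist(df), levels = unique(c(t(df)))))
df1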
  In_Node End_Node
1       1        2
2       1        3
3       1        4
4       1        5
5       6        1
6       7        8
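
The question also asks for a "key" that maps each code back to its value. A minimal sketch of one way to build it in base R, assuming the same row-first ordering as above; the data frame df is re-created here from the question's sample rows, and the names lvls and key are made up for illustration:

```r
# Sample data re-created from the question (a plain data.frame, not a tibble)
df <- data.frame(
  In_Node  = c("c4ca4238", "c4ca4238", "c4ca4238", "c4ca4238", "28dd2c79", "f899139d"),
  End_Node = c("2838023a", "d82c8d16", "a684ecee", "fc490ca4", "c4ca4238", "3def184a"),
  stringsAsFactors = FALSE
)

# Unique values in order of first appearance, reading across rows first
lvls <- unique(c(t(df)))

# Lookup table: each value's code is simply its position in lvls
key <- data.frame(Code = seq_along(lvls), Value = lvls, stringsAsFactors = FALSE)
key
```

Because the codes are positions in lvls, lvls[code] recovers the original value for any code.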
Howlond answered 15/7, 2021 at 15:50 Comment(0)
5

You can match against the unique values. For a single vector, the code is straightforward:

match(vec, unique(vec))
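
As a small illustration on a made-up vector (vec and codes here are hypothetical values, not the question's data):

```r
vec <- c("b", "a", "b", "c", "a")

# unique() keeps values in order of first appearance
unique(vec)   # "b" "a" "c"

# match() returns, for each element, its position in that unique list
codes <- match(vec, unique(vec))
codes         # 1 2 1 3 2
```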

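The requirement to read across each row before moving down makes this slightly trickier: unlist() on a data.frame goes column by column, so you need to transpose the values first to get the lookup order right. After that, match against them.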

Finally, use [<- to assign the result back to a data.frame of the same shape as your original data (here x):

y = x
y[] = match(unlist(x), unique(c(t(x))))
y
  V2 V3
1  1  2
2  1  3
3  1  4
4  1  5
5  6  1
6  7  8

c(t(x)) is a bit of a hack:

  • t first converts the tibble to a matrix and then transposes it. If your tibble contains multiple data types, these will be coerced to a common type.
  • c(…) discards attributes. In particular, it drops the dimensions of the transposed matrix, i.e. it converts the matrix into a vector, with the values now in the correct order.
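
To make the ordering difference concrete, here is a rough sketch on a tiny made-up data.frame (the values and the names by_col, by_row and lookup are hypothetical). It also shows a column-wise variant of the assignment using lapply(), which assigns a list of columns instead of one flat vector; that may be worth trying if the flat-vector assignment is rejected for a tibble, though this is an assumption and the sketch only uses a plain data.frame:

```r
x <- data.frame(V2 = c("a", "a", "c"),
                V3 = c("b", "c", "d"),
                stringsAsFactors = FALSE)

# unlist() walks the data column by column
by_col <- unlist(x, use.names = FALSE)   # "a" "a" "c" "b" "c" "d"

# c(t(x)) walks it row by row
by_row <- c(t(x))                        # "a" "b" "a" "c" "c" "d"

# Lookup table in row-first order of appearance
lookup <- unique(by_row)                 # "a" "b" "c" "d"

# Column-wise variant: match each column against the shared lookup
y <- x
y[] <- lapply(x, match, table = lookup)
y
```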
Agape answered 15/7, 2021 at 15:52 Comment(2)
But yours is brilliant! :) upvoted alreadyHalstead
Thank you so much for breaking down everything and explaining the code--I really appreciate it! I ran into an issue with your code saying that "assigned data must be compatible with existing data", where existing data was half the size of the assigned data. For that reason, I checked off someone else's solution which worked without the error, but your explanation was really helpful in understanding the code they provided.Randle