I need to add a fingerprint to each row of a dataset so that I can compare it with a later version of the same dataset and look for differences.
I know how to add a hash for each row in R, like below:
library(digest)
data.frame(iris, hash = apply(iris, 1, digest))
I am learning to use dplyr because the dataset is getting huge and I need to store it in SQL Server. I tried something like the code below, but the hashing does not work; every row gets the same hash:
iris %>%
rowwise() %>%
mutate(hash=digest(.))
Any clue for row-wise hashing using dplyr? Thanks!
key <- c('Sepal.Length', 'Sepal.Width', 'Petal.Length')
iris %>% rowwise() %>% do(data.frame(., hash = digest::digest(.data[!!key]))) %>% `[`(i=1, j='hash')
vs
digest::digest(as.character(c(iris[1,'Sepal.Length'], iris[1,'Sepal.Width'], iris[1,'Petal.Length'])))
– Trappist
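One likely reason every row gets the same hash in the rowwise() attempt: inside a magrittr pipe, `.` stands for the whole data frame coming in from the left, so digest(.) hashes the entire table once per row instead of hashing that row. A minimal sketch of one way around this, assuming it is acceptable to hash a delimited string built from each row (the names row_strings and iris_hashed and the "|" separator are illustrative choices, not from the original post):

```r
library(dplyr)
library(digest)

# Build one string per row by pasting all columns together.
# paste() is vectorised over rows and converts factors to character.
row_strings <- do.call(paste, c(iris, sep = "|"))

# Hash each row string; vapply() guarantees one character hash per row.
iris_hashed <- iris %>%
  mutate(hash = vapply(row_strings, digest, character(1), USE.NAMES = FALSE))

head(iris_hashed$hash)
```

These hashes will not match the ones from apply(iris, 1, digest), because the object passed to digest() differs (a pasted string here, a named character vector there). Rows with identical values get identical hashes, and the same construction has to be used on both versions of the dataset for the comparison to be meaningful.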