How can I measure the degree to which two names are similar in R? In other words, how can I quantify how good a fuzzy match between them is?
For example, I am working with a data frame that looks like this:
Name.1 <- c("gonzalez", "wassermanschultz", "athanasopoulos", "armato")
Name.2 <- c("gonzalezsoldevilla", "schultz", "anthanasopoulos", "strain")
df1 <- data.frame(Name.1, Name.2)
df1
            Name.1             Name.2
1         gonzalez gonzalezsoldevilla
2 wassermanschultz            schultz
3   athanasopoulos    anthanasopoulos
4           armato             strain
It is clear from the data that rows 1 and 2 are similar enough to be confident that the name is the same: in each, one name is fully contained in the other. Row 3 is the same name even though it is misspelled, and row 4 holds two completely different names.
As output, I would like to create a third column that either describes the degree of similarity between the two names or holds a boolean indicating whether a fuzzy match can be made.
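To make the desired output concrete, here is a minimal sketch of one possible approach. It assumes base R's `adist()` with `partial = TRUE` (the approximate substring distance also used by `agrep()`) is an acceptable measure. The helper name `partial_similarity`, the column names `similarity` and `fuzzy.match`, and the 0.8 cutoff are hypothetical choices for illustration, not an established convention.

```r
Name.1 <- c("gonzalez", "wassermanschultz", "athanasopoulos", "armato")
Name.2 <- c("gonzalezsoldevilla", "schultz", "anthanasopoulos", "strain")
df1 <- data.frame(Name.1, Name.2, stringsAsFactors = FALSE)

# Hypothetical helper: similarity in [0, 1] based on how cheaply the
# shorter name can be matched against some substring of the longer one.
# Assumes both names are non-empty strings.
partial_similarity <- function(a, b) {
  # Use the shorter name as the pattern and the longer one as the text.
  if (nchar(a) > nchar(b)) {
    tmp <- a
    a <- b
    b <- tmp
  }
  # partial = TRUE lets the pattern match a substring of the text.
  d <- as.numeric(utils::adist(a, b, partial = TRUE))
  # Normalise the edit distance by the pattern length.
  1 - d / nchar(a)
}

df1$similarity <- mapply(partial_similarity, df1$Name.1, df1$Name.2,
                         USE.NAMES = FALSE)

# Assumed cutoff; it would need tuning on real data.
df1$fuzzy.match <- df1$similarity >= 0.8

df1
```

Normalising by the length of the shorter name keeps contained names (rows 1 and 2) at the top of the scale, while a single-character misspelling (row 3) costs only a small fraction. Other measures, such as Jaro-Winkler from the `stringdist` package, may be a better fit depending on the data.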