How do I check if two objects, e.g. dataframes, are value equal in R?
By value equal, I mean the value of each row of each column of one dataframe is equal to the value of the corresponding row and column in the second dataframe.
It is not entirely clear what "value equal" means for two data frames, but if the goal is to test whether the values are the same, here is an example of two non-identical data frames with equal values:
a <- data.frame(x = 1:10)
b <- data.frame(y = 1:10)
To test if all values are equal:
all(a == b) # TRUE
To test if objects are identical (they are not, they have different column names):
identical(a,b) # FALSE: class, colnames, rownames must all match.
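One caveat with all(a == b): if either data frame contains NA, the result is NA rather than TRUE or FALSE, and == raises an error when the two data frames have different dimensions. A minimal sketch of a wrapper that handles both cases is below. Note that values_equal is a hypothetical helper name, not a base R function, and the sketch assumes the columns are atomic vectors that can be compared with == (for example, not factors with different level sets).

```r
a <- data.frame(x = 1:10)
b <- data.frame(y = 1:10)

# Hypothetical helper (not part of base R): TRUE only when both data frames
# have the same shape and every pair of cells matches. An NA in the same
# position on both sides counts as a match; column and row names are ignored.
values_equal <- function(df1, df2) {
  # Data frames of different shape can never be value equal
  if (!identical(dim(df1), dim(df2))) return(FALSE)
  col_equal <- function(u, v) {
    same <- u == v
    same[is.na(u) & is.na(v)] <- TRUE  # NA matches NA
    same[is.na(same)] <- FALSE         # NA against a real value is a mismatch
    all(same)
  }
  # Compare the columns pairwise, by position
  all(mapply(col_equal, df1, df2))
}

values_equal(a, b)                                               # TRUE
values_equal(data.frame(x = c(1, NA)), data.frame(y = c(1, NA))) # TRUE
values_equal(data.frame(x = c(1, NA)), data.frame(y = c(1, 2)))  # FALSE
all(data.frame(x = c(1, NA)) == data.frame(y = c(1, NA)))        # NA
```

Whether two NA values should count as equal depends on the data; drop the first assignment inside col_equal if they should not.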
For identical to return TRUE, not just values and column names must match, but row numbers/names too. (This hit me when using subset(); it turned out all was what I wanted.) – Cassaundra
identical(sort(a), sort(b)). – Mizzen
In addition, identical is still useful and supports the practical goal:
identical(a[, "x"], b[, "y"]) # TRUE
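The row-name point raised in the comment above can be reproduced with a small sketch. The data frames full_df, subset_df and fresh_df are made up for illustration; the assumption is that subset() keeps the original row names of the rows it selects.

```r
full_df   <- data.frame(x = 1:5)
subset_df <- subset(full_df, x > 2)  # keeps the original row names 3, 4, 5
fresh_df  <- data.frame(x = 3:5)     # gets automatic row names 1, 2, 3

all(subset_df == fresh_df)      # TRUE: the cell values match
identical(subset_df, fresh_df)  # FALSE: the row names differ

# Resetting the row names removes the only difference
rownames(subset_df) <- NULL
identical(subset_df, fresh_df)  # TRUE
```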
We can use the R package compare to test whether the names of the objects and the values are the same, in just one step.
a <- data.frame(x = 1:10)
b <- data.frame(y = 1:10)
library(compare)
compare(a, b)
# FALSE [TRUE]
# Objects are not identical (different names), but the values are the same.
In case we only care about equality of the values, we can set ignoreNames = TRUE:
compare(a, b, ignoreNames = TRUE)
# TRUE
#   dropped names
The package has additional interesting functions such as compareEqual and compareIdentical.
Here is another method using comparedf from the arsenal package.
It reports the differences detected by variable, the variables not shared (different columns, for example), the number of observations not shared, and a summary of the overall comparison.
df1 <- data.frame(id = paste0("person", 1:3),
                  a = c("a", "b", "c"),
                  b = c(1, 3, 4))
> df1
       id a b
1 person1 a 1
2 person2 b 3
3 person3 c 4
df2 <- data.frame(id = paste0("person", 4:1),
                  a = c("c", "b", "a", "f"),
                  b = c(1, 3, 4, 4),
                  d = paste0("rn", 1:4))
> df2
       id a b   d
1 person4 c 1 rn1
2 person3 b 3 rn2
3 person2 a 4 rn3
4 person1 f 4 rn4
library(arsenal)
comparedf(df1, df2)
Compare Object
Function Call:
comparedf(x = df1, y = df2)
Shared: 3 non-by variables and 3 observations.
Not shared: 1 variables and 0 observations.
Differences found in 2/3 variables compared.
0 variables compared have non-identical attributes.
A more detailed summary is also available:
summary(comparedf(df1, df2))
The code above returns several tables.
More information about the package and the function is available in the arsenal documentation.
Additionally, you can use all.equal(df1, df2) too, which reports the differences it finds:
[1] "Attributes: < Component “row.names”: Numeric: lengths (3, 4) differ >"
[2] "Length mismatch: comparison on first 3 components"
[3] "Component “id”: Lengths (3, 4) differ (string compare on first 3)"
[4] "Component “id”: 3 string mismatches"
[5] "Component “a”: Lengths (3, 4) differ (string compare on first 3)"
[6] "Component “a”: 2 string mismatches"
[7] "Component “b”: Numeric: lengths (3, 4) differ"
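Because all.equal returns a character vector describing the differences (as in the output above) rather than FALSE, it is usually wrapped in isTRUE when a plain TRUE/FALSE answer is needed. It also compares numbers within a small tolerance, which matters for floating-point columns. A short sketch, with made-up data frames df_p and df_q:

```r
df_p <- data.frame(v = 0.1 + 0.2)
df_q <- data.frame(v = 0.3)

identical(df_p, df_q)          # FALSE: the doubles differ in the last bits
all(df_p == df_q)              # FALSE for the same reason
isTRUE(all.equal(df_p, df_q))  # TRUE: equal within the default tolerance

# When the objects differ, all.equal returns messages, so isTRUE gives FALSE
df_r <- data.frame(v = 0.4)
is.character(all.equal(df_p, df_r))  # TRUE
isTRUE(all.equal(df_p, df_r))        # FALSE
```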
Without relying on another package, you can also compare the structure (class and attributes) of two data sets:
structure_df1 <- sapply(df1, function(x) paste(class(x), attributes(x), collapse = ""))
structure_df2 <- sapply(df2, function(x) paste(class(x), attributes(x), collapse = ""))
all(structure_df1 == structure_df2)
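With df1 and df2 as defined above, the two structure vectors have different lengths (3 and 4 columns), so the element-wise == recycles the shorter one and R emits a warning. A variant that avoids this, as a sketch only: col_classes and same_structure are hypothetical helper names, and this version compares only column names and classes, not other attributes.

```r
df1 <- data.frame(id = paste0("person", 1:3),
                  a = c("a", "b", "c"),
                  b = c(1, 3, 4))
df2 <- data.frame(id = paste0("person", 4:1),
                  a = c("c", "b", "a", "f"),
                  b = c(1, 3, 4, 4),
                  d = paste0("rn", 1:4))

# Hypothetical helper: one class string per column, named by column
col_classes <- function(df) {
  vapply(df, function(col) paste(class(col), collapse = "/"), character(1))
}

# Hypothetical helper: identical() compares length, names and values at once,
# so a different number of columns gives FALSE instead of a recycling warning
same_structure <- function(x, y) identical(col_classes(x), col_classes(y))

same_structure(df1, df2)          # FALSE: df2 has an extra column d
same_structure(df1, df1[3:1, ])   # TRUE: same columns, rows reordered
```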
?all.equal or ?identical? If it's not those two, then you'll have to expand on your question so we know what exactly you're trying to compare. – Allerus