Compare if two dataframe objects in R are equal?

How do I check if two objects, e.g. dataframes, are value equal in R?

By value equal, I mean the value of each row of each column of one dataframe is equal to the value of the corresponding row and column in the second dataframe.

Cruller answered 14/5, 2012 at 22:59 Comment(4)
?all.equal or ?identical? If it's not one of those two, you'll have to expand on your question so we know exactly what you're trying to compare. - Allerus
Have a look HERE - Doityourself
What do you mean by "value equal"? - Changsha
I voted to close because it is too vague to answer in its current state. - Bacolod
C
73

It is not entirely clear what "value equal" means for two data frames, but if the goal is to test whether the values are the same, here is an example of two non-identical data frames with equal values:

a <- data.frame(x = 1:10)
b <- data.frame(y = 1:10)

To test if all values are equal:

all(a == b) # TRUE

To test if objects are identical (they are not, they have different column names):

identical(a,b) # FALSE: class, colnames, rownames must all match.
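
One caveat with all(a == b): if either data frame contains NA, the result is NA rather than TRUE, and == throws an error when the two data frames differ in size. Below is a minimal sketch of a helper that handles both cases. Note that values_equal is a hypothetical name (not a base R function), and the sketch assumes columns should be matched by position, ignoring column and row names.

```r
# Hypothetical helper: compare two data frames by value only,
# matching columns by position and ignoring column/row names.
values_equal <- function(x, y) {
  # Different shapes can never be value equal; `==` would error here anyway.
  if (!identical(dim(x), dim(y))) return(FALSE)
  # all.equal() treats NA in the same position as a match and returns a
  # character vector (not FALSE) on mismatch, hence the isTRUE() wrapper.
  isTRUE(all.equal(unname(as.list(x)), unname(as.list(y))))
}

a <- data.frame(x = c(1, NA, 3))
b <- data.frame(y = c(1, NA, 3))

all(a == b)         # NA, because of the missing value
values_equal(a, b)  # TRUE
```

Because all.equal() compares numbers with a small tolerance, this is an approximate comparison for floating-point columns rather than a bit-for-bit one.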
Changsha answered 14/5, 2012 at 23:50 Comment(5)
In case anyone's confused, they aren't identical because the column names aren't the same. - Twitter
@Twitter thanks for pointing that out, I have clarified my answer. - Changsha
Note that for identical to return true not just values and column names must match, but row numbers/names too. (This hit me when using subset(); it turned out all was what I wanted.) - Cassaundra
@DavidLeBauer is there a way to make identical ignore the order? - Dermal
@user4050 The order of what? The order of values? You could sort both vectors like identical(sort(a), sort(b)). - Mizzen
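
If "order" in the comments above means row order, one possible approach is to sort the rows of both data frames the same way and reset the row names before calling identical. This is only a sketch: sort_rows is a hypothetical helper name, and it assumes both data frames have the same columns in the same order.

```r
# Hypothetical helper: sort rows by all columns (left to right)
# and drop the original row names so they cannot cause a mismatch.
sort_rows <- function(df) {
  # unname() keeps column names from being read as arguments of order()
  df <- df[do.call(order, unname(as.list(df))), , drop = FALSE]
  rownames(df) <- NULL
  df
}

df1 <- data.frame(id = c(3, 1, 2), v = c("c", "a", "b"))
df2 <- data.frame(id = c(1, 2, 3), v = c("a", "b", "c"))

identical(df1, df2)                        # FALSE: same rows, different order
identical(sort_rows(df1), sort_rows(df2))  # TRUE
```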
N
14

In addition, identical is still useful for the practical goal if you compare the columns themselves rather than the whole data frames:

identical(a[, "x"], b[, "y"]) # TRUE
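
For data frames with more than one column, the same idea can be applied to each pair of columns by position. The following is a sketch under the assumption that both data frames have the same number of columns; the data frames p, q and r are made up for illustration.

```r
p <- data.frame(x = 1:3, z = c(0.5, 1.5, 2.5))
q <- data.frame(y = 1:3, w = c(0.5, 1.5, 2.5))
r <- data.frame(y = c(1, 2, 3), w = c(0.5, 1.5, 2.5))  # first column is double

# mapply() walks over the columns of both data frames in parallel,
# returning one TRUE/FALSE per column pair.
same_pq <- mapply(identical, p, q)
same_pr <- mapply(identical, p, r)

all(same_pq)  # TRUE
all(same_pr)  # FALSE: identical() is strict about integer vs. double
```

The per-column result (same_pr here) also shows which columns differ, which a single all(...) call would hide.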
Natascha answered 27/2, 2014 at 4:9 Comment(0)
H
9

We can use the R package compare to test whether the names of the object and the values are the same, in just one step.

a <- data.frame(x = 1:10)
b <- data.frame(y = 1:10)

library(compare)
compare(a, b)
# FALSE [TRUE]
# objects are not identical (different names), but values are the same.

In case we only care about equality of the values, we can set ignoreNames=TRUE

compare(a, b, ignoreNames = TRUE)
#TRUE
#  dropped names

The package has additional interesting functions such as compareEqual and compareIdentical.

Hastings answered 17/6, 2016 at 3:43 Comment(0)
A
1

Here is another method using comparedf from the arsenal package.

It reports the differences detected by variable, the variables not shared (different columns, for example), the number of observations not shared, and a summary of the overall comparison.

df1 <- data.frame(id = paste0("person", 1:3),
                  a = c("a", "b", "c"),
                  b = c(1, 3, 4))

> df1
       id a b
1 person1 a 1
2 person2 b 3
3 person3 c 4


df2 <- data.frame(id = paste0("person", 4:1),
                  a = c("c", "b", "a", "f"),
                  b = c(1, 3, 4, 4),
                  d = paste0("rn", 1:4))

> df2
       id a b   d
1 person4 c 1 rn1
2 person3 b 3 rn2
3 person2 a 4 rn3
4 person1 f 4 rn4


library(arsenal)
comparedf(df1, df2)

Compare Object
Function Call: 
comparedf(x = df1, y = df2)

Shared: 3 non-by variables and 3 observations.
Not shared: 1 variables and 0 observations.

Differences found in 2/3 variables compared.
0 variables compared have non-identical attributes.

A more detailed summary is also available:

 summary(comparedf(df1, df2))

The code above returns several tables:

  • Summary of data.frames
  • Summary of overall comparison
  • Variables not shared
  • Other variables not compared
  • Observations not shared
  • Differences detected by variable
  • Differences detected
  • Non-identical attributes

Here you have more info about the package and the function.

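You can also use all.equal(df1, df2), which returns TRUE when the objects are (nearly) equal and otherwise a character vector describing the differences: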

[1] "Attributes: < Component “row.names”: Numeric: lengths (3, 4) differ >"
[2] "Length mismatch: comparison on first 3 components"                    
[3] "Component “id”: Lengths (3, 4) differ (string compare on first 3)"    
[4] "Component “id”: 3 string mismatches"                                  
[5] "Component “a”: Lengths (3, 4) differ (string compare on first 3)"     
[6] "Component “a”: 2 string mismatches"                                   
[7] "Component “b”: Numeric: lengths (3, 4) differ"
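
Because all.equal returns a character vector instead of FALSE when the objects differ, it should be wrapped in isTRUE() when used in a condition. A small sketch, with made-up data frames u, v and w:

```r
u <- data.frame(b = c(1, 3, 4))
v <- data.frame(b = c(1, 3, 4 + 1e-10))  # differs only by a tiny numeric error
w <- data.frame(b = c(1, 3, 5))

identical(u, v)          # FALSE: exact comparison
isTRUE(all.equal(u, v))  # TRUE: within all.equal's default tolerance
isTRUE(all.equal(u, w))  # FALSE: all.equal(u, w) is a character vector

if (isTRUE(all.equal(u, v))) {
  message("u and v are equal up to numeric tolerance")
}
```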
Announcement answered 7/6, 2022 at 8:34 Comment(0)
C
0

To compare only the structure (class and attributes of each column) of two data sets, without relying on another package:

structure_df1 <- sapply(df1, function(x) paste(class(x), attributes(x), collapse = ""))
structure_df2 <- sapply(df2, function(x) paste(class(x), attributes(x), collapse = ""))

all(structure_df1 == structure_df2)
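
Note that == recycles when the two data frames have a different number of columns, so the check above can be misleading in that case. One possible variation, sketched here with a hypothetical helper name same_structure, compares the column names and column classes with identical instead; it assumes "structure" means exactly those two things.

```r
# Hypothetical helper: TRUE only if both data frames have the same
# column names (in the same order) and the same class for each column.
same_structure <- function(x, y) {
  identical(names(x), names(y)) &&
    identical(lapply(x, class), lapply(y, class))
}

s1 <- data.frame(id = c("p1", "p2"), b = c(1, 3))
s2 <- data.frame(id = c("p9"), b = c(4))            # same columns, other values
s3 <- data.frame(id = c("p1", "p2"), b = c(1L, 3L)) # b is integer, not double

same_structure(s1, s2)  # TRUE
same_structure(s1, s3)  # FALSE
```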
Casefy answered 11/11, 2020 at 12:37 Comment(0)
