How to compare generic structs in C++?
Asked Answered
S

6

15

I want to compare structs in a generic way and I've done something like this (I cannot share the actual source, so ask for more details if necessary):

template<typename Data>
bool structCmp(Data data1, Data data2)
{
  void* dataStart1 = (std::uint8_t*)&data1;
  void* dataStart2 = (std::uint8_t*)&data2;
  return memcmp(dataStart1, dataStart2, sizeof(Data)) == 0;
}

This mostly works as intended, except sometimes it returns false even though the two struct instances have identical members (I've checked with eclipse debugger). After some searching I discovered that memcmp can fail due to the struct used being padded.

Is there a more proper way of comparing memory that's indifferent to padding? I'm not able to modify the structs used (they're part of an API I'm using) and the many different structs used has some differing members and thus cannot be compared individually in a generic way (to my knowledge).

Edit: I'm unfortunately stuck with C++11. Should've mentioned this earlier...

Surtax answered 5/2, 2020 at 16:22 Comment(15)
can you show an example where this fails? Padding should be the same for all instance of one type, no?Km
@idclev463035818 Padding is unspecified, you can't assume it's value and I believe it's UB to try to read it (not sure on that last part).Ascetic
@idclev463035818 Padding is in the same relative places in memory but it can have different data. It is discarded in normal uses of the struct so the compiler may not bother to zero it.Anthropomorphous
@FrançoisAndrieux you dont know the padding, but I assumed it is the same for two instances of the same typeKm
@idclev463035818 Padding is parts of an object's memory that isn't used. It's purpose is to allow certain parts of the class to be aligned correctly in memory.Ascetic
If they are just normal structs, the default operator== should work.Anthropomorphous
@FrançoisAndrieux I know, so eg struct { some_3byte_sized_type x; int y;} will most likely have padding to align y, but isnt this the same for two instances? Sorry for hijacking the discussion, maybe I should open a seperate quesitonKm
@idclev463035818 The padding has the same size. The state of the bits that constitute that padding can be anything. When you memcmp you include those padding bits in your comparison.Ascetic
@FrançoisAndrieux uh now I get it :) thanksKm
@NO_NAME it took me some time to understand your comment, but now it completely makes senseKm
I agree with Yksisarvinen ... use classes, not structs, and implement the == operator. Using memcmp is unreliable, and sooner or later you'll be dealing with some class that has to "do it a little differently from the others." It's very clean and efficient to implement that in an operator. The actual behavior will be polymorphic but the source code will be clean ... and, obvious.Fallow
you should implement ==operator, sometimes having exactly same bytes doesn't guarantee ==ness.Danille
@FrançoisAndrieux thank you. This is why I am here. I can ask stupid questions, learn from answers and comments, try to condense what I learned and in the best case help others with that. quite rare these daysKm
The casting to std::uint8_t* is pointless since the cast to void* is implicit. The type also isn't useful, since it really should be void*, not uint8_t. And C-style casts are dangerous in general.Oldfangled
@FrederikEnetorp I amended my answer. This a hacky way for aggregates.Declinatory
S
7

No, memcmp is not suitable to do this. And reflection in C++ is insufficient to do this at this point (there are going to be experimental compilers that support reflection strong enough to do this already, and might have the features you need).

Without built-in reflection, the easiest way to solve your problem is to do some manual reflection.

Take this:

struct some_struct {
  int x;
  double d1, d2;
  char c;
};

we want to do the minimal amount of work so we can compare two of these.

If we have:

auto as_tie(some_struct const& s){ 
  return std::tie( s.x, s.d1, s.d2, s.c );
}

or

auto as_tie(some_struct const& s)
-> decltype(std::tie( s.x, s.d1, s.d2, s.c ))
{
  return std::tie( s.x, s.d1, s.d2, s.c );
}

for , then:

template<class S>
bool are_equal( S const& lhs, S const& rhs ) {
  return as_tie(lhs) == as_tie(rhs);
}

does a pretty decent job.

We can extend this process to be recursive with a bit of work; instead of comparing ties, compare each element wrapped in a template, and that template's operator== recursively applies this rule (wrapping the element in as_tie to compare) unless the element already has a working ==, and handles arrays.

This will require a bit of a library (100ish lines of code?) together with writing a bit of manual per-member "reflection" data. If the number of structs you have is limited, it might be easier to write per-struct code manually.


There are probably ways to get

REFLECT( some_struct, x, d1, d2, c )

to generate the as_tie structure using horrible macros. But as_tie is simple enough. In the repetition is annoying; this is useful:

#define RETURNS(...) \
  noexcept(noexcept(__VA_ARGS__)) \
  -> decltype(__VA_ARGS__) \
  { return __VA_ARGS__; }

in this situation and many others. With RETURNS, writing as_tie is:

auto as_tie(some_struct const& s)
  RETURNS( std::tie( s.x, s.d1, s.d2, s.c ) )

removing the repetition.


Here is a stab at making it recursive:

template<class T,
  typename std::enable_if< !std::is_class<T>{}, bool>::type = true
>
auto refl_tie( T const& t )
  RETURNS(std::tie(t))

template<class...Ts,
  typename std::enable_if< (sizeof...(Ts) > 1), bool>::type = true
>
auto refl_tie( Ts const&... ts )
  RETURNS(std::make_tuple(refl_tie(ts)...))

template<class T, std::size_t N>
auto refl_tie( T const(&t)[N] ) {
  // lots of work in C++11 to support this case, todo.
  // in C++17 I could just make a tie of each of the N elements of the array?

  // in C++11 I might write a custom struct that supports an array
  // reference/pointer of fixed size and implements =, ==, !=, <, etc.
}

struct foo {
  int x;
};
struct bar {
  foo f1, f2;
};
auto refl_tie( foo const& s )
  RETURNS( refl_tie( s.x ) )
auto refl_tie( bar const& s )
  RETURNS( refl_tie( s.f1, s.f2 ) )

refl_tie(array) (fully recursive, even supports arrays-of-arrays):

template<class T, std::size_t N, std::size_t...Is>
auto array_refl( T const(&t)[N], std::index_sequence<Is...> )
  RETURNS( std::array<decltype( refl_tie(t[0]) ), N>{ refl_tie( t[Is] )... } )

template<class T, std::size_t N>
auto refl_tie( T(&t)[N] )
  RETURNS( array_refl( t, std::make_index_sequence<N>{} ) )

Live example.

Here I use a std::array of refl_tie. This is much faster than my previous tuple of refl_tie at compile time.

Also

template<class T,
  typename std::enable_if< !std::is_class<T>{}, bool>::type = true
>
auto refl_tie( T const& t )
  RETURNS(std::cref(t))

using std::cref here instead of std::tie could save on compile-time overhead, as cref is a much simpler class than tuple.

Finally, you should add

template<class T, std::size_t N, class...Ts>
auto refl_tie( T(&t)[N], Ts&&... ) = delete;

which will prevent array members from decaying to pointers and falling back on pointer-equality (which you probably don't want from arrays).

Without this, if you pass an array to a non-reflected struct in, it falls back on pointer-to-non-reflected struct refl_tie, which works and returns nonsense.

With this, you end up with a compile-time error.


Support for recursion through library types is tricky. You could std::tie them:

template<class T, class A>
auto refl_tie( std::vector<T, A> const& v )
  RETURNS( std::tie(v) )

but that doesn't support recursion through it.

Spire answered 5/2, 2020 at 16:46 Comment(6)
I'd like to pursue this type of solution with manual reflections. The code you provided doesn't seem to work with C++11. Any chance you can help me with that?Surtax
The reason this doesn't work in C++11 is the lack of trailing return type on as_tie. Starting from C++14 this is deduced automatically. You can use auto as_tie (some_struct const & s) -> decltype(std::tie(s.x, s.d1, s.d2, s.c)); in C++11. Or explicitly state the return type.Harness
@FredrikEnetorp Fixed, plus a macro that makes it easy to write. The work to get it to fully recursively work (so a struct-of-struct, where the substructs have as_tie support, automatically works) and support array members isn't detailed, but it is possible.Spire
Thank you. I did the horrible macros slightly differently, but functionally equivalent. Just one more problem. I'm trying to generalize the comparison in a separate header file and include it in various gmock test files. This results in the error message: multiple definition of `as_tie(Test1 const&)' I'm trying to inline them but cannot get it to work.Surtax
@FredrikEnetorp The inline keyword should make multiple definition errors go away. Use the [ask question] button after you get a minimal reproducible exampleSpire
@FredrikEnetorp I replaced a tuple with an array for a massive compile time speedup btw. There are also a lot of std::make_index_sequence<N>{} implementations for c++11 out there (they aren't as fast as the ones in std, because most compilers implemented the std ones using built-in language extensions)Spire
D
7

In short: Not possible in a generic way.

The problem with memcmp is that the padding may contain arbitrary data and hence the memcmp may fail. If there were a way to find out where the padding is, you could zero-out those bits and then compare the data representations, that would check for equality if the members are trivially comparable (which is not the case i.e. for std::string since two strings can contain different pointers, but the pointed two char-arrays are equal). But I know of no way to get at the padding of structs. You can try to tell your compiler to pack the structs, but this will make accesses slower and is not really guranteed to work.

The cleanest way to implement this is to compare all members. Of course this is not really possible in a generic way (until we get compile time reflections and meta classes in C++23 or later). From C++20 onward, one could generate a default operator<=> but I think this would also only be possible as a member function so, again this is not really applicable. If you are lucky and all structs you want to compare have an operator== defined, you can of course just use that. But that is not guaranteed.

EDIT: Ok, there is actually a totally hacky and somewhat generic way for aggregates. (I only wrote the conversion to tuples, those have a default comparison operator). godbolt

Declinatory answered 5/2, 2020 at 16:38 Comment(1)
Nice hack! Unfortunately, I'm stuck with C++11 so I can't use it.Surtax
K
7

You are right that padding gets in your way of comparing arbitrary types in this way.

There are measures you can take:

  • If you are in control of Data then eg gcc has __attribute__((packed)). It has impact on performance, but it might be worth to give it a try. Though, I have to admit that I dont know if packed enables you to disallow padding completely. Gcc doc says:

This attribute, attached to struct or union type definition, specifies that each member of the structure or union is placed to minimize the memory required. When attached to an enum definition, it indicates that the smallest integral type should be used.

If T is TriviallyCopyable and if any two objects of type T with the same value have the same object representation, provides the member constant value equal true. For any other type, value is false.

and further:

This trait was introduced to make it possible to determine whether a type can be correctly hashed by hashing its object representation as a byte array.

PS: I only addressed padding, but dont forget that types that can compare equal for instances with different representation in memory are by no means rare (eg std::string, std::vector and many others).

Km answered 5/2, 2020 at 16:38 Comment(2)
I like this answer. With this type trait, you can use SFINAE to use memcmp on structs with no padding and implement operator== only when needed.Pendulous
Ok, thanks. With this I can safely conclude that I need to do some manual reflections.Surtax
S
7

No, memcmp is not suitable to do this. And reflection in C++ is insufficient to do this at this point (there are going to be experimental compilers that support reflection strong enough to do this already, and might have the features you need).

Without built-in reflection, the easiest way to solve your problem is to do some manual reflection.

Take this:

struct some_struct {
  int x;
  double d1, d2;
  char c;
};

we want to do the minimal amount of work so we can compare two of these.

If we have:

auto as_tie(some_struct const& s){ 
  return std::tie( s.x, s.d1, s.d2, s.c );
}

or

auto as_tie(some_struct const& s)
-> decltype(std::tie( s.x, s.d1, s.d2, s.c ))
{
  return std::tie( s.x, s.d1, s.d2, s.c );
}

for , then:

template<class S>
bool are_equal( S const& lhs, S const& rhs ) {
  return as_tie(lhs) == as_tie(rhs);
}

does a pretty decent job.

We can extend this process to be recursive with a bit of work; instead of comparing ties, compare each element wrapped in a template, and that template's operator== recursively applies this rule (wrapping the element in as_tie to compare) unless the element already has a working ==, and handles arrays.

This will require a bit of a library (100ish lines of code?) together with writing a bit of manual per-member "reflection" data. If the number of structs you have is limited, it might be easier to write per-struct code manually.


There are probably ways to get

REFLECT( some_struct, x, d1, d2, c )

to generate the as_tie structure using horrible macros. But as_tie is simple enough. In the repetition is annoying; this is useful:

#define RETURNS(...) \
  noexcept(noexcept(__VA_ARGS__)) \
  -> decltype(__VA_ARGS__) \
  { return __VA_ARGS__; }

in this situation and many others. With RETURNS, writing as_tie is:

auto as_tie(some_struct const& s)
  RETURNS( std::tie( s.x, s.d1, s.d2, s.c ) )

removing the repetition.


Here is a stab at making it recursive:

template<class T,
  typename std::enable_if< !std::is_class<T>{}, bool>::type = true
>
auto refl_tie( T const& t )
  RETURNS(std::tie(t))

template<class...Ts,
  typename std::enable_if< (sizeof...(Ts) > 1), bool>::type = true
>
auto refl_tie( Ts const&... ts )
  RETURNS(std::make_tuple(refl_tie(ts)...))

template<class T, std::size_t N>
auto refl_tie( T const(&t)[N] ) {
  // lots of work in C++11 to support this case, todo.
  // in C++17 I could just make a tie of each of the N elements of the array?

  // in C++11 I might write a custom struct that supports an array
  // reference/pointer of fixed size and implements =, ==, !=, <, etc.
}

struct foo {
  int x;
};
struct bar {
  foo f1, f2;
};
auto refl_tie( foo const& s )
  RETURNS( refl_tie( s.x ) )
auto refl_tie( bar const& s )
  RETURNS( refl_tie( s.f1, s.f2 ) )

refl_tie(array) (fully recursive, even supports arrays-of-arrays):

template<class T, std::size_t N, std::size_t...Is>
auto array_refl( T const(&t)[N], std::index_sequence<Is...> )
  RETURNS( std::array<decltype( refl_tie(t[0]) ), N>{ refl_tie( t[Is] )... } )

template<class T, std::size_t N>
auto refl_tie( T(&t)[N] )
  RETURNS( array_refl( t, std::make_index_sequence<N>{} ) )

Live example.

Here I use a std::array of refl_tie. This is much faster than my previous tuple of refl_tie at compile time.

Also

template<class T,
  typename std::enable_if< !std::is_class<T>{}, bool>::type = true
>
auto refl_tie( T const& t )
  RETURNS(std::cref(t))

using std::cref here instead of std::tie could save on compile-time overhead, as cref is a much simpler class than tuple.

Finally, you should add

template<class T, std::size_t N, class...Ts>
auto refl_tie( T(&t)[N], Ts&&... ) = delete;

which will prevent array members from decaying to pointers and falling back on pointer-equality (which you probably don't want from arrays).

Without this, if you pass an array to a non-reflected struct in, it falls back on pointer-to-non-reflected struct refl_tie, which works and returns nonsense.

With this, you end up with a compile-time error.


Support for recursion through library types is tricky. You could std::tie them:

template<class T, class A>
auto refl_tie( std::vector<T, A> const& v )
  RETURNS( std::tie(v) )

but that doesn't support recursion through it.

Spire answered 5/2, 2020 at 16:46 Comment(6)
I'd like to pursue this type of solution with manual reflections. The code you provided doesn't seem to work with C++11. Any chance you can help me with that?Surtax
The reason this doesn't work in C++11 is the lack of trailing return type on as_tie. Starting from C++14 this is deduced automatically. You can use auto as_tie (some_struct const & s) -> decltype(std::tie(s.x, s.d1, s.d2, s.c)); in C++11. Or explicitly state the return type.Harness
@FredrikEnetorp Fixed, plus a macro that makes it easy to write. The work to get it to fully recursively work (so a struct-of-struct, where the substructs have as_tie support, automatically works) and support array members isn't detailed, but it is possible.Spire
Thank you. I did the horrible macros slightly differently, but functionally equivalent. Just one more problem. I'm trying to generalize the comparison in a separate header file and include it in various gmock test files. This results in the error message: multiple definition of `as_tie(Test1 const&)' I'm trying to inline them but cannot get it to work.Surtax
@FredrikEnetorp The inline keyword should make multiple definition errors go away. Use the [ask question] button after you get a minimal reproducible exampleSpire
@FredrikEnetorp I replaced a tuple with an array for a massive compile time speedup btw. There are also a lot of std::make_index_sequence<N>{} implementations for c++11 out there (they aren't as fast as the ones in std, because most compilers implemented the std ones using built-in language extensions)Spire
G
3

C++ 20 supports default comaparisons

#include <iostream>
#include <compare>

struct XYZ
{
    int x;
    char y;
    long z;

    auto operator<=>(const XYZ&) const = default;
};

int main()
{
    XYZ obj1 = {4,5,6};
    XYZ obj2 = {4,5,6};

    if (obj1 == obj2)
    {
        std::cout << "objects are identical\n";
    }
    else
    {
        std::cout << "objects are not identical\n";
    }
    return 0;
}
Guinevere answered 5/2, 2020 at 17:6 Comment(2)
While that is a very useful feature, it doesn't answer the question as asked. The OP did say "I'm not able to modify the structs used", which means that, even if C++20 default equality operators were available, the OP would be unable to use them since defaulting the == or <=> operators can only be done at class scope.Altimeter
Like Nicol Bolas said, I cannot modify the structs.Surtax
O
1

Assuming POD data, default assignment operator copies only member bytes. (actually not 100% sure about that, don't take my word for it)

You can use this to your advantage:

template<typename Data>
bool structCmp(Data data1, Data data2) // Data is POD
{
  Data tmp;
  memcpy(&tmp, &data1, sizeof(Data)); // copy data1 including padding
  tmp = data2;                        // copy data2 only members
  return memcmp(&tmp, &data1, sizeof(Data)) == 0; 
}
Oshea answered 5/2, 2020 at 16:45 Comment(5)
@Oldfangled You're right that was a terrible answer. Rewrote one.Oshea
Does the standard guarantee that the assignment leaves padding bytes untouched? There is also still a concern about multiple object representations for the same value in fundamental types.Oldfangled
@Oldfangled I believe it does.Oshea
The comments under the top answer in that link seem to indicate that it doesn't. The answer itself only says that the padding needn't be copied, but not that it musn't. I don't know for sure either, though.Oldfangled
I've now tested it and it doesn't work. Assignment does not leave padding bytes untouched.Surtax
L
0

I believe you may be able to base a solution on Antony Polukhin's wonderfully devious voodoo in the magic_get library - for structs, not for complex classes.

With that library, we are able to iterate the different fields of a struct, with their appropriate type, in purely-general-templated code. Antony has used this, for example, to be able to stream arbitrary structs to an output stream with the correct types, completely generically. It stands to reason that comparison might also be a possible application of this approach.

... but you would need C++14. At least it's better than the C++17 and later suggestions in other answers :-P

Lingenfelter answered 28/2, 2020 at 22:50 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.