Internal consistency of C structures
U

5

11

If I have two C structures initialised to have identical members, can I guarantee that:

memcmp(&struct1, &struct2, sizeof(my_struct))

will always return zero?

Unfit answered 29/5, 2013 at 13:44 Comment(2)
I've created an example that shows the padding issue the other answers mention : ideone.com/NdLJG4 .Seam
Thanks Michael, that's an excellent example.Unfit
T
10

I don't think you can safely memcmp a structure to test for equality.

From C11 §6.2.6.1 ¶6 (Representations of types, General):

When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.

This implies that you need to write a function that compares the individual members of the structure:

int my_struct_equals(my_struct* s1, my_struct* s2)
{
    if (s1->intval == s2->intval &&
        strcmp(s1->strval, s2->strval) == 0 && 
        s1->binlen == s2->binlen &&
        memcmp(s1->binval, s2->binval, s1->binlen) == 0 &&
        ...
        ) {
        return 1;
    }
    return 0;
}
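
For a version that compiles on its own, here is a minimal sketch. The layout of my_struct is an assumption made only for illustration (the field names follow the function above, the types and the 16-byte buffer are hypothetical), and the elided members are simply left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: field names follow the answer, types are assumed. */
typedef struct {
    int intval;
    const char *strval;        /* NUL-terminated string */
    size_t binlen;             /* number of valid bytes in binval */
    unsigned char binval[16];
} my_struct;

/* Returns 1 if all members match, 0 otherwise. Padding is never inspected. */
int my_struct_equals(const my_struct *s1, const my_struct *s2)
{
    assert(s1 != NULL && s2 != NULL);
    return s1->intval == s2->intval
        && strcmp(s1->strval, s2->strval) == 0
        && s1->binlen == s2->binlen
        && memcmp(s1->binval, s2->binval, s1->binlen) == 0;
}
```

Only the first binlen bytes of binval take part in the comparison, so stale bytes past the valid length do not affect the result.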
Tennilletennis answered 29/5, 2013 at 13:48 Comment(0)
H
9

No. Two structures whose members are all equal may still fail to compare equal under memcmp(), because of padding.

One plausible example is as follows. For the initialization of st2, a standard-compliant 32-bit compiler could generate a sequence of assembly instructions that leave part of the final padding uninitialized. This piece of padding will contain whatever happened to be there on the stack, whereas st1's padding will typically contain zero:

struct S { short s1; long long i; short s2; } st1 = { 1, 2, 3 };
int main(void) {
  struct S st2 = { 1, 2, 3 };
  /* at this point memcmp(&st1, &st2, sizeof(struct S)) could plausibly be nonzero */
}
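
Whether a given compiler really leaves garbage in the padding is a matter of luck, so the sketch below forces the situation instead of waiting for it. It fills each object with a different byte and then writes the members byte-wise with memcpy(), which leaves the padding untouched. The helper names (init_bytewise, padding_bytes, members_equal) are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct S { short s1; long long i; short s2; };

/* Fill every byte with `fill`, then write the members byte-wise so that
   any padding bytes keep the fill value. */
void init_bytewise(struct S *p, unsigned char fill)
{
    const short s1 = 1, s2 = 3;
    const long long i = 2;
    assert(p != NULL);
    memset(p, fill, sizeof *p);
    memcpy((unsigned char *)p + offsetof(struct S, s1), &s1, sizeof s1);
    memcpy((unsigned char *)p + offsetof(struct S, i), &i, sizeof i);
    memcpy((unsigned char *)p + offsetof(struct S, s2), &s2, sizeof s2);
}

/* Bytes of struct S that belong to no member. */
size_t padding_bytes(void)
{
    return sizeof(struct S) - (2 * sizeof(short) + sizeof(long long));
}

/* Member-wise comparison: 1 if equal, 0 otherwise. */
int members_equal(const struct S *a, const struct S *b)
{
    return a->s1 == b->s1 && a->i == b->i && a->s2 == b->s2;
}
```

After init_bytewise(&a, 0x00) and init_bytewise(&b, 0xFF), members_equal(&a, &b) returns 1, while memcmp(&a, &b, sizeof a) is nonzero on any implementation where padding_bytes() is greater than zero.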
Hitt answered 29/5, 2013 at 13:47 Comment(1)
You can make a structure packed so that it won't pad any (uninitialised) bits in it.Disintegrate
T
2

If both variables are global or static, and their members were initialized at init time of the program, then yes, they will compare equal with memcmp(). (Note, most systems just load the data pages into zero initialized pages, but the C standard does not guarantee this behavior.)

Also, if one of the structures was initialized from the other using memcpy(), then they will compare equal with memcmp().

If both were initialized to some common value with memset() first before their members are initialized to the same values, then they will also compare equal with memcmp() (unless their members are also structures, then the same restrictions apply recursively).
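
The memcpy() case is the easiest one to show. The following is a small sketch with hypothetical helper names (clone_bytes, bytes_equal). It relies only on memcpy() copying every byte of the object, padding included.

```c
#include <assert.h>
#include <string.h>

struct S { short s1; long long i; short s2; };

/* Copies every byte of *src, padding included, into *dst. */
void clone_bytes(struct S *dst, const struct S *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}

/* Returns 1 when both objects have identical object representations. */
int bytes_equal(const struct S *a, const struct S *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

Given struct S src = { 1, 2, 3 } and a call to clone_bytes(&dst, &src), bytes_equal(&src, &dst) is expected to return 1 for as long as neither object is stored to afterwards.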

Traceetracer answered 29/5, 2013 at 13:54 Comment(1)
I didn't downvote, but I don't think C11 §6.7.9 ¶10 backs up the first paragraph. It reads as if initialisation of statics/globals is a series of assignments of members to 0 or NULL. If this is correct, §6.2.6.1 ¶6 suggests that any padding bytes would have unspecified values.Tennilletennis
P
2

Besides the obvious case of struct padding, it is not even guaranteed for single variables. See the footnote for 6.2.6.1 (8):

It is possible for objects x and y with the same effective type T to have the same value when they are accessed as objects of type T, but to have different values in other contexts. In particular, if == is defined for type T, then x == y does not imply that memcmp(&x, &y, sizeof (T)) == 0. Furthermore, x == y does not necessarily imply that x and y have the same value; other operations on values of type T may distinguish between them.

Pinter answered 30/5, 2013 at 10:7 Comment(2)
This is a great point. A concrete example of this is floating point types, where there are positive and negative zero values that compare equal but have different bit patterns.Seam
It's probably worth noting that floats also provide the opposite case: take two doubles x = y = 0.0/0.0 (NaN). They compare as unequal, x != y, but have memcmp(&x, &y, sizeof(double)) == 0.Seam
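
Both floating point cases can be checked with a short sketch. It assumes IEEE 754 doubles and default compiler settings (no -ffast-math or similar); the function names are invented for this example.

```c
#include <assert.h>
#include <math.h>    /* NAN */
#include <string.h>  /* memcmp */

/* Returns 1 when the two doubles have identical object representations. */
int same_bytes(const double *a, const double *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, sizeof *a) == 0;
}

/* +0.0 and -0.0: equal by value, different bytes under IEEE 754. */
int zero_value_equal(void) { double pz = 0.0, nz = -0.0; return pz == nz; }
int zero_bytes_equal(void) { double pz = 0.0, nz = -0.0; return same_bytes(&pz, &nz); }

/* A NaN and its copy: unequal by value, identical bytes. */
int nan_value_equal(void) { double x = NAN, y = x; return x == y; }
int nan_bytes_equal(void) { double x = NAN, y = x; return same_bytes(&x, &y); }
```

Under those assumptions the zero pair is equal by value but not by bytes, and the NaN pair is equal by bytes but not by value, so memcmp() and == can disagree in both directions.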
N
-1

You can guarantee that they're identical if you ensure that both entire memory blocks are initialised before they're populated, e.g. with memset:

memset(&struct1, 0, sizeof(my_struct));

EDIT: leaving this here because the comment stream is useful.

Nonpros answered 29/5, 2013 at 13:44 Comment(13)
that's not backed by language semantics - padding bytes always take unspecified values, which in particular means that compilers are free to overwrite ambient padding when assigning to members; no idea if that happens in practice, though...Fernandina
@Christoph: Are you sure about that? If so, I don't even want to think about how many protocol stacks will just suddenly break on such a system.Traceetracer
@user315052: "When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values" (C11 6.2.6.1 §6)Fernandina
@Christoph: I read the footnote, though, and the intent of that phrase was to allow structure assignment to be implemented with memcpy().Traceetracer
@user315052: the footnote reads "Thus, for example, structure assignment need not copy any padding bits.", which is more or less the opposite of your claim, i.e. structure assignment can be implemented without memcpy()ing the whole blobFernandina
@Christoph: I was reading footnote 42 in C99. I don't have C11 in front of me at the moment.Traceetracer
it's easy to come up with other reasons for this restriction, eg on architectures where the size of a logically addressable unit of memory (byte) is smaller than the unit of physically addressable memory (word), ie where byte-wise access has to be emulated via shifts and such; having to keep padding intact would mean that we had to read the padding every time we want to write to a less-than-word-sized memberFernandina
@Christoph: C already allows for a char to be more than 8 bits, so I don't believe such a system would be implemented that way.Traceetracer
@user315052: maybe I want to run a POSIX system on a DCPU-16 ;); even if my example is contrived, I do believe my point stands going by guarantees made by the standard aloneFernandina
@Christoph: Reading that phrase again, it is not clear to me if "including in a member object" means the object is a member object, or if the value is being stored to a member of the object. That is, it is not clear from the sentence if the value itself is of structure or union type. It would make sense if a structure is a member of another structure, then structure assignment to the member may affect padding to the containing structure. I am not sure if I can buy assignment to other members, since they should be governed by the rules already set out earlier in the same section.Traceetracer
@user315052 I was going to point out the same ambiguity. It's unclear whether the phrase "value stored" relates to the value stored in the entire structure (i.e. as a single assignment), or to storing values within individual members of that structure.Nonpros
@Alnitak: it's not as clear as it could be, but the gist is that storing into a structure invalidates padding, regardless of whether you access the structure as a whole via assignment (a = b) or only target a specific member (a.foo = 42), which is what the part "including in a member object" refers toFernandina
note that after having re-read the relevant parts of the standard, my claim that the value of padding bytes is always unspecified appears to be incorrect - it only gets invalidated by storage into the structure (assignment to the structure or its members) - if you do all your modifying manipulations byte-wise (cast to char*, memcpy(), ...), padding bytes should retain their valuesFernandina
