memset() or value initialization to zero out a struct?
M

8

102

In Win32 API programming it's typical to use C structs with multiple fields. Usually only a couple of them have meaningful values and all others have to be zeroed out. This can be achieved in either of the two ways:

STRUCT theStruct;
memset( &theStruct, 0, sizeof( STRUCT ) );

or

STRUCT theStruct = {};

The second variant looks cleaner: it's a one-liner, and it has no parameters that could be mistyped and introduce a bug.

Does it have any drawbacks compared to the first variant? Which variant should I use, and why?

Mellisamellisent answered 4/1, 2010 at 10:40 Comment(1)
This How to Answer[1] to a later question appears to be more useful and easier. [1]: https://mcmap.net/q/212152/-class-initialization-list/…Tragedian
M
120

Those two constructs are very different in their meaning. The first one uses the memset function, which is intended to set a buffer of memory to a certain value. The second one initializes an object. Let me explain it with a bit of code:

Let's assume you have a structure that has members only of POD types ("Plain Old Data" - see What are POD types in C++?)

struct POD_OnlyStruct
{
    int a;
    char b;
};

POD_OnlyStruct t = {};  // OK

POD_OnlyStruct t;
memset(&t, 0, sizeof t);  // OK as well

In this case, writing POD_OnlyStruct t = {} or POD_OnlyStruct t; memset(&t, 0, sizeof t) makes little difference: the only difference is that memset also sets the padding bytes to zero. Since you normally don't access those bytes, it doesn't matter to you.

On the other hand, since you've tagged your question as C++, let's try another example, with member types different from POD:

struct TestStruct
{
    int a;
    std::string b;
};

TestStruct t = {};  // OK

{
    TestStruct t1;
    memset(&t1, 0, sizeof t1);  // ruins member 'b' of our struct
}  // Undefined behavior: destroying the overwritten 'b' may crash here

In this case, using an expression like TestStruct t = {} is fine, while using memset on the object is undefined behavior and can crash. Here's what happens if you use memset: an object of type TestStruct is created, which also creates its member of type std::string. Next, memset overwrites the memory where the object b lives with some value, say zero. When our TestStruct object goes out of scope it is destroyed, and when it is the turn of its member std::string b you may see a crash, because that object's internal state was ruined by the memset.
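
One way to catch this mistake at compile time is to wrap memset in a helper that rejects unsuitable types. This is only a minimal sketch, assuming C++11; the names zero_bytes, PodOnly and WithString are made up for illustration and are not part of any library:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical helper: refuses to compile for types that memset would corrupt.
template <typename T>
void zero_bytes(T& obj)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memset is only safe on trivially copyable types");
    std::memset(&obj, 0, sizeof obj);
}

struct PodOnly    { int a; char b; };
struct WithString { int a; std::string b; };

void check_zeroing()
{
    PodOnly p;
    zero_bytes(p);                      // fine: only trivial members
    assert(p.a == 0 && p.b == 0);

    WithString w = {};                  // value initialization works for any members
    assert(w.a == 0 && w.b.empty());
    // zero_bytes(w);                   // would not compile: WithString is not trivially copyable
}
```

With such a guard, the second example above fails to build instead of misbehaving at run time.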

So, the reality is that those things are very different. Although you sometimes need to memset a whole structure to zeroes, it's always important to make sure you understand what you're doing, and not to make a mistake like the one in our second example.

My vote: use memset on objects only if it is required, and use value initialization, x = {}, in all other cases.

Macdonald answered 4/1, 2010 at 16:42 Comment(4)
Hi Dimity! I have a structure which has some members and I tried the first option of memsetting: "struct stVar={}". But I am getting "-Wmissing-field-initializers" warning. Is it an issue?Adaiha
In this case, by POD do you mean actually a trivially constructible object (i.e. object without user-provided c-tor)? I don't think it should be constrained to POD.Hightest
This won't crash: coliru.stacked-crooked.com/a/4b3dbf0b8761bc9b It's technically undefined behavior because the struct is not trivially assignable (hence the compiler warning). However, I doubt there's any common platform where zeroed-bytes are an invalid value for std::string.Elka
I think this answer is outdated. In C++11, padding bits are guaranteed to be zero-initialized: if T is a (possibly cv-qualified) non-union class type, each non-static data member and each base-class subobject is zero-initialized and padding is initialized to zero bits;Os
E
42

Depending on the structure members, the two variants are not necessarily equivalent. memset will set the structure to all-bits-zero whereas value initialization will initialize all members to the value zero. The C standard guarantees these to be the same only for integral types, not for floating-point values or pointers.
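
To make the distinction concrete, here is a small sketch (the struct Mixed and the function names are hypothetical). Value initialization gives every member the value zero, while memset only promises a byte pattern, which the language guarantees to mean zero for the integer member alone:

```cpp
#include <cassert>
#include <cstring>

struct Mixed
{
    int    count;
    double ratio;
    void*  handle;
};

// Value initialization: each member gets the *value* zero
// (0, 0.0, null pointer), whatever bit pattern the platform uses for it.
Mixed make_value_initialized()
{
    Mixed m = {};
    return m;
}

// memset: every byte becomes 0. The standard only guarantees that this
// means "zero" for the integer member; ratio and handle are platform-dependent.
Mixed make_byte_zeroed()
{
    Mixed m;
    std::memset(&m, 0, sizeof m);
    return m;
}

void check_value_initialized()
{
    Mixed m = make_value_initialized();
    assert(m.count == 0);
    assert(m.ratio == 0.0);
    assert(m.handle == nullptr);
}
```

On IEEE 754 platforms with an all-bits-zero null pointer both functions produce the same member values, but only the first is portable by construction.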

Also, some APIs require that the structure really be set to all-bits-zero. For instance, the Berkeley socket API uses structures polymorphically, and there it is important to really set the whole structure to zero, not just the values that are apparent. The API documentation should say whether the structure really needs to be all-bits-zero, but it might be deficient.

But if neither of these, or a similar case, applies, then it's up to you. I would, when defining the structure, prefer value initialization, as that communicates the intent more clearly. Of course, if you need to zeroize an existing structure, memset is the only choice (well, apart from initializing each member to zero by hand, but that wouldn't normally be done, especially for large structures).
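
For what it's worth, an existing object can usually also be reset by assigning a value-initialized temporary to it. A minimal sketch, with Settings and reset as made-up names; note that assignment makes no promise about padding bytes, so it does not replace memset where all-bits-zero is required:

```cpp
#include <cassert>

struct Settings
{
    int   width;
    int   height;
    float scale;
};

// Resets an already-used object by assigning a value-initialized temporary.
void reset(Settings& s)
{
    s = Settings();   // in C++11 and later, `s = {};` does the same
}

void check_reset()
{
    Settings s = {};
    s.width = 640;
    s.height = 480;
    s.scale = 1.5f;

    reset(s);
    assert(s.width == 0 && s.height == 0 && s.scale == 0.0f);
}
```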

Eucalyptol answered 4/1, 2010 at 10:54 Comment(8)
Out of curiosity, on which platforms is a float with all bits zero not positive zero?Davon
Several old pre-IEEE-754 CPUs had strange float zeros. Non-754 math might come back yet, you never know, so it's better not to write those bugs.Methane
Doesn't matter. The C standard doesn't specify what float format is used. So even if it works now for IEEE 754, it might not work on a different float implementation (future or past)Briscoe
I would guess not many nowadays, with IEEE being so common, but they used to be more prevalent. I understand software FP implementations are typical examples where zero was not all-bits-zero. So you probably won't get in trouble, but still, C doesn't mandate IEEE, so unless the zero initialization is a bottleneck, the "safer" way doesn't really cost anything.Eucalyptol
Initializing each member to zero will not make every member to zero, but you'll miss the padding bytes. Therefore memset is your only choice.Mannie
@AndrewMcGregor: Some systems may also use something other than all-bits-zero for a null pointer. For example, if a 32-bit system traps reads or writes of invalid memory addresses, and all-bits-zero is valid but memory addresses with the MSB set aren't, it might be better to use 0x80000000 or 0xC0000000 rather than 0x00000000 as NULL [if a system used 0xFFFFFFFF, then accessing p->bar when p is null might refer to a seemingly-valid address, but using 0xC0000000 would guard against that].Leavy
"Of course, if you need to zeroize an existing structure, memset is the only choice (well, apart from initializing each member to zero by hand, but that wouldn't normally be done, especially for large structures)". What prevents from doing the same as in init? Works fine: ideone.com/AzRQWCShawana
Learned my lesson the hard way with the Berkeley socket API...Diablerie
C
14

If your struct contains things like:

int a;
char b;
int c;

Then bytes of padding will be inserted between b and c. memset will zero those; the other way may not, so there can be 3 bytes of garbage (if your ints are 32 bits). If you intend to use your struct to read/write from a file, this might be important.
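
A small sketch of how to see that padding and how to get predictable bytes before writing the struct out. Record, padding_after_b and init_record are hypothetical names, and the exact padding size depends on the platform's alignment rules:

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstring>

struct Record
{
    int  a;
    char b;
    int  c;
};

// Number of padding bytes the compiler placed between b and c
// (typically 3 when int is 4 bytes and 4-byte aligned).
std::size_t padding_after_b()
{
    return offsetof(Record, c) - (offsetof(Record, b) + sizeof(char));
}

// Zeroes the whole object first, padding included, then fills the fields.
// Mainstream compilers leave the padding untouched by the member stores.
void init_record(Record& r, int a, char b, int c)
{
    std::memset(&r, 0, sizeof r);
    r.a = a;
    r.b = b;
    r.c = c;
}

void check_record()
{
    Record r;
    init_record(r, 1, 'x', 2);
    assert(r.a == 1 && r.b == 'x' && r.c == 2);
    assert(padding_after_b() < sizeof(Record));
}
```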

Clingfish answered 4/1, 2010 at 12:3 Comment(2)
This doesn't appear to be true. From CppReference: "If T is a non-union class type, all base classes and non-static data members are zero-initialized, and all padding is initialized to zero bits. The constructors, if any, are ignored." en.cppreference.com/w/cpp/language/zero_initializationElka
Probably only applies to C and not C++.Zetta
D
9

I would use value initialization because it looks cleaner and is less error-prone, as you mentioned. I don't see any drawback in doing it.

You might rely on memset to zero out the struct after it has been used though.

Davon answered 4/1, 2010 at 10:43 Comment(1)
The drawback is needing to remember to add another zero'ing every time you add a new memberAllpowerful
B
6

Not that it's common, but I guess the second way also has the benefit of initializing floats to zero, while doing a memset would certainly not.

Briscoe answered 4/1, 2010 at 10:51 Comment(2)
while doing a memset would certainly not - not entirely true. Actually, on x86 and x64 memsetting a float/double to zero will set it to zero. Sure, this is not in the C/C++ standard, but it works on the most popular platforms.Tilney
sbk: for now... who knows what floating point implementation they might start using. IEEE 754 is not defined for the compiler. So even if it might work now, it is just lucky for you, but can give problems later.Briscoe
B
6

Value initialization is preferred because it can be done at compile time.
It also correctly zero-initializes all POD types.

The memset is done at runtime.
Using memset is also suspect if the struct is not POD.
And it is not guaranteed to correctly initialize (to zero) non-integer types.
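
As a rough illustration of the compile-time point, assuming C++11 (Point and origin are made-up names): a value-initialized object can be a constant expression and be checked with static_assert, which is not possible with std::memset because it is not a constexpr function.

```cpp
struct Point
{
    int    x;
    int    y;
    double weight;
};

// The initializer is a constant expression, so the compiler can place the
// zeros directly in the binary instead of running code at startup.
constexpr Point origin = {};

static_assert(origin.x == 0 && origin.y == 0, "integer members are zero");
static_assert(origin.weight == 0.0, "floating-point member holds the value zero");
```

Whether a non-constexpr object is really initialized at compile time remains up to the compiler and target.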

Berners answered 4/1, 2010 at 10:52 Comment(7)
Values are not initialised at compile time. Compiler generates startup code that initialise all globals during program start thus at runtime. For stack variables initialisation is performed on function entry - again at runtime.Occam
@qrdl, depends on the compiler and target. For ROM-able code, values are sometimes set at compile time.Emory
@qrdl: Let me re-phrase that. Value initialization may allow (under certain situations) the compiler to do the initialization at compile time (rather than run-time). So POD only globals can be initialized at compile time.Berners
@qrdl: On many platforms, if "foo" is an Int32_t of static storage class, run-time statement "foo=0x12345678;" will generate code to store 0x12345678 in foo; that code would likely be at least ten bytes long, some microcontrollers would require as many as 32 bytes. A declaration "Int32_t foo=0x12345678;" would on many platforms cause the variable to be linked in an initialized-data segment and add 4 bytes to an initialization list. On some systems, "Int32_t foo;" would be four bytes cheaper than "Int32_t foo=0;", the latter forcing foo to the initialized-data segment.Leavy
This is true in theory, but most compilers (and all the major mainstream ones like GCC and clang) know how to inline memset so the zeros are compile-time visible for constant-propagation into code that uses the value. As far as optimization is concerned, I wouldn't expect a difference here, or an actual run-time call to memset in libc. (Except for very large objects, then compilers may implement T foo = {}; by calling memset instead of inlining a SIMD loop.)Tackett
@PeterCordes I think the main point is that memset() is suspect. Especially if the type is not POD.Berners
Oh, yeah, that part's true. The part about compile-time vs. run-time is mostly a red herring. (And I was just thinking about PODs, as I had been looking for a duplicate for a C question about whether performance was equal. (Or "number of assembly statements" is how it was actually asked, but anyway eventually found Speed of memset Vs direct assignment to zero as an ideal duplicate.)Tackett
S
4

With some compilers, STRUCT theStruct = {}; translates to memset( &theStruct, 0, sizeof( STRUCT ) ); in the executable. Some C functions are already linked in to do runtime setup, so the compiler has library functions like memset/memcpy available to use.

Suwannee answered 4/1, 2010 at 11:57 Comment(1)
This actually bit me hard recently. I was working on a custom piece of compression code and was initializing some large structures at declaration time using struct something foo = { x, y, z } and cachegrind showed that 70% of my program's "work" was in memset because the structs were zeroed on EVERY function call.Kyungkyushu
A
-1

If there are lots of pointer members and you are likely to add more in the future, it can help to use memset. Combined with appropriate assert(struct->member) calls, you can avoid random crashes from trying to dereference a bad pointer that you forgot to initialize. But if you're not as forgetful as me, then member-initialization is probably the best!

However, if your struct is being used as part of a public API, you should get client code to use memset as a requirement. This helps with future proofing, because you can add new members and the client code will automatically NULL them out in the memset call, rather than leaving them in a (possibly dangerous) uninitialized state. This is what you do when working with socket structures for example.

Appendicitis answered 4/1, 2010 at 11:8 Comment(2)
How does it help in future-proofing? If you're assuming that the client code isn't recompiled, it'd end up calling memset with the wrong structure size. If the client code is recompiled, it'd need access to the updated header file with the struct definition for either memset or value initialization to work. (The client and the library do need to have a consistent notion of how the null pointer is represented, however, so if the API recommends memset, it ought to be checking against all-bits-zero, not against NULL.)Funk
Also, if the struct is part of a public API, then maybe one should consider an opaque structure with an initialization function instead.Funk
