Should C++ programmers avoid memset?
S

11

44

I have heard that C++ programmers should avoid memset:

class ArrInit {
    //! int a[1024] = { 0 };
    int a[1024];
public:
    ArrInit() {  memset(a, 0, 1024 * sizeof(int)); }
};

So, considering the code above, if you do not use memset, how could you fill a[1..1024] with zeros? What's wrong with memset in C++?

Thanks.

Sidwohl answered 29/12, 2009 at 17:52 Comment(6)
Can you give the reason why you think one should not use memset in C++? I don't know why using memset should lead to any problem in C++. Please correct me if I am wrong. Thanks! - Wadesworth
He probably heard it in the context of "don't use memset to zero-out class objects". - Sun
@Jay: The above is OK. But using memset to zero the class object itself (not just a single member) is not a good idea. This is especially problematic if the object contains members that have constructors (that do some initialization). - Esperanzaespial
BTW it's a[0..1023], not a[1..1024]. - Outage
I would recommend against using a C-style array, and instead use a vector. In that case you could replace your constructor with ArrInit() : a( 1024, 0 ) {}, which would remove the memset and make your class arguably more "C++" in style. - Honebein
Using memset directly will cause issues (as detailed below); however, you could use an approach like this: https://mcmap.net/q/245573/-how-to-do-the-equivalent-of-memset-this-without-clobbering-the-vtbl - Faunie
C
51

The issue is not so much using memset() on the built-in types, it is using it on class (aka non-POD) types. Doing so will almost always do the wrong thing and frequently do the fatal thing - it may, for example, trample over a virtual function table pointer.
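
One way to keep memset away from class types is to route it through a helper that refuses anything that is not trivially copyable. The following is only a minimal sketch: zero_bytes, Plain and Shape are hypothetical names made up for illustration, and the trait check is a necessary condition rather than a sufficient one (all-bits-zero may still be the wrong value for pointers or floating-point members on some platforms).

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical helper (not a standard function): zero the bytes of an
// object, but only for types that may be manipulated byte by byte.
template <class T>
void zero_bytes(T& obj)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "zero_bytes: memset is not safe for this type");
    std::memset(&obj, 0, sizeof obj);
}

struct Plain {                 // no constructors, no virtual functions
    int id;
    int counts[4];
};

struct Shape {                 // polymorphic: carries a vptr on typical ABIs
    virtual ~Shape() {}
    int sides;
};

// The trait is what separates the two cases at compile time.
static_assert(!std::is_trivially_copyable<Shape>::value,
              "Shape must be rejected by zero_bytes");

inline int zeroed_sum()
{
    Plain p = {7, {1, 2, 3, 4}};
    zero_bytes(p);             // fine: Plain is trivially copyable
    // Shape s; zero_bytes(s); // would fail to compile (static_assert)
    return p.id + p.counts[0] + p.counts[3];
}
```

With this in place, changing int to a class with a virtual function turns a silent vptr overwrite into a compile error.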

Cologne answered 29/12, 2009 at 18:2 Comment(8)
Using memset on any class with a virtual function is likely to be bad. - Cichocki
@Otto: because sizeof(class) would treat the virtual function table pointer as one data member. - Sidwohl
Or on any class that contains a non-POD type, such as a string. - Cologne
memset is also problematic when used on some POD types, like pointers and floating point types. Setting all the bytes to 0 will not portably set pointers to NULL or floating point types to 0.0. - Urticaria
@toto: POD stands for "Plain Old Data". Essentially it refers to built-in types or structs or unions of built-in types. If you can declare it in C, it's probably a POD in C++. - Urticaria
POD means "plain old data": types without (non-trivial) constructors or destructors. - Cologne
C++ already has a generic replacement for memset: std::fill. So yes, a C++ programmer should avoid memset. - Introject
So is the virtual function table pointer like an implicit member of a class? - Veliger
D
52

In C++, std::fill or std::fill_n may be a better choice, because they are generic and can therefore operate on objects as well as PODs. memset, however, operates on a raw sequence of bytes and should therefore never be used to initialize non-PODs. Regardless, optimized implementations of std::fill may internally use specialization to call memset if the type is a POD.
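
As a rough illustration of that genericity, here is the question's class rewritten with std::fill and std::fill_n. The names member and the "unset" value are assumptions added for the example; they are not part of the original code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

class ArrInit {
    int a[1024];
    std::string names[4];   // non-POD elements (added for illustration)
public:
    ArrInit()
    {
        std::fill(a, a + 1024, 0);                     // assigns 0 to each int
        std::fill_n(names, 4, std::string("unset"));   // assigns to each string object
    }
    int at(std::size_t i) const { return a[i]; }
    const std::string& name(std::size_t i) const { return names[i]; }
};
```

Both calls assign element by element, so the same code keeps working if the element type later changes to a class.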

Doggerel answered 29/12, 2009 at 18:14 Comment(1)
I forgot about std::fill so +1 to this from me. Yes, there is a C++ function specifically designed to fill containers, so use it! - Kellikellia
S
24

Zero-initializing should look like this:

class ArrInit {
    int a[1024];
public:
    ArrInit(): a() { }
};

As to using memset, there are a couple of ways to make the usage more robust (as with all such functions): avoid hard-coding the array's size and type:

memset(a, 0, sizeof(a));

For extra compile-time checks it is also possible to make sure that a really is an array (so that sizeof(a) makes sense):

template <class T, size_t N>
size_t array_bytes(const T (&)[N])  //accepts only real arrays
{
    return sizeof(T) * N;
}

ArrInit() { memset(a, 0, array_bytes(a)); }

But for non-character types, I'd imagine the only value you'd use it to fill with is 0, and zero-initialization should already be available in one way or another.

Skippet answered 29/12, 2009 at 17:55 Comment(4)
What if I want to initialize the array with a non-zero value? - Sidwohl
You can put any value you want inside the braces (e.g. ArrInit(): a() {5}) and it will initialize the array with that value. - Pershing
You do realize that all I have to do is change int in your example to some class with a virtual function, and your code is likely to wipe out the vptr, don't you? You're explaining how to cause disasters in a slightly safer way. - Cichocki
@Pace: No, you'll get a syntax error. Those braces are the ones delimiting the body of the constructor function. Even with actual array initialization syntax: "int a[1024] = { 5 };" only the elements you list will be initialized, so in this example, only a[0] will be 5, not the entire array. - Odawa
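
To make the point about non-zero values concrete, here is a small sketch. The constructor parameter and the helper second_element_of_braced_array are assumptions introduced for the example; the fill itself uses std::fill from <algorithm>.

```cpp
#include <algorithm>
#include <cstddef>

class ArrInit {
    int a[1024];
public:
    // Fill every element with the requested value, zero or not.
    explicit ArrInit(int value) { std::fill(a, a + 1024, value); }
    int at(std::size_t i) const { return a[i]; }
};

// Brace initialization sets only the listed elements;
// the remaining ones are zero-initialized.
inline int second_element_of_braced_array()
{
    int b[4] = { 5 };
    return b[1];
}
```

Note that memset could not do the first job at all: it writes a single byte value into every byte, so memset(a, 5, sizeof(a)) would not produce ints equal to 5.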
R
14

What's wrong with memset in C++ is mostly the same thing that's wrong with memset in C. memset fills a memory region with the physical all-zero bit pattern, while in virtually 100% of cases what you actually need is to fill an array with the logical zero values of the corresponding type. In C, memset is only guaranteed to properly initialize memory for integer types (and its validity for all integer types, as opposed to just char types, is a relatively recent guarantee added to the C language specification). It is not guaranteed to set floating-point values to zero, and it is not guaranteed to produce proper null pointers.

Of course, the above might be seen as excessively pedantic, since the additional standards and conventions active on the given platform might (and most certainly will) extend the applicability of memset, but I would still suggest following Occam's razor here: don't rely on any other standards and conventions unless you really, really have to. C++ (as well as C) offers several language-level features that let you safely initialize your aggregate objects with proper zero values of the proper type. Other answers already mentioned these features.
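
As a sketch of one such language-level feature, value-initialization in the constructor's initializer list gives every member its logical zero, whatever bit pattern that happens to be on the platform. Record and Table are hypothetical types used only for this illustration.

```cpp
#include <cstddef>

struct Record {
    int         count;
    double      weight;
    const char* label;
};

class Table {
    Record rows[16];
public:
    // Value-initialization: count becomes 0, weight becomes 0.0 and
    // label becomes a null pointer, regardless of their representation.
    Table() : rows() {}
    const Record& row(std::size_t i) const { return rows[i]; }
};
```

No assumption about IEEE floating point or all-bits-zero null pointers is needed here; the language defines the result in terms of values, not bytes.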

Rhinitis answered 29/12, 2009 at 18:34 Comment(2)
What is the difference between physical and logical zero? - Brushwood
@Brushwood Physical zero is the explicit actual "all-zeros" bit pattern in memory. Logical zero is a (potentially non-zero) bit pattern that is interpreted as the zero value of some type by the language (C or C++ in our case). - Rhinitis
K
8

It is "bad" because you are not implementing your intent.

Your intent is to set each value in the array to zero and what you have programmed is setting an area of raw memory to zero. Yes, the two things have the same effect but it's clearer to simply write code to zero each element.

Also, it's likely no more efficient.

class ArrInit
{
public:
    ArrInit();
private:
    int a[1024];
};

ArrInit::ArrInit()
{
    for(int i = 0; i < 1024; ++i) {
        a[i] = 0;
    }
}


int main()
{
    ArrInit a;
}

Compiling this with Visual C++ 2008 (32-bit) with optimisations turned on compiles the loop to:

; Line 12
    xor eax, eax
    mov ecx, 1024               ; 00000400H
    mov edi, edx
    rep stosd

Which is pretty much exactly what the memset would likely compile to anyway. But if you use memset there is no scope for the compiler to perform further optimisations, whereas by writing your intent it's possible that the compiler could perform further optimisations, for example noticing that each element is later set to something else before it is used so the initialisation can be optimised out, which it likely couldn't do nearly as easily if you had used memset.

Kellikellia answered 29/12, 2009 at 18:6 Comment(2)
I understand, of course, that a default initializer will zero the array too, so this is just an example. But the point stands: implement your requirements, which in this case means setting each array element to zero, rather than using some other method to achieve the result, unless that is the only way to meet other requirements such as performance. - Kellikellia
"Which is pretty much exactly what the memset would likely compile to anyway." Nope, memset can be much more complicated and efficient than a simple rep stosd. - Superscribe
U
1

In addition to badness when applied to classes, memset is also error prone. It's very easy to get the arguments out-of-order, or to forget the sizeof portion. The code will usually compile with these errors, and quietly do the wrong thing. The symptom of the bug might not manifest until much later, making it difficult to track down.

memset is also problematic with lots of plain types, like pointers and floating point. Some programmers set all bytes to 0, assuming the pointers will then be NULL and floats will be 0.0. That's not a portable assumption.
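
The argument-order mistake is easy to reproduce. In the sketch below, clear_wrong and clear_right are hypothetical names; the first one swaps the value and size arguments on purpose, which still compiles because both parameters accept integers.

```cpp
#include <algorithm>
#include <cstring>
#include <iterator>

// BUG (deliberate): value and size are swapped. The call asks memset to
// write 0 bytes of value 16, so the buffer is left untouched.
inline void clear_wrong(int (&buf)[4])
{
    const int value = 0;
    std::memset(buf, sizeof buf, value);
}

// std::fill takes a range and a value of the element type,
// so there is no byte count to misplace or forget.
inline void clear_right(int (&buf)[4])
{
    std::fill(std::begin(buf), std::end(buf), 0);
}
```

Some compilers can warn about a literal 0 in the size position, but that diagnostic is not guaranteed and, as here, a named constant may slip past it.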

Urticaria answered 29/12, 2009 at 19:12 Comment(2)
Setting pointers and floating-point numbers to binary zero usually works, but I wouldn't want to get into the habit. Still, the IEEE floating-point standard gets more and more entrenched, and that interprets all-bits-zero as 0.0. - Cichocki
@David: Yup, it usually works, but someday you'll be on a platform where it doesn't. - Urticaria
R
1

This is an OLD thread, but here's an interesting twist:

class myclass
{
  virtual void somefunc();
};

myclass onemyclass;

memset(&onemyclass,0,sizeof(myclass));

works PERFECTLY well!

However,

myclass *myptr;

myptr=&onemyclass;

memset(myptr,0,sizeof(myclass));

indeed sets the virtuals (i.e. somefunc() above) to NULL.

Given that memset is drastically faster than setting to 0 each and every member in a large class, I've been doing the first memset above for ages and never had a problem.

So the really interesting question is how come it works? I suppose that the compiler actually starts to set the zeros BEYOND the virtual table... any idea?
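
For what it's worth, both memset calls overwrite exactly the same bytes; a likely explanation for the difference is that a call made directly on an object of known type is typically dispatched statically, without reading the vptr, while a call through a pointer reads it. If the goal is to reset many data members quickly, one alternative is to group them in a plain struct and reset that as a unit. This is only a sketch; WidgetState and Widget are hypothetical names.

```cpp
struct WidgetState {        // plain data only: safe to reset as a unit
    int    width;
    int    height;
    double scale;
};

class Widget {
    WidgetState state_;
public:
    Widget() : state_() {}                      // value-initialize every field
    virtual ~Widget() {}
    virtual int area() const { return state_.width * state_.height; }
    void resize(int w, int h) { state_.width = w; state_.height = h; }
    void reset() { state_ = WidgetState(); }    // the vptr is never touched
};
```

The assignment in reset() is something a compiler is free to lower to a block fill, so this need not cost more than the memset it replaces.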

Remorseful answered 21/12, 2012 at 2:37 Comment(1)
"it doesn't crash or do anything obviously wrong that I could see" and "it works" are very much not the same thing. AFAICT both code snippets above are the same, but once you start invoking undefined behavior, all bets are off. Most likely a program that does either of the above will only (appear to) work under very specific circumstances, and will break badly in other circumstances (e.g. on a different compiler, or OS, or CPU architecture)Burin
N
1

As of C++ 11, the simplest way to fill an array with zeros is to zero-initialize it:

class ArrInit {
    int a[1024] = {};
};

To be precise, I believe this is actually aggregate initialization with an empty initializer list, which happens to zero-initialize every item in the array - but the main point is to show that it can be done. See https://en.cppreference.com/w/cpp/language/zero_initialization and https://en.cppreference.com/w/cpp/language/aggregate_initialization for more details.

You could also zero-initialize the container object itself, which might be more efficient if you have several members to zero out:

class ArrInit {
    int a[1024];
    int b[1024];
};

int main()
{
    ArrInit ai{};
}

See here for a live demo.

That said, zero-initialization is a tricky beast: as the CPPReference page indicates, it has no dedicated syntax, so you have to be careful that the compiler doesn't pick another type of initialization instead. And as the following question shows, getting compilers to zero-initialize your objects including padding can be tricky: Does C++ standard guarantee the initialization of padding bytes to zero for non-static aggregate objects?

With all this in mind, if you're writing high-performance code with a targeted set of platforms in mind (something common for instance in the video games industry), and you know the exact memory layout of the whole object you'll be overwriting with zeroes, memset can be a good tool if used carefully. It can enable things like reliable and fast comparison of POD objects with memcmp.
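
A minimal sketch of that careful use follows. Vec2i, clear and same_bytes are hypothetical names, and the static_assert encodes the assumption that the struct has no padding bytes, which is what makes a bytewise comparison meaningful.

```cpp
#include <cstdint>
#include <cstring>

struct Vec2i {
    std::int32_t x;
    std::int32_t y;
};

// Layout check: no padding, so every byte belongs to a member.
static_assert(sizeof(Vec2i) == 2 * sizeof(std::int32_t),
              "Vec2i must not contain padding for bytewise comparison");

inline void clear(Vec2i& v)
{
    std::memset(&v, 0, sizeof v);   // all-zero bytes are 0 for int32_t
}

inline bool same_bytes(const Vec2i& a, const Vec2i& b)
{
    return std::memcmp(&a, &b, sizeof(Vec2i)) == 0;
}
```

If a member is later added that introduces padding, the static_assert fails at compile time instead of memcmp silently comparing garbage.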

Newel answered 21/12, 2023 at 1:48 Comment(0)
O
0

Your code is fine. I thought the only time in C++ where memset is dangerous is when you do something along the lines of:
YourClass instance; memset(&instance, 0, sizeof(YourClass));

I believe it might zero out internal data in your instance that the compiler created.

Oestrin answered 29/12, 2009 at 18:1 Comment(0)
U
0

There's no real reason to not use it except for the few cases people pointed out that no one would use anyway, but there's no real benefit to using it either unless you are filling memguards or something.

Unctuous answered 29/12, 2009 at 21:1 Comment(0)
H
0

The short answer would be to use a std::vector with an initial size of 1024.

std::vector< int > a( 1024 ); // Uses the type's default constructor, "T()".

The initial value of all elements of "a" would be 0, as the std::vector(size) constructor (as well as vector::resize) fills the vector with the value produced by the default constructor, T(). For built-in types (a.k.a. intrinsic types, or PODs), that initial value is guaranteed to be 0:

int x = int(); // x == 0

This would allow the type that "a" uses to change with minimal fuss, even to that of a class.
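
Applied to the class from the question, this could look roughly as follows. The accessors size and at are additions made for the example.

```cpp
#include <cstddef>
#include <vector>

class ArrInit {
    std::vector<int> a;
public:
    ArrInit() : a(1024) {}   // 1024 elements, each with the value int(), i.e. 0
    std::size_t size() const { return a.size(); }
    int at(std::size_t i) const { return a.at(i); }   // bounds-checked access
};
```

Swapping int for another element type only requires changing the template argument; the constructor stays the same.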

Most functions that take a void pointer (void*) as a parameter, such as memset, are not type safe. Ignoring an object's type, in this way, removes all C++ style semantics objects tend to rely on, such as construction, destruction and copying. memset makes assumptions about a class, which violates abstraction (not knowing or caring what is inside a class). While this violation isn't always immediately obvious, especially with intrinsic types, it can potentially lead to hard to locate bugs, especially as the code base grows and changes hands. If the type that is memset is a class with a vtable (virtual functions) it will also overwrite that data.

Honebein answered 23/5, 2012 at 16:35 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.