Why use bzero over memset?
K

9

188

In a Systems Programming class I took this previous semester, we had to implement a basic client/server in C. When initializing the structs, like sock_addr_in, or char buffers (that we used to send data back and forth between client and server) the professor instructed us to only use bzero and not memset to initialize them. He never explained why, and I'm curious if there is a valid reason for this?

I read here: http://fdiv.net/2009/01/14/memset-vs-bzero-ultimate-showdown that bzero is more efficient because it only ever zeroes memory, so it doesn't have to do any of the additional checking that memset may do. That still doesn't seem like a reason to never use memset for zeroing memory, though.

bzero is considered deprecated, and furthermore is not a standard C function. According to the manual, memset is preferred over bzero for this reason. So why would you still want to use bzero over memset? Just for the efficiency gains, or is it something more? Likewise, what are the benefits of memset over bzero that make it the de facto preferred option for newer programs?

Keyboard answered 13/6, 2013 at 21:0 Comment(11)
seems like memset is more portableAgone
"Why use bzero over memset?" - Don't. Memset is standard, bzero isn't.Satiated
my question is - why not use calloc in this case?Wolfgram
bzero() is a BSDism. memset() is ANSI C. Nowadays, bzero() will probably be implemented as a macro. Do ask your professor to shave himself and read some books. Efficiency is a bogus argument. A syscall or context switch can easily cost tens of thousands of clock ticks, while one pass over a buffer runs at bus speed. If you want to optimise network programs, minimise the number of syscalls (by reading/writing larger chunks).Halfpenny
The idea that memset may be slightly less efficient because of "a bit more checking going" is definitely a case of premature optimization: whatever the gains that you might see from omitting a CPU instruction or two are not worth it when you can jeopardize portability of your code. bzero is obsolete, and that's enough reason not to use it.Witling
Related: bzero() & bcopy() versus memset() & memcpy()Tooley
Often, you can add an initializer ` = {0}` instead, and not call a function at all. This became easier when around the turn of the century C stopped requiring up-front declaration of local variables. Some truly old paperware is still stuck deep in the previous century, though.Titter
Did your professor ever end up giving you a reason?Pinetum
@S.S.Anne no, but it most likely originated from a recommended book for the course he was influenced by, as mentioned in one of the answers below: https://mcmap.net/q/135095/-why-use-bzero-over-memsetKeyboard
explicit_bzero or just bzero? This distinction matters a lot as far as best practices and general rules about whether you should use it instead of memset go.Excitability
Maybe you should look at what courses you take, if you are unable to ask questions to your teacher, to aid understanding. ;)Pentecostal
O
174

I don't see any reason to prefer bzero over memset.

memset is a standard C function, while bzero has never been part of the C standard. The rationale is probably that you can achieve exactly the same functionality with memset.

Now regarding efficiency, compilers like gcc use builtin implementations for memset which switch to a particular implementation when a constant 0 is detected. Same for glibc when builtins are disabled.
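For the socket-address and buffer cases from the question, zeroing with memset could look like the sketch below. This is only an illustration, not code from the course or from this answer: the helper names make_listen_addr and clear_buffer are hypothetical, and an IPv4 address on a POSIX system is assumed.

```c
#include <assert.h>
#include <arpa/inet.h>   /* htons, htonl, ntohs */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_ANY */
#include <stdint.h>
#include <string.h>      /* memset */
#include <sys/socket.h>  /* AF_INET */

/* Hypothetical helper: build a zeroed IPv4 listening address for `port`. */
struct sockaddr_in make_listen_addr(uint16_t port)
{
    struct sockaddr_in addr;

    /* Argument order: destination, fill byte, byte count. */
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    return addr;
}

/* Hypothetical helper: clear a message buffer before it is reused. */
void clear_buffer(char *buf, size_t len)
{
    assert(buf != NULL);
    memset(buf, 0, len);
}
```

Using sizeof on the object itself (sizeof addr) rather than a hand-written number keeps the byte count in step with the type.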

Outcrop answered 13/6, 2013 at 21:6 Comment(3)
Thanks. This makes sense. I was pretty sure that memset should always be used in this case, but was confused as to why we weren't using it. Thanks for clarifying, and reaffirming my thoughts.Keyboard
I've had many problems with broken bzero implementations. On non-aligned arrays it used to overshoot the provided length and zero out a few more bytes. Never had such an issue after switching to memset.Dwight
Don't forget about memset_s which should be used if you want to ensure the compiler doesn't quietly optimize-away a call to "scrub" memory for some security-related purpose (such as blanking-out a region of memory that contained a sensitive piece of information such as a cleartext password).Vaporous
C
73

I'm guessing you used (or your teacher was influenced by) UNIX Network Programming by W. Richard Stevens. He uses bzero frequently instead of memset, even in the most up-to-date edition. The book is so popular, I think it's become an idiom in network programming which is why you still see it used.

I would stick with memset simply because bzero is deprecated and reduces portability. I doubt you would see any real gains from using one over the other.

Cosher answered 13/6, 2013 at 21:5 Comment(5)
You would be correct. We didn't have required textbooks for this course, but I just checked the syllabus again and UNIX Network Programming is indeed listed as an optional resource. Thanks.Keyboard
It's actually worse than that. It was deprecated in POSIX.1-2001 and removed in POSIX.1-2008.Lacreshalacrimal
Quoting page 8 of the third edition of UNIX Network Programming by W. Richard Stevens - Indeed, the author of TCPv3 made the mistake of swapping the second and third arguments to memset in 10 occurrences of the first printing. A C compiler cannot catch this error because both occurrences are the same... it was an error, and could be avoided using bzero, because swapping the two arguments to bzero will always be caught by the C compiler if function prototypes are used. However as paxdiablo pointed out, bzero is deprecated.Ukase
@AaronNewton, you should add that to Michael’s answer since it confirms what he said.Ishtar
Re: reduces portability: it is profoundly trivial to polyfill, and what you get in exchange is reducing the needless potential for human error, which demands a chronic mental vigilance cost which could be better spent on noticing other logic problems.Excitability
B
63

The one advantage that I think bzero() has over memset() for setting memory to zero is that there's a reduced chance of a mistake being made.

More than once I've come across a bug that looked like:

memset(someobject, size_of_object, 0);    // clear object

The compiler won't complain (though maybe cranking up some warning levels might on some compilers) and the effect will be that the memory isn't cleared. Because this doesn't trash the object - it just leaves it alone - there's a decent chance that the bug might not manifest into anything obvious.
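To make the effect of the transposed call concrete, here is a small sketch (the helper names count_nonzero and leftover_after_clear are made up for illustration). It fills a 16-byte buffer with 0xFF, "clears" it either correctly or with the last two arguments swapped, and reports how many bytes are still non-zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the non-zero bytes in a buffer. */
size_t count_nonzero(const unsigned char *buf, size_t len)
{
    size_t n = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0)
            n++;
    }
    return n;
}

/* Hypothetical demo: returns how many bytes remain non-zero after the
 * buffer is "cleared" with the arguments in the right or wrong order. */
size_t leftover_after_clear(int swapped)
{
    unsigned char buf[16];
    const int fill = 0;               /* byte value to write */
    const size_t len = sizeof buf;    /* number of bytes to write */

    memset(buf, 0xFF, sizeof buf);    /* start from a dirty buffer */

    if (swapped)
        memset(buf, len, fill);       /* BUG: value 16, length 0: writes nothing */
    else
        memset(buf, fill, len);       /* correct: value 0, length 16 */

    return count_nonzero(buf, sizeof buf);
}
```

With the correct order all 16 bytes end up zero; with the swapped order all 16 are left untouched. Both calls compile because the two arguments are integer types that convert implicitly. As far as I know, GCC's -Wmemset-transposed-args warning catches the form with a literal 0 as the length, but not a case like this one where the values come from variables.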

The fact that bzero() isn't standard is a minor irritant. (FWIW, I wouldn't be surprised if most function calls in my programs are non-standard; in fact writing such functions is kind of my job).

In a comment to another answer here, Aaron Newton cited the following from Unix Network Programming, Volume 1, 3rd Edition by Stevens, et al., Section 1.2 (emphasis added):

bzero is not an ANSI C function. It is derived from early Berkeley networking code. Nevertheless, we use it throughout the text, instead of the ANSI C memset function, because bzero is easier to remember (with only two arguments) than memset (with three arguments). Almost every vendor that supports the sockets API also provides bzero, and if not, we provide a macro definition in our unp.h header.

Indeed, the author of TCPv3 [TCP/IP Illustrated, Volume 3 - Stevens 1996] made the mistake of swapping the second and third arguments to memset in 10 occurrences in the first printing. A C compiler cannot catch this error because both arguments are of the same type. (Actually, the second argument is an int and the third argument is size_t, which is typically an unsigned int, but the values specified, 0 and 16, respectively, are still acceptable for the other type of argument.) The call to memset still worked, because only a few of the socket functions actually require that the final 8 bytes of an Internet socket address structure be set to 0. Nevertheless, it was an error, and one that could be avoided by using bzero, because swapping the two arguments to bzero will always be caught by the C compiler if function prototypes are used.

I also believe that the vast majority of calls to memset() are to zero memory, so why not use an API that is tailored to that use case?

A possible drawback to bzero() is that compilers might be more likely to optimize memset() because it's standard and so they might be written to recognize it. However, keep in mind that correct code is still better than incorrect code that's been optimized. In most cases, using bzero() will not cause a noticeable impact on your program's performance, and bzero() can be a macro or inline function that expands to memset().

Bloodstain answered 13/6, 2013 at 22:6 Comment(11)
Yes, I suppose this might be a reasoning when working in a classroom setting like this was, so as to make it potentially less confusing for the students. I don't think this was the case with my professor, however. He was a very big RTFM teacher. If you had a question that could be answered by the manual, he would pull up the man pages on the projector in class and show you. He was very much about ingraining into everyone's minds that the manual is there to be read and answers most of your questions. I'm thankful for this, as opposed to some other professors.Keyboard
I think that this is an argument that can be made even outside the classroom - I've seen this bug in production code. It strikes me as an easy mistake to make. I'd also guess that the vast majority of memset() calls are simply to zero out a block of memory, which I think is another argument for bzero(). What does the 'b' in bzero() stand for anyway?Bloodstain
+1. That memset violates a common parameter ordering of "buffer, buffer_size" makes it particularly error-prone IMO.Gisele
In Pascal they avoid that by calling it "fillchar" and it takes a char. Most C/C++ compilers would pick that one up. Which makes me wonder why compilers don't say "you are passing a 32/64 bit pointer where a byte is expected" and kick you firmly in the compiler errors.Daughtry
I guess the b in bzero comes from “buffer”, @MichaelBurr. But it can also be “binary” or “byte”.Tooley
It seems that this is indeed the correct answer to the question.Ishtar
@MichaelBurr what is the bug in the memset-example you supplied..? could you explain it?Inhabiter
@Inhabiter second and third argument are in wrong order; the quoted function call does exactly nothingIdou
@Keyboard There is a huge difference between spoon-feeding answers (instead of encouraging looking at the manual) and guiding programmers to a better interface that is less error-prone because it requires less vigilant remembering of arbitrary detail. Those only look superficially similar to an untrained eye, because they both approximately resemble a value of personal responsibility for checking the documentation.Excitability
@Keyboard I expect people to RTFM, thoroughly and carefully at that, but I also recognize that every interface irregularity and relevant possibility that a brain has to keep track of is a cost. In the case of memset, the cognition to pass value-then-size instead of size-then-value needs to be regularly used or rehearsed, which includes the cognition of always checking "wait am I using one of the exceptional functions?" before slipping into the more generally used and reinforced cognition flow of passing buffer-size-right-after-buffer-pointer.Excitability
@MichaelBurr The b in bzero most likely stands for byte, which would mean "byte zero" -> "zero out a region in byte-sized indivisible increments"Kuban
N
5

Have it any way you like. :-)

#ifndef bzero
#define bzero(d,n) memset((d),0,(n))
#endif

Note that:

  1. The original bzero returns nothing, while memset returns a void pointer (d). This can be fixed by adding a cast to void in the definition.
  2. #ifndef bzero does not prevent you from hiding the original function even if it exists. It tests the existence of a macro. This may cause lots of confusion.
  3. It’s impossible to create a function pointer to a macro. When using bzero via function pointers, this will not work.
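One way around the three notes above is a real function under a different name instead of a macro called bzero. The following is only a sketch: zero_bytes, zero_fn and clear_with are hypothetical names, not part of any standard or system header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name, chosen so it cannot hide or clash with a system bzero. */
static inline void zero_bytes(void *dst, size_t n)
{
    (void)memset(dst, 0, n);   /* discard memset's return value */
}

/* Function pointer type for the two-argument zeroing interface. */
typedef void (*zero_fn)(void *, size_t);

/* Clears `n` bytes at `dst` through a caller-supplied zeroing function. */
void clear_with(zero_fn fn, void *dst, size_t n)
{
    assert(fn != NULL);
    fn(dst, n);
}
```

Because zero_bytes is a function, it returns nothing, it does not shadow any existing bzero, and its address can be taken, for example clear_with(zero_bytes, buf, sizeof buf).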
Nephralgia answered 20/3, 2014 at 20:15 Comment(5)
What’s the problem with this, @Leeor? General antipathy for macros? Or you dislike the fact that this macro can be confused with the function (and possibly even hides it)?Tooley
@Palec, the latter. Hiding a redefinition as a macro can lead to so much confusion. Another programmer using this code thinks he's using one thing, and is unknowingly forced to use the other. That's a time bomb.Marysa
After giving it another thought, I agree that this is indeed a bad solution. Among other things I found a technical reason: When using bzero via function pointers, this will not work.Tooley
You really should have called your macro something other than bzero. This is an atrocity.Horick
Flaws aside, this answer is useful. At this moment, it's the first answer in the lot which proactively points out how trivially bzero can be defined - the macro given by the answer is an imperfect source-level polyfill, but a polyfill nonetheless, and you could follow this polyfilling approach to its logical improvement of defining a function named bzero on platforms that don't have it. (Similarly, you could decide to implement a function or macro with a different name, which also gets around bzero unportability, and this is the topmost answer to even come close to gesturing at this.)Excitability
L
4

You probably shouldn't use bzero: it's not actually standard C, it was a POSIX thing.

And note that word "was" - it was deprecated in POSIX.1-2001 and removed in POSIX.1-2008 in deference to memset so you're better off using the standard C function.

Lacreshalacrimal answered 20/8, 2013 at 9:28 Comment(5)
What do you mean by standard C? You mean it is not found in standard C library?Whitby
@Koray, standard C means the ISO standard and, yes, bzero is not part of that.Lacreshalacrimal
No I mean, I do not know what you mean by any standard. Does ISO standard mean the standard C library? That comes with the language? The minimal library that we know it will be there?Whitby
@Koray, ISO is the standards organisation which is responsible for the C standard, the current one being C11, and earlier ones C99 and C89. They lay down the rules that an implementation must follow in order to be considered C. So yes, if the standard says an implementation must provide memset, it will be there for you. Otherwise, it's not C.Lacreshalacrimal
IMO, POSIX made a mistake in removing bzero(3); it has a much better interface than memset(3), as @MichaelBurr already noted in his answer. However, not being in any standard shouldn't be important to user code. One can use bzero(3) (and I'd encourage that), and if an implementation lacks it, just inline void bzero(void *s, size_t n) { memset(s, 0, n); } would be enough.Zinc
S
4

For the memset function, the second argument is an int and the third argument is a size_t (which is typically an unsigned integer type):

void *memset(void *s, int c, size_t n);

If values such as 0 and 16 for the second and third arguments are passed in the wrong order, as 16 and 0, the call to memset still compiles and runs, but it does nothing, because the number of bytes to initialize is specified as 0.

void bzero(void *s, size_t n);

Such an error can be avoided by using bzero, because swapping the two arguments to bzero will always be caught by the C compiler if function prototypes are used.

Stefaniastefanie answered 23/12, 2013 at 6:8 Comment(2)
Such an error can also be avoided with memset if you simply think of the call as "set this memory to this value for this size", or if you have an IDE that gives you the prototype or even if you just know what you're doing :-)Lacreshalacrimal
Agree, but this function was created at the time when such intelligent IDEs were not available for the support.Stefaniastefanie
T
4

I wanted to add something to the bzero vs. memset argument. Install ltrace and then compare what each does under the hood. On Linux with libc6 (2.19-0ubuntu6.6), the calls made are exactly the same (via ltrace ./test123):

long m[] = {0}; // generates a call to memset(0x7fffefa28238, '\0', 8)
int* p;
bzero(&p, 4);   // generates a call to memset(0x7fffefa28230, '\0', 4)

I've been told that unless I am working in the deep bowels of libc or any number of kernel/syscall interfaces, I don't have to worry about them. All I should worry about is that the call satisfies the requirement of zeroing the buffer. Others have covered which one is preferable, so I'll stop here.

Tillich answered 22/4, 2015 at 14:20 Comment(6)
This happens because some versions of GCC will emit code for memset(ptr, 0, n) when they see bzero(ptr, n) and they can't convert it to inline code.Shallot
@Shallot It's actually a macro.Pinetum
@S.S.Anne gcc 9.3 on my computer does this transformation itself, without any help from macros in the system headers. extern void bzero(void *, size_t); void clear(void *p, size_t n) { bzero(p, n); } produces a call to memset. (Include stddef.h for size_t without anything else that could interfere.)Shallot
@Shallot And you've verified in that test that including stddef.h does not end up causing a bzero macro to be defined?Excitability
@Excitability Yes. Also, I can reproduce the effect with no headers at all, using gcc's builtin __SIZE_TYPE__. Also, stddef.h defining a macro named bzero would be a dire conformance violation.Shallot
@Shallot Cheers, thanks for clarifying.Excitability
L
2

In short: memset requires more assembly operations than bzero.

This is the source: http://fdiv.net/2009/01/14/memset-vs-bzero-ultimate-showdown

Lashawn answered 16/1, 2014 at 8:45 Comment(3)
Yes, that is one thing that I mentioned in the OP. I actually even linked to that exact page. It turns out that doesn't seem to really make much difference due to some compiler optimizations. For more details see the accepted answer by ouah.Keyboard
This only shows that one rubbish implementation of memset is slow. On MacOS X and some other systems, memset uses code that is set up at boot time depending on the processor that you are using, makes full use of vector registers, and for large sizes it uses prefetch instructions in clever ways to get the last bit of speed.Fumatorium
fewer instructions doesn't mean faster execution. In fact optimizations often increase the binary size and number of instructions due to loop unrolling, function inlining, loop alignment... Look at any decent optimized code and you'll see it often has much more instructions than shitty implementationsSharleensharlene
P
0

memset takes 3 parameters, bzero takes 2. In a memory-constrained environment that extra parameter would take 4 more bytes, and most of the time it'll be used to set everything to 0 anyway.

Pecoraro answered 23/7, 2019 at 17:20 Comment(0)
