Can std::memcmp read any bytes past the first difference?

Consider:

constexpr char s1[] = "a";
constexpr char s2[] = "abc";
std::memcmp(s1, s2, 3);

If memcmp stops at the first difference it sees, it will not read past the second byte of s1 (the nul terminator). However, I don't see anything in the C standard that confirms this behavior, and I don't know of anything in C++ that extends it.

N1570 7.24.4.1:

int memcmp(const void *s1, const void *s2, size_t n);

The memcmp function compares the first n characters of the object pointed to by s1 to the first n characters of the object pointed to by s2.

Is my understanding correct that the standard describes the behavior as reading all n bytes of both arguments, and that libraries may short-circuit only under the as-if rule?

Cumshaw answered 4/4, 2018 at 23:20 Comment(9)
C++ at least seems to say that the behavior is undefined if you access beyond either object, which means passing a size of 3 in your example leads to UB.Buine
@Buine cppreference is fallible. The question is whether it can access beyond s1 in this example.Cumshaw
I could foresee an implementation that uses SIMD instructions, which would be likely to compare past the first difference. I'd reckon the standard would avoid ruling out such optimizations.Faruq
Sure, it's not the standard, I'm just using it as one piece of evidence.Buine
Nothing says the function must do a character by character comparison. So no guaranteed short-circuiting, I would say.Fingering
This is why you want to use strncmp instead of memcmp.Fezzan
If you want "short-circuiting" behaviour, use strcmp(). There is nothing in the specification of memcmp() which requires the bytes to be compared sequentially, or which specifies circumstances in which any bytes in either sequence need not be accessed. Since sizeof(s1) < 3, your memcmp() call has undefined behaviour.Rollmop
re: strcmp/strncmp. I can rewrite this example to be arrays of ints and it's the same question.Cumshaw
@vu1p3n0x that's what I was thinking.Cumshaw

The function is not guaranteed to short-circuit because the standard doesn't say it must.

Not only is it not guaranteed to short-circuit, but in practice many implementations will not. For example, glibc compares elements of type unsigned long int (except for the last few bytes), so on a 64-bit implementation, where unsigned long int is 8 bytes wide, it could read up to 7 bytes past the location that compared differently.
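
To make the word-at-a-time idea concrete, here is a minimal sketch. It is not glibc's code: the name wordwise_memcmp and the bytes_loaded out-parameter are hypothetical, added only to make visible how many bytes get loaded from each buffer. It assumes 8-byte words and leaves out the alignment handling a real implementation needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified word-at-a-time comparison (NOT glibc's implementation).
// bytes_loaded reports how many bytes were loaded from each buffer.
int wordwise_memcmp(const void* a, const void* b, std::size_t n,
                    std::size_t& bytes_loaded) {
    assert(a != nullptr && b != nullptr);
    const unsigned char* pa = static_cast<const unsigned char*>(a);
    const unsigned char* pb = static_cast<const unsigned char*>(b);
    constexpr std::size_t W = sizeof(std::uint64_t);  // assumed word size: 8 bytes
    bytes_loaded = 0;

    std::size_t i = 0;
    for (; i + W <= n; i += W) {
        std::uint64_t wa, wb;
        std::memcpy(&wa, pa + i, W);  // a whole word is loaded from each side...
        std::memcpy(&wb, pb + i, W);
        bytes_loaded += W;
        if (wa != wb) {
            // ...and only afterwards is the differing byte located.
            for (std::size_t j = 0; j < W; ++j) {
                if (pa[i + j] != pb[i + j]) {
                    return pa[i + j] < pb[i + j] ? -1 : 1;
                }
            }
        }
    }
    // Tail: the remaining bytes are compared one at a time.
    for (; i < n; ++i) {
        ++bytes_loaded;
        if (pa[i] != pb[i]) {
            return pa[i] < pb[i] ? -1 : 1;
        }
    }
    return 0;
}
```

Given two 16-byte buffers that differ in their very first byte, this sketch loads 8 bytes from each before reporting the difference, that is, 7 bytes past the first mismatch.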

Some may think that this won't cause an access violation on the platforms glibc targets, because access to these unsigned long ints will always be aligned and therefore will not cross a page boundary. But when the two sources have different alignments, glibc will read two consecutive unsigned long ints from one of the sources, and these may lie in different pages. If the differing byte was in the first of those, an access violation can still be triggered before glibc performs the comparison (see function memcmp_not_common_alignment).

In short: Specifying a length that is larger than the real size of the buffer is undefined behavior even if the differing byte occurs before this length, and it can cause crashes on common implementations.
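
One way to stay within bounds is to never pass memcmp a length larger than the smaller object. The sketch below assumes a hypothetical helper, bounded_compare, that takes both sizes explicitly; ordering the shorter buffer first when the common prefix is equal is an assumption of this example, not something the standard prescribes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: compares two buffers without reading past either one.
int bounded_compare(const void* a, std::size_t a_size,
                    const void* b, std::size_t b_size) {
    assert(a != nullptr && b != nullptr);
    // Only the bytes that exist in BOTH objects are handed to memcmp.
    const std::size_t common = std::min(a_size, b_size);
    const int r = std::memcmp(a, b, common);
    if (r != 0) {
        return r;
    }
    // Equal common prefix: decide by length (assumed convention: shorter sorts first).
    if (a_size == b_size) {
        return 0;
    }
    return a_size < b_size ? -1 : 1;
}
```

For the arrays in the question, bounded_compare(s1, sizeof s1, s2, sizeof s2) compares only two bytes and returns a negative value, because the nul terminator of s1 is smaller than 'b'.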

Here's proof that it can crash: https://ideone.com/8jTREr

Slain answered 4/4, 2018 at 23:46 Comment(6)
The glibc example isn't evidence; it uses the same technique for strcmp, which the standard guarantees will not read past a null terminator. It relies on platform-specific knowledge that such a read won't cause problems.Cameron
@Cameron glibc's strcmp implementation doesn't contain such an optimization.Slain
Maybe it's been changed; I recall reading discussion about it in the past.Cameron
@Cameron It looks like it will use the compiler intrinsic for strcmp if one is available, but I assume that intrinsic will be implemented safely by the compiler. I have edited to add an analysis of the code detailing how the memcmp implementation can cause a crash if the size is too short.Slain
Are you capable of constructing an example that crashes glibc's memcmp which wouldn't if it read one character at a time?Sunbeam
@Sunbeam Added an example.Slain
