What's wrong with strcmp?

In the responses to the question Reading In A String and comparing it C, more than one person discouraged the use of strcmp(), saying things like

I also strongly, strongly advise you to get used to using strncmp() now, ... to avoid many problems down the road.

or (in Why does my string comparison fail? )

Make certain you use strncmp and not strcmp. strcmp is profoundly unsafe.

What problems are they alluding to?

scanf() with string specifiers and gets() are strongly discouraged because they almost inevitably lead to buffer overflow vulnerabilities. However, it's not possible to overflow a buffer with strcmp(), right?

"A buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory."

( -- Wikipedia: buffer overflow).

Since the strcmp() function never writes to any buffer, the strcmp() function cannot cause a buffer overflow, right?

What is the reason people discourage the use of strcmp(), and recommend strncmp() instead?

Juanitajuanne answered 22/6, 2014 at 16:54 Comment(4)
Why don't you ask those people?Pavel
Trust me. strncmp() is no better (or worse) than strcmp().Rozier
The comments on the linked SO article are mostly junkDominiquedominium
Using strncmp to protect against strings which are not null terminated just papers over the root problem which is that you have a non-terminated string. It will just mess up on the next function that assumes it's null-terminated.Treachery
I
26

While strncmp can prevent you from overrunning a buffer, its primary purpose isn't safety. Rather, it exists for the case where one wants to compare only the first N characters of a (possibly NUL-terminated) string.

From the man page:

The strcmp() function compares the two strings s1 and s2. It returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2.

The strncmp() function is similar, except it compares only the first (at most) n bytes of s1 and s2.

Note that strncmp in this case cannot be replaced with a simple memcmp, because you still need to take advantage of its stop-on-NUL behavior, in case one of the strings is shorter than n.
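As a minimal sketch of that "first N characters" use case, a prefix test might look like the following. The helper name starts_with is made up for illustration and is not a standard function.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if s begins with prefix.
   strncmp stops at the first differing byte, at a NUL in either
   argument, or after n bytes, whichever comes first. */
bool starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

With s = "GE" and prefix = "GET ", strncmp stops at the NUL that ends "GE" and reports a mismatch. A memcmp over 4 bytes would instead read past the end of the 3-byte array holding "GE".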

If strcmp causes a buffer overrun, then one of two things is true:

  1. Your data isn't expected to be NUL-terminated, and you should be using memcmp instead.
  2. Your data is expected to be NUL-terminated, but you've already screwed up when you populated the buffer, by somehow not NUL-terminating it.
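For the first case, a sketch with memcmp could look like this. It assumes a fixed-width 4-byte tag that never carries a terminator; TAG_LEN and tags_equal are illustrative names only.

```c
#include <stdbool.h>
#include <string.h>

#define TAG_LEN 4  /* hypothetical fixed field width, no terminator stored */

/* Both arguments must point to at least TAG_LEN readable bytes.
   memcmp compares exactly TAG_LEN bytes and does not look for a NUL. */
bool tags_equal(const char *a, const char *b)
{
    return memcmp(a, b, TAG_LEN) == 0;
}
```

Because the length is part of the data's contract here, nothing depends on a terminator being present.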

Note that reading past the end of a buffer is still considered a buffer overrun. While it may seem harmless, it can be just as dangerous as writing past the end.

Reading, writing, executing... it doesn't matter. Any memory reference to an unintended address is undefined behavior. In the most apparent scenario, you attempt to access a page that isn't mapped into your process's address space, causing a page fault and a subsequent SIGSEGV. In the worst case, you sometimes run into a \0 byte, but other times you run into some other buffer, causing inconsistent program behavior.
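If a buffer of known size comes from a source that may not have terminated it, one possible way to detect that before handing it to strcmp is to search for the terminator within the buffer's bounds. This is only a sketch; is_terminated is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if a NUL byte occurs somewhere in
   the first size bytes of buf. memchr never reads beyond those bytes. */
bool is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

A false result points at the real bug, which is in whatever code filled the buffer.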

Introit answered 22/6, 2014 at 17:40 Comment(8)
I don't see how strcmp() can cause a buffer overrun, even if both those things are true. Could you say a few more words about what exactly goes wrong?Juanitajuanne
Imagine you have char buf[100] in which every character is 'a' (it is not NUL-terminated). If you pass this buffer to strcmp (assuming the other parameter is a longer string), then strcmp will continue comparing at buf[100] and so on, overrunning the buffer.Introit
I can see how writing past the end of a buffer causes problems. But strcmp() doesn't do that, right? Would you mind adding a few words to your answer about what exactly goes wrong if strcmp() goes on reading past the end of a buffer?Juanitajuanne
Reading, writing, executing... it doesn't matter. Any memory reference to an unintended address is undefined behavior. In the most apparent scenario, you attempt to access a page that isn't mapped into your process's address space, causing a page fault and a subsequent SIGSEGV. In the worst case, you sometimes run into a \0 byte, but other times you run into some other buffer, causing inconsistent program behavior.Introit
Good point. Please hit the above "edit" button and add it to your answer. I've done a lot of programming on machines that never page fault, so that particular "page fault" problem never happens on those machines, but this is exactly the sort of thing I want to know about so my C is portable to machines where this sort of thing can and does happen.Juanitajuanne
If you want to ensure portability, you should ensure correctness. No matter what the machine, reading past the end of a buffer will invoke undefined behavior. The CPU may happily read zeros, or the memory controller might catch fire.Introit
"it exists for the case where one wants to compare only the first N characters of a (properly NUL-terminated) string." is not correct. From the C spec "The strncmp function returns an integer ... accordingly as the possibly null-terminated array pointed to by s1". Neither s1 nor s2 of int strncmp(const char *s1, const char *s2, size_t n); need to be C strings. Independently, they can simply be strings or they can be arrays of char without being "properly NUL-terminated".Omphale
@chux Hmm, it seems I will have to re-visit my answer. Thanks for that.Introit
R
8

A string is by definition "a contiguous sequence of characters terminated by and including the first null character".

The only case where strncmp() would be safer than strcmp() is when you're comparing two character arrays as strings, you're certain that both arrays are at least n bytes long (the 3rd argument passed to strncmp()), and you're not certain that both arrays contain strings (i.e., contain a '\0' null character terminator).
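A sketch of that one case, assuming fixed-width name fields in which a short name is NUL-padded and a full-width name has no terminator at all. NAME_LEN and names_equal are made-up names for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define NAME_LEN 8  /* hypothetical field width, chosen for illustration */

/* Both arguments must point to arrays of at least NAME_LEN bytes.
   strncmp stops at a NUL if there is one, and after NAME_LEN bytes
   if there is not, so it never reads outside either array. */
bool names_equal(const char *a, const char *b)
{
    return strncmp(a, b, NAME_LEN) == 0;
}
```

Unlike memcmp, this ignores whatever bytes follow a terminator, so two fields holding "abc" compare equal even if their padding differs.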

In most cases, your code (if it's correct) will guarantee that any arrays that are supposed to contain null-terminated strings actually do contain null-terminated strings.

That added n in strncmp() is not a magic wand that makes unsafe code safe. It doesn't guard against null pointers, uninitialized pointers, uninitialized arrays, an incorrect value of n, or just passing incorrect data. You can shoot yourself in the foot with either function.
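One illustration of how a badly chosen n can quietly change the result: the two hypothetical helpers below both look like a check for the command "quit", but they disagree on longer input.

```c
#include <stdbool.h>
#include <string.h>

/* Compares only the first 4 bytes, so this is really a prefix test. */
bool is_quit_strncmp(const char *cmd)
{
    return strncmp(cmd, "quit", 4) == 0;
}

/* Compares the whole string, including the terminator position. */
bool is_quit_strcmp(const char *cmd)
{
    return strcmp(cmd, "quit") == 0;
}
```

Given "quitter", the strncmp version reports a match and the strcmp version does not. Neither call is unsafe here; they simply answer different questions.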

And if you're trying to call strcmp or strncmp with an array that you thought contained a null-terminated string but actually doesn't, then your code already has a bug. Using strncmp() might help you avoid the immediate symptom of that bug, but it won't fix it.
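Fixing the bug usually means terminating the string at the point where the buffer is filled. As one possible sketch, a bounded copy through snprintf always writes a terminator when the destination size is nonzero. The helper name copy_string is an assumption for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst, truncating if necessary.
   With dst_size > 0, snprintf always NUL-terminates dst.
   Returns true only if the whole of src fit. */
bool copy_string(char *dst, size_t dst_size, const char *src)
{
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

After such a copy the destination holds a valid string, so a later strcmp on it stays within the buffer.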

Rhebarhee answered 23/6, 2014 at 6:1 Comment(0)
P
2

strcmp compares two strings character by character until a difference is detected or a \0 is found in one of them.

On the other hand, strncmp provides a way to limit the number of characters to be compared, so if the strings do not end with \0, the function won't continue checking after the size limit has been reached.

Imagine what would happen if you were comparing two strings in these two memory regions:

0x40, 0x41, 0x42,... 0x40, 0x41, 0x42,...

And you are only interested in the first two characters. Somehow the \0 has been removed from the end of the strings, and the third byte happens to be the same in both regions. strncmp would avoid comparing this third byte if the num parameter is 2.

EDIT: As the comments below indicate, this situation arises only from an incorrect or very specific use of the language.

Paediatrics answered 22/6, 2014 at 16:59 Comment(6)
If you want to compare memory regions, use memcmp. In C, a "string" is a null-terminated character sequence. If you have strings, use strcmp. If you don't, don't.Pavel
Sure, strncmp would then make the comparison "safe", but that's really just deferring the problem. Your non-null-terminated string would then cause undefined behaviour later on in your program.Lumumba
@OliCharlesworth I do agree, just wanted to pinpoint a case where strncmp is safer to use than strcmpIsopropanol
As I understand it, strncmp doesn't exist for "safety", but rather "I want to compare the first N characters of these strings".Introit
@JonathonReinhart: You should make that an answer.Lumumba
Things that don't end in \0 are not stringsDominiquedominium
