Why does strcmp() return 0 when its inputs are equal?
Asked Answered
A

7

25

When I make a call to the C string compare function like this:

strcmp("time","time")

It returns 0, which implies that the strings are not equal.

Can anyone tell me why C implementations seem to do this? I would think it would return a non-zero value if equal. I am curious to the reasons I am seeing this behavior.

Ainslie answered 27/2, 2009 at 16:22 Comment(3)
Time != money directly - you have to employ a unit conversion first.Ectomy
@remeador: I can ask as many stupid joke questions as I wish. To those interested, the original question was: "Why does strcmp("time","money") return 0 instead of non-zero?".Ainslie
edit: return non-zero instead of 0.Ainslie
H
36

strcmp returns a lexical difference (or should i call it "short-circuit serial byte comparator" ? :-) ) of the two strings you have given as parameters. 0 means that both strings are equal

A positive value means that s1 would be after s2 in a dictionary.

A negative value means that s1 would be before s2 in a dictionary.

Hence your non-zero value when comparing "time" and "money" which are obviously different, even though one would say that time is money ! :-)

Homey answered 27/2, 2009 at 16:25 Comment(8)
strcmp() does not do a lexical comparison, it merely compares the value of each character until there is a difference or both strings terminate.Mitsue
I could swear I've read docs that said strcmp() does a lexical comparison, but some quick googling suggests Ferruccio is correct... could it have changed at some point?Militarism
Hmmm I'm pretty sure it's a lexical comparison ! cplusplus.com/reference/clibrary/cstring/strcmp.htmlMelano
I think it changes with the given definition of "lexical". I too have seen strcmp() referred to as being a 'lexical' comparator, though obviously it does not take different localizations, collations, etc. into account. Maybe it is better to call it a "short-circuit serial byte comparator."Prichard
It simply uses each character's ASCII value for comparison which are ordered like: ABCDE...abcde. A lexical comparison would order them like: AaBbCcDdEe.Mitsue
@Mitsue I updated my answer. But you'll admit that the idea behind the comparison is lexical ! :-)Melano
there is a strcoll. i think that one takes the current locale into accountCosenza
@litb, right it does, but for some strange reasons it falls back to strcmp if the locale is set to 'C' or 'POSIX' and not to strcasecmp, which would be more appropriate.Maness
E
19

The nice thing about an implementation like this is you can say

if(strcmp(<stringA>, <stringB>) > 0)   // Implies stringA > stringB
if(strcmp(<stringA>, <stringB>) == 0)  // Implies stringA == stringB
if(strcmp(<stringA>, <stringB>) < 0)   // Implies stringA < stringB
if(strcmp(<stringA>, <stringB>) >= 0)  // Implies stringA >= stringB
if(strcmp(<stringA>, <stringB>) <= 0)  // Implies stringA <= stringB
if(strcmp(<stringA>, <stringB>) != 0)  // Implies stringA != stringB

Note how the comparison with 0 exactly matches the comparison in the implication.

Eloiseloisa answered 27/2, 2009 at 17:9 Comment(1)
There's a wonderful/horrible macro from the comp.lang.c FAQ that implements almost exactly this string behavior. #define StrTest(str1, op, str2) (strcmp(str1, str2) op 0) With it, you would write if(StrTest(stringA, ==, stringB)) and friends. I'm on the fence as to whether it's a horrible idea or a wonderful idea.Iseult
C
7

It's common to functions to return zero for the common - or one-of-a-kind - case and non-zero for special cases. Take the main function, which conventionally returns zero on success and some nonzero value for failure. The precise non-zero value indicates what went wrong. For example: out of memory, no access rights or something else.

In your case, if the string is equal, then there is no reason why it is equal other than that the strings contain the same characters. But if they are non-equal then either the first can be smaller, or the second can be smaller. Having it return 1 for equal, 0 for smaller and 2 for greater would be somehow strange i think.

You can also think about it in terms of subtraction:

return = s1 - s2

If s1 is "lexicographically" less, then it will give is a negative value.

Cosenza answered 27/2, 2009 at 16:55 Comment(0)
I
4

Another reason strcmp() returns the codes it does is so that it can be used directly in the standard library function qsort(), allowing you to sort an array of strings:

#include <string.h> // for strcmp()
#include <stdlib.h> // for qsort()
#include <stdio.h>

int sort_func(const void *a, const void *b)
{
    const char **s1 = (const char **)a;
    const char **s2 = (const char **)b;
    return strcmp(*s1, *s2);
}

int main(int argc, char **argv)
{
    int i;
    printf("Pre-sort:\n");
    for(i = 1; i < argc; i++)
        printf("Argument %i is %s\n", i, argv[i]);
    qsort((void *)(argv + 1), argc - 1, sizeof(char *), sort_func);
    printf("Post-sort:\n");
    for(i = 1; i < argc; i++)
        printf("Argument %i is %s\n", i, argv[i]);
    return 0;
}

This little sample program sorts its arguments ASCIIbetically (what some would call lexically). Lookie:

$ gcc -o sort sort.c
$ ./sort hi there little fella
Pre-sort:
Argument 1 is hi
Argument 2 is there
Argument 3 is little
Argument 4 is fella
Post-sort:
Argument 1 is fella
Argument 2 is hi
Argument 3 is little
Argument 4 is there

If strcmp() returned 1 (true) for equal strings and 0 (false) for inequal ones, it would be impossible to use it to obtain the degree or direction of inequality (i.e. how different, and which one is bigger) between the two strings, thus making it impossible to use it as a sorting function.

I don't know how familiar you are with C. The above code uses some of C's most confusing concepts - pointer arithmetic, pointer recasting, and function pointers - so if you don't understand some of that code, don't worry, you'll get there in time. Until then, you'll have plenty of fun questions to ask on StackOverflow. ;)

Iseult answered 2/9, 2009 at 4:58 Comment(0)
S
3

You seem to want strcmp to work like a (hypothetical)

int isEqual(const char *, const char *)

To be sure that would be true to the "zero is false" interpretation of integer results, but it would complicate the logic of sorting because, having established that the two strings were not the same, you would still need to learn which came "earlier".

Moreover, I suspect that a common implementation looks like

int strcmp(const char *s1, const char *s2){
   const unsigned char *q1=s1, *q2=s2;
   while ((*q1 == *q2) && *q1){ 
      ++q1; ++q2; 
   };
   return (*q1 - *q2);
}

which is [edit: kinda] elegant in a K&R kind of way. The important point here (which is increasingly obscured by getting the code right (evidently I should have left well enough alone)) is the way the return statement:

   return (*q1 - *q2);

which gives the results of the comparison naturally in terms of the character values.

Seizure answered 27/2, 2009 at 16:53 Comment(10)
Wouldn't that "common implementation" walk off the buffer (past the null termination?) I don't see how it would ever return 0...Eloiseloisa
i think u gotta have to terminate the loop if you come to \0 :) u also have to watch about underflows in *s1 - *s2 so s1=-127, s2=2 => oops :)Cosenza
oh wait. no won't cause an underflow because both are first converted to an int.Cosenza
@dmckee: Why not just && *s1? :]Eloiseloisa
i've looked up in c99 it says both are first converted to unsigned char before being subtracted.so there can't even be a case where s1=-127 :) but it changes some: *a - *b could become -2 with char (*a=-1, *b=1) and +254 with unsigned char.so i believe a cast should be added there. kind of tricky :)Cosenza
@libt: The Mac OS X manpage agrees, but putting in the casts is going to play merry havoc with my allegidly "elegant" implementation. ::sigh::Seizure
indeed :) i like how you added "kinda". so one still can see the a - b pattern in it without scary casts in between :) all fine i would sayCosenza
I would think return (int)*q1 - (int)*q2; would solve the sign issue. What makes the casts scary?Synchronize
@joe_mucchiello: They would clutter up the one line I really wanted to emphasize in this instance...Seizure
joe_micchiello, they would not solve it. they're cast (promoted) to int anyway automatically. you will have to do (unsigned char)*q1 - (unsigned char)*q2; i think that really clutters the line. now anyway that they could just look into the comments and see how they could write it in non-teaching wayCosenza
I
2

There's three possible results: string 1 comes before string 2, string 1 comes after string 2, string 1 is the same as string 2. It is important to keep these three results separate; one use of strcmp() is to sort strings. The question is how you want to assign values to these three outcomes, and how to keep things more or less consistent. You might also look at the parameters for qsort() and bsearch(), which require compare functions much like strcmp().

If you wanted a string equality function, it would return nonzero for equal strings and zero for non-equal strings, to go along with C's rules on true and false. This means that there would be no way of distinguishing whether string 1 came before or after string 2. There are multiple true values for an int, or any other C data type you care to name, but only one false.

Therefore, having a useful strcmp() that returned true for string equality would require a lot of changes to the rest of the language, which simply aren't going to happen.

Impertinent answered 27/2, 2009 at 17:25 Comment(0)
F
0

I guess it is simply for symmetry: -1 if less, 0 if equal, 1 if more.

Fortuitism answered 27/2, 2009 at 16:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.