Implementation of strcmp
Asked Answered
C

8

12

I tried to implement strcmp:

int strCmp(char string1[], char string2[])
{
    int i = 0, flag = 0;    
    while (flag == 0) {
        if (string1[i] > string2[i]) {
            flag = 1;
        } else
        if (string1[i] < string2[i]) {
            flag = -1;
        } else {
            i++;
        }
    }
    return flag;
}

but I'm stuck with the case that the user will input the same strings, because the function works with 1 and -1, but it's doesn't return 0. Can anyone help? And please without pointers!

Constabulary answered 19/1, 2016 at 9:38 Comment(1)
Possible duplicate of Optimized strcmp implementationTouching
P
6

You seem to want to avoid pointer arithmetics which is a pity since that makes the solution shorter, but your problem is just that you scan beyond the end of the strings. Adding an explicit break will work. Your program slightly modified:

int strCmp(char string1[], char string2[] )
{
    int i = 0;
    int flag = 0;    
    while (flag == 0)
    {
        if (string1[i] > string2[i])
        {
            flag = 1;
        }
        else if (string1[i] < string2[i])
        {
            flag = -1;
        }

        if (string1[i] == '\0')
        {
            break;
        }

        i++;
    }
    return flag;
}

A shorter version:

int strCmp(char string1[], char string2[] )
{
    for (int i = 0; ; i++)
    {
        if (string1[i] != string2[i])
        {
            return string1[i] < string2[i] ? -1 : 1;
        }

        if (string1[i] == '\0')
        {
            return 0;
        }
    }
}
Petrel answered 19/1, 2016 at 10:15 Comment(0)
E
42

Uhm.. way too complicated. Go for this one:

int strCmp(const char* s1, const char* s2)
{
    while(*s1 && (*s1 == *s2))
    {
        s1++;
        s2++;
    }
    return *(const unsigned char*)s1 - *(const unsigned char*)s2;
}

It returns <0, 0 or >0 as expected

You can't do it without pointers. In C, indexing an array is using pointers.

Maybe you want to avoid using the * operator? :-)

Electrochemistry answered 19/1, 2016 at 9:47 Comment(0)
A
10

First of all standard C function strcmp compares elements of strings as having type unsigned char.

Secondly the parameters should be pointers to constant strings to provide the comparison also for constant strings.

The function can be written the following way

int strCmp( const char *s1, const char *s2 )
{
    const unsigned char *p1 = ( const unsigned char * )s1;
    const unsigned char *p2 = ( const unsigned char * )s2;

    while ( *p1 && *p1 == *p2 ) ++p1, ++p2;

    return ( *p1 > *p2 ) - ( *p2  > *p1 );
}
Adamite answered 19/1, 2016 at 10:2 Comment(21)
I would personally prefer while ( *p1 && *p1 == *p2 ) {++p1, ++p2}, but otherwise, nice answer.Duclos
@Duclos The loop could be writtten like for ( const unsigned char *p1 = ( const unsigned char * )s1, *p2 = ( const unsigned char * )s2; *p1 && *p1 == *p2; ++p1, ++p2 ); As you see the loop has no body (statement). And it has another problem: pointers p1 and p2 are needed outside the loop.;Adamite
@Duclos So let's change it const unsigned char *p1 = ( const unsigned char * )s1, *p2 = ( const unsigned char * )s2; for (; *p1 && *p1 == *p2; ++p1, ++p2 ); But again this loop has no body. This is confusing.Adamite
@Duclos To make the body of the loop it is enough to place EXPESSSION as EXPRESSION STATEMENT outside the loop. In this case it will look like char *p1 = ( const unsigned char * )s1, *p2 = ( const unsigned char * )s2; while (; *p1 && *p1 == *p2 ) ++p1, ++p2 ;; So we have the same original loop but the init part of the loop and the expression part of the loop are placed outside the loop like one statement.Adamite
I guess we have a problem of definitions here: As I see it, the body of a loop, or at least a while loop, are the statements that are executed with every loop iteration, but aren't the condition statement. So my suggestion is merely to transform while(COND) BODY; to while(COND) {BODY}Duclos
@Duclos Look from other side. for( ; condition; expression ) /*empty statement */; ==> while ( condition ) expression statement;Adamite
I disagree that this comparison between for and while is getting the point: The for construct is just syntactical sugar for PRECOND; while(COND) {BODY1,BODY2} ==> for(PRECOND,COND,BODY2) {BODY1} It's also more precise regarding the scopes.Duclos
@Duclos The for statement has no body. I showed this. Its body is empty. It has only an expression in its third part. So the for statement is written as a single statement without any other statements and moreover without a complicated compound statement. I rewrote it with an equivalent while statement that differs only in that the expression is rewritten as an expression statement. But it looks like one statement without eny complictaed compound statement.Adamite
I'm sorry but I don't think that your for to while conversion is correct neither relevant. The C++ standard, N4140 § 6.5.3 says " for ( for-init-statement condition(opt); expression(opt)) statement is equivalent to { for-init-statement while ( condition ) { statement expression ; } }" except for the declarative region of for-init-statement and the behavior of continue. And because 'expression = ++p1, ++p2 != empty_statement' and 'statement expression = loop_body' -> 'loop_body != empty_statement'Duclos
@Duclos I shwoed how to write a relevant for statement. It has one drawback that is it has an empty statement. becuase any statement is redundant in this case. I showed how to rewrite this for statement using while statement such a way that the statement would not be empty. It is enough to place the expression outside the for statement making an expression statement.Adamite
@Duclos But the result is the same there is only in fact a single statement in a single line. There is no any need to complicate the for statement introducing a compound statement. It is a bad style of programming to make simple things more complicated.Adamite
The nature of a style is that it's subjective and while omitting this 2 braces makes the code shorter, it makes it less readable. To be fair it would add also 3-5 newlines to the 2 braces, and replace the ',' with ';'. And for example the whole type system in C++ makes a simple thing more complicated. You could omit any type information with the assumption that the programmer will always know what the type is. But since such assumptions proved to be a bit, say "optimistic"...Duclos
@Duclos I do not see any sense to substitute one expression statement for a compound statement consisting of several statements. When there is one expression statement the code is very clear: It is very easy observed./ When there are several statements you should investigate each of them to be sure that you understand what the code does. This requires more time to investigate the code. The more time is reuqired the worse style is used.Adamite
The time to unterstand a piece of code is not solely based on it's character length nor statement count. And the time you need to understand a piece of code, is not necessary the time, someone else needs, especially, when you wrote the code. Because this is obviously subjective, I just made this suggestion as a supplementary comment, not as an edit request. So the reader can decide which style he can relate better.Duclos
@Duclos I do not see any difficulties to understand increasing of two pointers that are observer and comp[ared before the increment.:) Do not try to find a black cat in a dark room if especially there is no cat.:)Adamite
I do not too, and that's the reason why I upvoted your answer yesterday. But I know people who could. And yes, they are CS majors.Duclos
@Duclos What is CS? Is it computer siences? Usually CS majors are very weak programmers.:)Adamite
Let us continue this discussion in chat.Duclos
@Duclos I am sorry. I do not think we will be able to say something new each other.:)Adamite
@VladfromMoscow Why not simple subtraction like the top answer? How would the overflow happen?Deterge
@Deterge It is a general approach. For example if the comparison function compares two integers of the type unsigned int then their difference never can be negative.:)Adamite
P
6

You seem to want to avoid pointer arithmetics which is a pity since that makes the solution shorter, but your problem is just that you scan beyond the end of the strings. Adding an explicit break will work. Your program slightly modified:

int strCmp(char string1[], char string2[] )
{
    int i = 0;
    int flag = 0;    
    while (flag == 0)
    {
        if (string1[i] > string2[i])
        {
            flag = 1;
        }
        else if (string1[i] < string2[i])
        {
            flag = -1;
        }

        if (string1[i] == '\0')
        {
            break;
        }

        i++;
    }
    return flag;
}

A shorter version:

int strCmp(char string1[], char string2[] )
{
    for (int i = 0; ; i++)
    {
        if (string1[i] != string2[i])
        {
            return string1[i] < string2[i] ? -1 : 1;
        }

        if (string1[i] == '\0')
        {
            return 0;
        }
    }
}
Petrel answered 19/1, 2016 at 10:15 Comment(0)
D
2

This is a 10 opcodes implementation of strcmp (GCC assumed)

int strcmp_refactored(const char *s1, const char *s2)
{
    while (1)
    {
        int res = ((*s1 == 0) || (*s1 != *s2));
        if  (__builtin_expect((res),0))
        {
            break;
        }
        ++s1;
        ++s2;
    }
    return (*s1 - *s2);
}

You can try this implementation and compare to others https://godbolt.org/g/ZbMmYM

Desdee answered 26/8, 2017 at 8:53 Comment(0)
T
0

Your problem is that you don't detect the end of the string and therefore don't return zero if both strings ends before any difference is detected.

You can simply fix this by checking for this in the loop condition:

while( flag==0 && (string1[i] != 0 | string2[i] != 0 ) )

Note that both strings are checked because if only one is at the end the strings are not equal and the comparison inside the loop should detect that.

Please note that the character comparison might not yield the result that you might expect. For one it's not defined whether char is signed or unsigned so you should probably cast to unsigned char for comparison.

Perhaps a cleaner solution would be to return immediately when you detect the difference, that is instead of flag = -1 you return -1 directly. But that's more a matter of opinion.

Tithable answered 19/1, 2016 at 9:55 Comment(6)
The && is superfluous, you know string1[i] == string2[i] at this stage.Euchologion
if strin1 is zero and string 2 is not zero you continue to while into undefined behaviourHumbuggery
@PeterMiehle in that case string1[i]<string2[i] would be true and you'll stop returning -1.Euchologion
@Euchologion Good points, I've decided to skip that solution and check in the loop condition instead.Tithable
@Euchologion Not exactly, you know string1[i] was equal to string2[i] the last iteration if there were any. However the second test in the while statement checks if we reached the end of any of the strings and therefore it's needed anyway - the strings "a\0z" and "a\0y" should compare equal.Tithable
@PeterMiehle If you refer to string1 being NULL then it's no problem, strcmp has undefined behavior if any of the parameters are NULL as far as I know (and then an implementation that has undefined behavior then is acceptable).Tithable
T
0

Taken from here.

#include<stdio.h>
#include<string.h>
 
//using arrays , need to move the string using index
int strcmp_arry(char *src1, char *src2)
{
    int i=0;
    while((src1[i]!='\0') || (src2[i]!='\0'))
    {
        if(src1[i] > src2[i])
            return 1;
        if(src1[i] < src2[i])
            return -1;
        i++;
    }
 
    return 0;
}
//using pointers, need to move the position of the pointer
int strcmp_ptr(char *src1, char *src2)
{
    int i=0;
    while((*src1!='\0') || (*src2!='\0'))
    {
        if(*src1 > *src2)
            return 1;
        if(*src1 < *src2)
            return -1;
        src1++;
        src2++;
    }
    return 0;
}
 
int main(void)
{
    char amessage[] = "string";
    char bmessage[] = "string1";
    printf(" value is %d\n",strcmp_arry(amessage,bmessage));
    printf(" value is %d\n",strcmp_ptr(amessage,bmessage));
}

I've made a few changes to make it work like strcmp.

Touching answered 19/1, 2016 at 11:59 Comment(1)
You miss-copied, left out negative signs.Thoroughbred
L
0

My Implementation

int strcmp(const char * s1, const char * s2)
{
    while (*s1 == *s2 && *s1++ | *s2++);
    int i = *s1 - *s2;
    return i < 0 ? -1 : i > 0 ? 1 : 0;
}

return values

-1 // <0
1  // >0
0  // ==0

The last ternary operation is optional

The function would still in the rules of strcmp when you just return *s1 - *s2.

Lac answered 25/4, 2018 at 14:27 Comment(1)
You are comparing elements past the null. See: strcmp("foobar\0b", "foobar\0a") returns 1, should return 0 (as does strcmp from the standard library). Try this: while (*s1 && *s1 == *s2) {s1++; s2++;}Nealah
H
0

Another elegant (but not the most "clean code") implementation, with pointers.

int a_strcmp(char* t, char* s)
{
    for( ; *t == *s ; *s++ , *t++)
        if(*t == '\0')
            return 0;
    return *t - *s;
}

version without using pointers.

int b_strcmp(char t[], char s[])
{
    int i;
    for(i = 0; s[i] == t[i]; ++i)
        if(t[i] == '\0')
            return 0;
    return t[i] - s[i];
}
Housewares answered 27/1, 2021 at 20:46 Comment(5)
You skip the first character of the strings, so strcmp("a", "b") would compare equal.Leverage
Corrected the post, Olaf, good catch, thank you!Housewares
Don't increment inside the while condition. E.g. the very first i++ will return zero/false, resulting in an equal result, even when the strings are different strcmp("abc", "abd"). Same goes for the first implementation. Consider strcmp("\0a", ""), this will break after the first character and compare non-equal, even though it should return zero. Please test both implementations with several inputs and verify that they are working as expected.Leverage
Thanks again, changed implantation using for loop where increment are more straightforward.Housewares
You don't need the asterisks in *s++ , *t++.Heliport

© 2022 - 2024 — McMap. All rights reserved.