Creating my own strcmp () function in C

Asked 9/4, 2014 at 20:52 Answered 2/9, 2019 at 21:23

I was assigned by my teacher to write my own strcmp() function in C. I did create my own version of said function, and I was hoping to get some feedback.

int CompareTwoStrings ( char *StringOne, char *StringTwo ) {
    // Evaluates if both strings have the same length.
    if  ( strlen ( StringOne ) != strlen ( StringTwo ) ) {
        // Given that the strings have an unequal length, it compares between both
        // lengths.
        if  ( strlen ( StringOne ) < strlen ( StringTwo ) ) {
            return ( StringOneIsLesser );
        }
        if  ( strlen ( StringOne ) > strlen ( StringTwo ) ) {
            return ( StringOneIsGreater );
        }
    }
    int i;
    // Since both strings are equal in length...
    for ( i = 0; i < strlen ( StringOne ); i++ ) {
        // It goes comparing letter per letter.
        if  ( StringOne [ i ] != StringTwo [ i ] ) {
            if  ( StringOne [ i ] < StringTwo [ i ] ) {
                return ( StringOneIsLesser );
            }
            if  ( StringOne [ i ] > StringTwo [ i ] ) {
                return ( StringOneIsGreater );
            }
        }
    }
    // If it ever reaches this part, it means they are equal.
    return ( StringsAreEqual );
}

StringOneIsLesser, StringOneIsGreater, StringsAreEqual are defined as const int with the respective values: -1, +1, 0.

The thing is, I'm not sure whether StringOne being shorter than StringTwo automatically means that StringTwo is greater, because I don't know how strcmp() is actually implemented. I'd like your feedback on that.

Squall answered 9/4, 2014 at 20:52 Comment(11)

So what is the question? – Hierocracy 9/4, 2014 at 20:56

To get some feedback from the code I posted. I thought I made that clear. – Squall 9/4, 2014 at 20:59

This is not the best site for general feedback. CodeReview.stackexchange.com is good for that. Given that, the question I answered was "If StringOne has a lesser length than my StringTwo, [does] that automatically mean StringTwo is greater?" – Henotheism 9/4, 2014 at 21:0

Oh, sorry, I didn't know that. Also, sorry for the grammar problems, English is not my native language (I'm from Chile). – Squall 9/4, 2014 at 21:4

Aren't you overcomplicating the problem quite a bit? A standard strcmp implementation body can easily be written in 3 lines... move the two pointers forward while they point to identical characters (being careful to stop at the string terminators), and then just compare the characters you reached, which will be the first differing. – Babbage 9/4, 2014 at 21:9

As I said, I was trying to avoid using pointers to compare because I still don't feel comfortable with them. Perhaps I should have made that clear from the beginning. – Squall 9/4, 2014 at 21:27

This scans both strings to full length, possibly multiple times, whereas an orthodox strcmp() typically only has to look at the first character to find a difference. That's a big performance hit if the strings are many kilobytes in length. – Cube 9/4, 2014 at 21:27

Just a tip: don't use strlen(string1) inside the for loop. It is very bad for speed. Use an auxiliary variable such as int size = strlen(string1); and then use that variable in the loop: for (i = 0; i < size; i++) – Mendymene 9/4, 2014 at 22:49

Excuse me if I misunderstand you @jofra, but what you are implying is that if I use 'strlen (string)' inside the for loop, the code MUST calculate a new value for 'strlen (string)' after every increment of 'i', even though it remains the same value? Because if that's the reason I totally get why it slows down the execution of the code. – Squall 10/4, 2014 at 1:33

Yes. It calculates it on every iteration, so the loop is quadratic, O(N^2), instead of linear, O(N). In other words, it runs much slower (a lot slower when N, the number of characters in the string, is big) if strlen(string) is recalculated every time it loops. – Mendymene 10/4, 2014 at 8:26
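To illustrate the tip above, here is a minimal sketch with the length computed once before the loop. The function name is_prefix_of is hypothetical and only serves the example; it is not part of the question's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `two` starts with `one`, 0 otherwise.
   strlen() is called once, before the loop, not on every iteration. */
int is_prefix_of(const char *one, const char *two)
{
    size_t len = strlen(one);   /* single O(N) scan */
    size_t i;

    for (i = 0; i < len; i++) {
        /* If `two` is shorter, its '\0' mismatches here and we stop
           before reading past its end. */
        if (one[i] != two[i])
            return 0;
    }
    return 1;
}
```

With the call hoisted out of the loop condition, the loop itself does one comparison per character.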

abhilekhblogs.blogspot.com/2021/01/… – Penetrate 10/3, 2021 at 17:40

That's a lot of code for such a simple task. I believe something as simple as this would do:

int my_strcmp(const char *a, const char *b)
{
    while (*a && *a == *b) { ++a; ++b; }
    return (int)(unsigned char)(*a) - (int)(unsigned char)(*b);
}

Gram answered 9/4, 2014 at 21:10 Comment(10)

I would cast to int before subtracting to avoid overflows/underflows. Also, the arguments should be const. – Babbage 9/4, 2014 at 21:13

If you allow me to be completely honest, I have no idea what that code does. I was trying to avoid using pointers because I'm still quite green with them. My idea was to compare both strings letter by letter. – Squall 9/4, 2014 at 21:23

Note that the 'cast to int' suggestion is particularly crucial if char is an unsigned type. – Cube 9/4, 2014 at 21:29

All primitives are signed unless told otherwise, so I don't think it's an issue. In addition, strcmp is supposed to compare readable text and doesn't need to be binary safe, so unexpected behaviour involving arithmetic with "negative" symbols is nothing to worry about, provided you are using strcmp for the task it was designed for -- comparing and sorting text. – Gram 9/4, 2014 at 23:25

while (*a && *a == *b) { ++a; ++b; } would do. – Koziarz 10/4, 2014 at 2:26

@Gram UTF-8 text is used more these days, including with strcmp(), so char values in the range -128 to -1 (or 128 to 255) are more common than they used to be. – Koziarz 10/4, 2014 at 2:35

The functionality of strcmp() does not depend on the signedness of char. strcmp() is specified to perform as if "each character shall be interpreted as if it had the type unsigned char" (C11 7.24.1). So the unsigned char casts are good. This code complies with strcmp() as long as 1) int has much more range than unsigned char (usually true) and 2) char uses 2's complement (usually true too). But it does fail to match the const part of the strcmp() signature. – Koziarz 10/1, 2019 at 20:51

@Gram could you please explain why int my_strcmp(char *a, char *b){ return (int)(*a) - (int)(*b);} won't work as a simplified function? It seems to work with all the inputs I have tried... – Forborne 6/5, 2020 at 14:30

@user193203821309 Because you're only testing the first byte in the string. – Gram 6/5, 2020 at 23:39
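A small sketch of why the one-line version is not enough. first_byte_only is a hypothetical name for the simplified function from the comment above; my_strcmp is the loop-based version from the answer.

```c
/* The simplified version: it only ever looks at the first byte. */
int first_byte_only(const char *a, const char *b)
{
    return (int)(*a) - (int)(*b);
}

/* The loop-based version: skips the common prefix first. */
int my_strcmp(const char *a, const char *b)
{
    while (*a && *a == *b) { ++a; ++b; }
    return (int)(unsigned char)(*a) - (int)(unsigned char)(*b);
}
```

For "abc" and "abd" the first bytes are equal, so first_byte_only() returns 0 even though the strings differ, while my_strcmp() walks to the third character and returns a negative value.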

@Gram , "All primitives are signed unless told otherwise," --> not quite. char range matches signed char or unsigned char. It is implementation defined. This answer has the last 2 of 4 problems. – Koziarz 5/11, 2023 at 16:0

strcmp compares lexicographically, character by character: so "aaa" < "b" even though "b" is shorter.

Because of this, you can skip the length check and just do the letter by letter comparison. If you reach the null character ('\0') of one string while both strings are equal so far, then the shorter one is the lesser one.

Also: make StringsAreEqual == 0, not 1 for compatibility with standard sorting functions.
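Since the asker wanted to avoid pointer arithmetic, here is one way the letter by letter comparison could look with array indexing only. This is a sketch, not the standard library's implementation, and the name compare_by_index is made up for the example.

```c
#include <stddef.h>

/* Hypothetical index-based comparison: returns -1, 0 or +1. */
int compare_by_index(const char *one, const char *two)
{
    size_t i = 0;
    unsigned char c1, c2;

    /* Walk forward while the characters match and `one` has not ended.
       If `two` ends first, its '\0' differs from one[i] and the loop stops. */
    while (one[i] != '\0' && one[i] == two[i])
        i++;

    /* Compare the first differing characters as unsigned char. */
    c1 = (unsigned char)one[i];
    c2 = (unsigned char)two[i];

    if (c1 < c2)
        return -1;
    if (c1 > c2)
        return 1;
    return 0;
}
```

No length check is needed: when one string is a prefix of the other, the comparison ends up between '\0' and a non-zero character, so the shorter string comes out as the lesser one.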

Henotheism answered 9/4, 2014 at 20:56 Comment(0)

strcmp() is fairly easy to code. The usual coding mistakes include:

Parameter type

strcmp(s1,s2) uses const char * types, not char *. This allows the function to be called with pointers to const data. It conveys to the user the function's non-altering of data. It can help with optimization.

Sign-less compare

All str...() functions perform as if char was unsigned char, even if char is signed. This readily affects the result when the strings differ at a character outside the range [1...CHAR_MAX].
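A minimal sketch of the difference, assuming a 2's complement machine with 8-bit char. Both function names are hypothetical, and each compares only the first character of its arguments.

```c
/* Compares the first characters as signed char (NOT what strcmp() does). */
int first_char_signed(const char *s1, const char *s2)
{
    const signed char *p1 = (const signed char *)s1;
    const signed char *p2 = (const signed char *)s2;
    return (*p1 > *p2) - (*p1 < *p2);
}

/* Compares the first characters as unsigned char (as strcmp() is specified). */
int first_char_unsigned(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;
    return (*p1 > *p2) - (*p1 < *p2);
}
```

With the byte 0x80 against 'a', the signed version sees -128 < 97 and reports "less", while the unsigned version sees 128 > 97 and reports "greater", which is the ordering strcmp() is required to produce.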

Range

On select implementations, the range of unsigned char minus unsigned char is outside the int range. Using two compares, (a>b) - (a<b), rather than a-b avoids any problem. Further, many compilers recognize that idiom and emit good code.

int my_strcmp(const char *s1, const char *s2) {
  // All compares done as if `char` was `unsigned char`
  const unsigned char *us1 = (const unsigned char *) s1;
  const unsigned char *us2 = (const unsigned char *) s2;

  // As long as the data is the same and '\0' not found, iterate
  while (*us1 == *us2 && *us1 != '\0') {
    us1++;
    us2++;
  }

  // Use compares to avoid any mathematical overflow 
  // (possible when `unsigned char` and `unsigned` have the same range).
  return (*us1 > *us2) - (*us1 < *us2);
}
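As a usage sketch, a function with this return convention can drive qsort() from <stdlib.h>. The wrapper names compare_string_ptrs and sort_strings are hypothetical; my_strcmp is repeated so the block stands alone.

```c
#include <stdlib.h>

int my_strcmp(const char *s1, const char *s2) {
  const unsigned char *us1 = (const unsigned char *) s1;
  const unsigned char *us2 = (const unsigned char *) s2;

  while (*us1 == *us2 && *us1 != '\0') {
    us1++;
    us2++;
  }
  return (*us1 > *us2) - (*us1 < *us2);
}

/* qsort() hands the callback pointers to the array elements,
   which here are themselves `const char *`. */
static int compare_string_ptrs(const void *a, const void *b) {
  const char *const *sa = a;
  const char *const *sb = b;
  return my_strcmp(*sa, *sb);
}

/* Sorts an array of `count` string pointers in ascending order. */
void sort_strings(const char **strings, size_t count) {
  qsort(strings, count, sizeof strings[0], compare_string_ptrs);
}
```

qsort() only needs a negative, zero, or positive result, so the -1/0/+1 values from the two-compare idiom are enough.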

Dinosaur computers

On machines that use a signed char and a non-2's complement representation, the following can give the wrong result or trap at *s1 != '\0'. Such machines could have a negative 0, which does not indicate the end of a string yet ends the loop. Using unsigned char * pointers solves that.

int my_strcmp(const char *s1, const char *s2) {
  while (*s1 == *s2 && *s1 != '\0') { // Error!
    s1++;
    s2++;
  }
  // ... rest of the function omitted
}

Koziarz answered 10/1, 2019 at 20:33 Comment(0)

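int mystrncmp(const char *str1, const char *str2, unsigned int n)
{
    /* Note: n is never used, so the comparison is not limited to n characters. */
    while (*str1 == *str2) {
        if (*str1 == '\0' || *str2 == '\0')
            break;

        str1++;
        str2++;
    }

    /* Returns 0 when the strings are equal and -1 otherwise, never a positive value. */
    if (*str1 == '\0' && *str2 == '\0')
        return 0;
    else
        return -1;
}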

Avron answered 28/9, 2016 at 16:1 Comment(3)

While this code snippet may solve the question, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. – Booking 29/9, 2016 at 6:38

What's the purpose of the third function parameter: unsigned int n? – Clue 26/10, 2016 at 17:42

This mystrncmp() never returns a positive value, unlike strcmp(), so it is unexpectedly not useful for sorting. The n parameter is not used; if it is meant to limit the compare range, it should be size_t. – Koziarz 23/4, 2021 at 17:1
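Taking the comments above into account, a length-limited comparison might be sketched as follows. The name my_strncmp is hypothetical; the sketch uses size_t for n, compares as unsigned char, and returns a negative, zero, or positive value.

```c
#include <stddef.h>

/* Compares at most n characters of s1 and s2. */
int my_strncmp(const char *s1, const char *s2, size_t n)
{
    const unsigned char *us1 = (const unsigned char *)s1;
    const unsigned char *us2 = (const unsigned char *)s2;

    /* Advance while characters remain, they match, and no '\0' is reached. */
    while (n > 0 && *us1 == *us2 && *us1 != '\0') {
        us1++;
        us2++;
        n--;
    }

    /* All n characters matched (or n was 0 to begin with). */
    if (n == 0)
        return 0;

    return (*us1 > *us2) - (*us1 < *us2);
}
```

For example, "abcdef" and "abcxyz" compare equal for n = 3 but not for n = 4.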

Try this as well for a better understanding:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char string1[20], string2[20];
    size_t i = 0;
    int diff;

    puts("enter string one to compare");
    if (fgets(string1, sizeof(string1), stdin) == NULL)
        return 1;
    /* Strip the trailing newline, if any. */
    string1[strcspn(string1, "\n")] = '\0';

    puts("enter string two to compare");
    if (fgets(string2, sizeof(string2), stdin) == NULL)
        return 1;
    string2[strcspn(string2, "\n")] = '\0';

    /* Advance to the first differing character or the end of string1. */
    while (string1[i] != '\0' && string1[i] == string2[i])
        i++;

    /* Compare the characters at that position as unsigned char, as strcmp() does. */
    diff = (unsigned char)string1[i] - (unsigned char)string2[i];

    if (diff == 0)
        printf("strings are equal\n");
    else if (diff < 0)
        printf("string1 is less than string2\n");
    else
        printf("string2 is less than string1\n");

    return 0;
}

Humane answered 5/9, 2014 at 19:34 Comment(0)

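// Note: this is C++ (bool, nullptr). It reports only equality (true/false),
// not the negative/zero/positive ordering that strcmp() returns.
// str_len_v() is the answerer's own length helper and is not shown here.
bool str_cmp(char* str1, char* str2)
{
    if (str1 == nullptr || str2 == nullptr)
        return false;

    const int size1 = str_len_v(str1);
    const int size2 = str_len_v(str2);

    // Strings of different length cannot be equal.
    if (size1 != size2)
        return false;

    for (int i = 0; str1[i] != '\0' && str2[i] != '\0'; i++)
    {
        if (str1[i] != str2[i])
            return false;
    }

    return true;
}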

Hartzke answered 2/9, 2019 at 21:23 Comment(1)

It would be nice if you explained your solution. – Therefor 2/9, 2019 at 21:45
