I am trying to reimplement the strcasecmp function in C and I noticed what appears to be an inconsistency in the comparison process.
From man strcmp
The strcmp() function compares the two strings s1 and s2. The locale is not taken into account (for a locale-aware comparison, see strcoll(3)). It returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2.
From man strcasecmp
The strcasecmp() function performs a byte-by-byte comparison of the strings s1 and s2, ignoring the case of the characters. It returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2.
int strcmp(const char *s1, const char *s2);
int strcasecmp(const char *s1, const char *s2);
Given this information, I don't understand the result of the following code:
#include <stdio.h>
#include <string.h>
#include <strings.h> /* POSIX declares strcasecmp() here */

int main(void)
{
    // ASCII values
    // 'A' = 65
    // '_' = 95
    // 'a' = 97
    printf("%i\n", strcmp("A", "_"));
    printf("%i\n", strcmp("a", "_"));
    printf("%i\n", strcasecmp("A", "_"));
    printf("%i\n", strcasecmp("a", "_"));
    return 0;
}
Output:
-1 # "A" is less than "_"
1 # "a" is more than "_"
2 # "A" is more than "_" with strcasecmp ???
2 # "a" is more than "_" with strcasecmp
It appears that, if the current character in s1 is a letter, it is always converted to lowercase, regardless of whether the current character in s2 is a letter or not.
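To make that observation concrete, here is a minimal sketch of the behaviour I seem to be seeing, assuming both characters are lowercased before the byte comparison. The name my_strcasecmp is hypothetical and this is not the libc source, only a guess that happens to reproduce the numbers above:

```c
#include <ctype.h>

/* Hypothetical sketch, not the libc source: lowercase both bytes, then
   compare them as unsigned char values. */
int my_strcasecmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Advance while the lowercased bytes match and s1 has not ended. */
    while (*p1 != '\0' && tolower(*p1) == tolower(*p2)) {
        p1++;
        p2++;
    }
    /* Difference of the first mismatching pair, or 0 if both strings ended. */
    return tolower(*p1) - tolower(*p2);
}
```

With this sketch, my_strcasecmp("A", "_") and my_strcasecmp("a", "_") both return 2 (97 - 95) in the default "C" locale, which matches the last two lines of the output.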
Can someone explain this behaviour? Shouldn't the first and third lines be identical?
Thank you in advance!
PS: I am using gcc 9.2.0 on Manjaro.
Also, when I compile with the -fno-builtin flag I get instead:
-30
2
2
2
I guess it's because the program does not use gcc's optimised functions, but the question remains.
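Since the man pages quoted above only promise the sign of the result, the two builds can be compared by collapsing each return value to -1, 0 or 1. A small sketch, where sign_of is a hypothetical helper of my own and not part of any library:

```c
/* Hypothetical helper: collapse any comparison result to -1, 0 or 1,
   because only the sign of strcmp()/strcasecmp() is specified. */
int sign_of(int r)
{
    return (r > 0) - (r < 0);
}
```

For example, sign_of(-30) and sign_of(-1) both give -1, so the first line agrees across the two builds, while the third line is positive in both.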
printf("%i\n", strcasecmp("a", "_")); This should presumably have the same result as printf("%i\n", strcasecmp("A", "_")); But that means that one of these two case-insensitive calls is going to disagree with its case-sensitive counterpart. – Intervale
[...] strcasecmp you're referring to is not accurate. More details in the upvoted answers. – Costermansville
[...] A < _ && a > _ && A == a would cause so many problems. – Autocrat
[...] unsigned char. C17/18 "String handling <string.h>" --> "For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char". This makes a difference once char values are outside ASCII range 0-127. – Mcafee
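To illustrate the unsigned char point from the last comment, here is a minimal sketch. The helpers cmp_first_plain and cmp_first_unsigned are hypothetical names, and the plain-char result assumes a platform where char is signed and 8 bits wide, which is common but implementation-defined:

```c
/* Hypothetical helper: compares the first bytes as plain char, which is
   what a naive reimplementation might do. The result depends on whether
   char is signed on the platform. */
int cmp_first_plain(const char *s1, const char *s2)
{
    return *s1 - *s2;
}

/* Hypothetical helper: compares the first bytes as unsigned char, the
   interpretation the C standard requires for <string.h> functions. */
int cmp_first_unsigned(const char *s1, const char *s2)
{
    return *(const unsigned char *)s1 - *(const unsigned char *)s2;
}
```

For the byte 0xE9 against "a", cmp_first_unsigned("\xE9", "a") is 233 - 97 = 136 everywhere, whereas cmp_first_plain would give -23 - 97 = -120 where char is signed, so the two orderings disagree.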