How does testing if a string is 'greater' than another work in Bash?
K

3

30

In Bash I can write the following test:

[[ "f" > "a" ]]

which returns 0, i.e. true. How does Bash actually perform this string comparison? As I understand it, > does an integer comparison. Does it compare the ASCII values of the operands?

Keniakenilworth answered 16/8, 2012 at 14:28 Comment(1)
Yeah, I suspect it has a polymorphic deal going on where > means one thing with two strings and another with two numbers. However, I'm not an experienced bash coder. - Manila
J
22

From help test:

  STRING1 > STRING2
                 True if STRING1 sorts after STRING2 lexicographically.

Internally, bash either uses strcoll() or strcmp() for that:

else if ((op[0] == '>' || op[0] == '<') && op[1] == '\0')
  {
    if (shell_compatibility_level > 40 && flags & TEST_LOCALE)
      return ((op[0] == '>') ? (strcoll (arg1, arg2) > 0) : (strcoll (arg1, arg2) < 0));
    else
      return ((op[0] == '>') ? (strcmp (arg1, arg2) > 0) : (strcmp (arg1, arg2) < 0));
  }

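The latter (strcmp) compares the strings byte by byte, which for ASCII text means comparing ASCII codes. The former (strcoll, used when the compatibility level is above 40 and the TEST_LOCALE flag is set) applies the collation rules of the current locale, so the result matches that locale's sort order.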
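To see the locale dependence in practice, here is a minimal sketch. The helper name less_in_locale is made up for this illustration, and en_US.UTF-8 is only an assumed example of a locale that may or may not be installed on your system. In the C locale the comparison falls back to byte values, so "B" (66) sorts before "a" (97); under a language locale the answer depends on that locale's collation rules.

```shell
#!/usr/bin/env bash
# less_in_locale is a hypothetical helper, not a Bash builtin.
# It returns 0 (true) if $2 sorts before $3 under the locale named in $1.
less_in_locale() {
  local loc=$1 a=$2 b=$3
  # The subshell keeps the locale change from leaking out; stderr is
  # silenced in case the requested locale is not installed.
  ( LC_ALL=$loc; [[ "$a" < "$b" ]] ) 2>/dev/null
}

# en_US.UTF-8 is an assumption; substitute any locale listed by `locale -a`.
for loc in C en_US.UTF-8; do
  if less_in_locale "$loc" B a; then
    echo "$loc: B sorts before a"
  else
    echo "$loc: B does not sort before a"
  fi
done
```

If the requested locale is missing, Bash keeps whatever locale was already in effect, so only the C result is predictable everywhere.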

Janeth answered 16/8, 2012 at 14:38 Comment(0)
D
9

It's an alphabetical comparison (AIUI the sort order may be influenced by the current locale). It compares the first character of each string, and if the one on the left has a higher value it's true, if lower it's false; if they're the same, then it compares the second character, etc.
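The character-by-character procedure described above can be sketched in Bash itself. This is only an illustration of the idea for plain ASCII strings, not how Bash implements it; str_after is a hypothetical helper name, and it ignores locale collation entirely.

```shell
#!/usr/bin/env bash
# str_after is a hypothetical helper that mimics a strcmp-style comparison
# for ASCII strings: it returns 0 (true) if $1 sorts after $2.
str_after() {
  local a=$1 b=$2 i ca cb
  for (( i = 0; ; i++ )); do
    # Numeric code of the i-th character, or 0 once past the end of the
    # string (playing the role of C's terminating NUL).
    ca=0; cb=0
    (( i < ${#a} )) && printf -v ca '%d' "'${a:i:1}"
    (( i < ${#b} )) && printf -v cb '%d' "'${b:i:1}"
    # First differing position decides the result.
    (( ca != cb )) && return $(( ca > cb ? 0 : 1 ))
    # Both strings ended at the same place: they are equal, so not "after".
    (( ca == 0 )) && return 1
  done
}

str_after f a   && echo '"f" sorts after "a"'
str_after 2 10  && echo '"2" sorts after "10"'
str_after abc ab && echo '"abc" sorts after "ab"'
```

Note that a string sorts after any of its own proper prefixes, because the end of the shorter string compares as lower than any character.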

This is not the same as integer comparison; for that, use [[ 2 -gt 1 ]] or (( 2 > 1 )). To illustrate the difference between string and integer comparison, consider that all of the following are "true":

[[ 2 > 10 ]]     # because "2" comes after "1" in ASCII sort order
[[ 10 -gt 2 ]]   # because 10 is a larger number than 2
(( 10 > 2 ))     # ditto

Here are some more tests that are true as string comparisons but would be false as integer comparisons:

[[ 05 < 5 ]]    # Because "0" comes before "5"
[[ +5 < 0 ]]    # Because "+" comes before the digits
[[ -0 < 0 ]]    # Because "-" comes before the digits
[[ -1 < -2 ]]   # Because "-" doesn't change how the second character is compared
Declinate answered 16/8, 2012 at 14:38 Comment(0)
K
1

Yes, it compares the ASCII values of the characters one pair at a time; if a pair is equal, it repeats the comparison on the next character.

/* Copyright (C) 1991, 1996, 1997, 2003 Free Software Foundation, Inc. 
   This file is part of the GNU C Library. 

   The GNU C Library is free software; you can redistribute it and/or 
   modify it under the terms of the GNU Lesser General Public 
   License as published by the Free Software Foundation; either 
   version 2.1 of the License, or (at your option) any later version. 

   The GNU C Library is distributed in the hope that it will be useful, 
   but WITHOUT ANY WARRANTY; without even the implied warranty of 
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU 
   Lesser General Public License for more details. 

   You should have received a copy of the GNU Lesser General Public 
   License along with the GNU C Library; if not, write to the Free 
   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 
   02111-1307 USA.  */ 

#include <string.h> 
#include <memcopy.h> 

#undef strcmp 

/* Compare S1 and S2, returning less than, equal to or 
   greater than zero if S1 is lexicographically less than, 
   equal to or greater than S2.  */ 
int 
strcmp (p1, p2) 
     const char *p1; 
     const char *p2; 
{ 
  register const unsigned char *s1 = (const unsigned char *) p1; 
  register const unsigned char *s2 = (const unsigned char *) p2; 
  unsigned reg_char c1, c2; 

  do 
    { 
      c1 = (unsigned char) *s1++; 
      c2 = (unsigned char) *s2++; 
      if (c1 == '\0') 
        return c1 - c2; 
    } 
  while (c1 == c2); 

  return c1 - c2; 
} 
Karikaria answered 16/8, 2012 at 14:41 Comment(3)
I don't see why you're pasting a mostly irrelevant strcmp() function. bash supports locales, and in this case it uses strcoll() instead to perform a comparison suitable for a particular charset. - Internalcombustion
You're right, but strcmp illustrates the string comparison in a simpler form. The purpose is to show how strings are compared in general, not the specific bash implementation. The method is the same in bash, Python, Perl, PHP, C, Java... - Karikaria
Moreover, the question is not about the name of the function but about the method. - Karikaria
