String Comparison in Java
What does "compare two strings lexicographically" mean?

Crore answered 31/10, 2010 at 19:7 Comment(0)
C
172

Leading from answers from @Bozho and @aioobe, lexicographic comparisons are similar to the ordering that one might find in a dictionary.

The Java String class provides the .compareTo() method to compare Strings lexicographically. It is used like this: "apple".compareTo("banana").

The method returns an int, which can be interpreted as follows:

  • < 0: the String calling the method is lexicographically first (comes first in a dictionary)
  • == 0: the two strings are lexicographically equal
  • > 0: the parameter passed to the compareTo method is lexicographically first

More specifically, the method returns the difference between the char values (UTF-16 code units, which match ASCII for ASCII characters) at the first position where the two strings differ. If one string is a prefix of the other, it returns the difference of their lengths.

Thus "computer".compareTo("comparison") will return a value of (int) 'u' - (int) 'a' (20). Since this is a positive result, the parameter ("comparison") is lexicographically first.

There is also a variant, .compareToIgnoreCase(), which will return 0 for "a".compareToIgnoreCase("A"), for example.
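The return values described above can be checked with a small sketch. The class name CompareToDemo and the helper describe are made up for illustration; only compareTo and compareToIgnoreCase come from the String class.

```java
public class CompareToDemo {

    // Hypothetical helper: reduces a compareTo result to a word.
    static String describe(String a, String b) {
        int r = a.compareTo(b);
        if (r < 0) return "before";
        if (r > 0) return "after";
        return "same";
    }

    public static void main(String[] args) {
        // 'a' (97) - 'b' (98)
        System.out.println("apple".compareTo("banana"));        // -1
        // 'u' (117) - 'a' (97) at the first differing position
        System.out.println("computer".compareTo("comparison")); // 20
        // "house" is a prefix of "household": 5 - 9
        System.out.println("house".compareTo("household"));     // -4
        System.out.println("a".compareToIgnoreCase("A"));       // 0
        System.out.println(describe("apple", "banana"));        // before
    }
}
```

Only the sign of the result is specified as meaningful for ordering, so code should test for < 0, == 0, or > 0 rather than for a particular number.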

Concubine answered 31/10, 2010 at 19:40 Comment(3)
For collation comparisons (i.e. is 'é' equivalent to 'e') have a look at download.oracle.com/javase/1.5.0/docs/api/java/text/…Concubine
Just a minor thing. "computer".compareTo ("comparison") will return a value of (int) 'u' - (int) 'a' 20. Not (21).Felicle
The language the dictionary is in also matters. This is what Locale is for.Staal
T
12

The wording "comparison" is mildly misleading. You are not comparing for strict equality but for which string comes first in the dictionary (lexicon).

This is the feature that allows collections of strings to be sortable.

Note that dictionary order is very dependent on the active locale. For instance, here in Denmark we have a character "å" which used to be spelled as "aa" and is very distinct from two single a's (EDIT: If pronounced as "å"!). Hence Danish sorting rules treat two consecutive a's identically to an "å", which means that it goes after z. This also means that Danish dictionaries are sorted differently than English or Swedish ones.
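As a rough sketch of locale-aware comparison in Java, java.text.Collator can be used instead of String.compareTo. The class name CollatorDemo and the choice of Locale.ENGLISH are assumptions for illustration; the Danish rules described above are not exercised here.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorDemo {

    // Compares with an English collator at PRIMARY strength, which
    // ignores case and accent differences (an illustrative choice).
    static int primaryCompare(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // compareTo uses char values: 'a' (97) comes after 'B' (66).
        System.out.println("apple".compareTo("Banana") > 0);       // true
        // The collator orders these the way a dictionary would.
        System.out.println(primaryCompare("apple", "Banana") < 0); // true
        // At PRIMARY strength, 'e' and 'é' (U+00E9) compare as equal.
        System.out.println(primaryCompare("e", "\u00e9") == 0);    // true
    }
}
```

A Collator is also a Comparator, so it can be passed directly to sorting methods when locale-sensitive ordering is wanted.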

Trichocyst answered 31/10, 2010 at 19:29 Comment(2)
Interesting! Does Java's compareTo take this into account?Whallon
@aioobe, this is explained better than I can in the Java Tutorial: download.oracle.com/javase/tutorial/i18n/text/…Staal
S
10

The String.compareTo(..) method performs lexicographical comparison. Lexicographically == alphabetically.

Swellhead answered 31/10, 2010 at 19:14 Comment(0)
B
8

Comparing the letters that have the same position against each other, sequentially, much like how you order words in a dictionary.

Brachycephalic answered 31/10, 2010 at 19:14 Comment(0)
W
6

If you check which string would come first in a lexicon, you've done a lexicographical comparison of the strings!

Quoted from a linked definition:

A string s precedes a string t in lexicographic order if

  • s is a prefix of t, or
  • if c and d are respectively the first character of s and t in which s and t differ, then c precedes d in character order.

Note: For the characters that are alphabetical letters, the character order coincides with the alphabetical order. Digits precede letters, and uppercase letters precede lowercase ones.

Example:

  • house precedes household
  • Household precedes house
  • composer precedes computer
  • H2O precedes HOTEL
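The definition and the four examples above can be checked with a minimal sketch. The class LexOrderDemo and the method precedes are hypothetical names, and "prefix" is read as "proper prefix", so a string does not precede itself.

```java
public class LexOrderDemo {

    // True if s strictly precedes t under the quoted definition,
    // using char values as the character order.
    static boolean precedes(String s, String t) {
        int n = Math.min(s.length(), t.length());
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            char d = t.charAt(i);
            if (c != d) {
                return c < d; // first differing character decides
            }
        }
        return s.length() < t.length(); // s is a proper prefix of t
    }

    public static void main(String[] args) {
        System.out.println(precedes("house", "household"));   // true
        System.out.println(precedes("Household", "house"));   // true
        System.out.println(precedes("composer", "computer")); // true
        System.out.println(precedes("H2O", "HOTEL"));         // true
    }
}
```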
Whallon answered 31/10, 2010 at 19:10 Comment(0)
M
5

Java's lexicographic order:

  1. Numbers -before-
  2. Uppercase -before-
  3. Lowercase

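Odd as this seems, it is true: the order follows the characters' numeric values, where digits come before uppercase letters and uppercase letters come before lowercase ones.
I have had to write comparator chains to be able to change the default behavior.
Play around with the following snippet, using better examples of input strings, to verify the order (you will need Java SE 8):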

import java.util.ArrayList;

public class HelloLambda {

    public static void main(String[] args) {
        ArrayList<String> names = new ArrayList<>();
        names.add("Kambiz");
        names.add("kambiz");
        names.add("k1ambiz");
        names.add("1Bmbiza");
        names.add("Samantha");
        names.add("Jakey");
        names.add("Lesley");
        names.add("Hayley");
        names.add("Benjamin");
        names.add("Anthony");

        // Keep names containing a lowercase "a", sort them in natural
        // (String.compareTo) order, and print one per line.
        names.stream()
            .filter(e -> e.contains("a"))
            .sorted()
            .forEach(System.out::println);
    }
}

Result

1Bmbiza
Benjamin
Hayley
Jakey
Kambiz
Samantha
k1ambiz
kambiz

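Please note that sorted() uses the natural ordering of String (compareTo), which is based on char values and does not depend on the Locale; locale-specific ordering needs a java.text.Collator.
Please note that I am filtering for names containing the lowercase letter a.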
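As a sketch of the kind of comparator chain mentioned above, the following sorts case-insensitively first and falls back to the natural order only to break ties. The class and method names are hypothetical; String.CASE_INSENSITIVE_ORDER, thenComparing, and Comparator.naturalOrder are standard library members.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CaseInsensitiveSortDemo {

    // Returns a sorted copy: case-insensitive order first, then the
    // natural (compareTo) order so that ties are resolved consistently.
    static List<String> sortIgnoringCaseFirst(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(String.CASE_INSENSITIVE_ORDER
                .thenComparing(Comparator.<String>naturalOrder()));
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "Kambiz", "kambiz", "k1ambiz", "1Bmbiza", "Benjamin");
        // Prints [1Bmbiza, Benjamin, k1ambiz, Kambiz, kambiz]
        System.out.println(sortIgnoringCaseFirst(names));
    }
}
```

With this chain, "k1ambiz" no longer sorts after every capitalized name, and "Kambiz" and "kambiz" end up next to each other.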

Motorboat answered 26/11, 2016 at 13:49 Comment(0)
A
0

Below is an algorithm to compare two strings lexicographically:

  1. Input two strings, string 1 and string 2.

  2. for (int i = 0; i < str1.length() && i < str2.length(); i++)

    (Loop through the characters of both strings, comparing them until one of the strings ends):

    a. If the Unicode values of both characters are the same, continue.

    b. If the Unicode values of the characters differ, return (str1[i] - str2[i]).

  3. If the loop ends without finding a difference, one string is a prefix of the other (or they are equal):

    return str1.length() - str2.length()

    // This method compares two strings lexicographically

    public static int compareCustom(String s1, String s2) {
        for (int i = 0; i < s1.length() && i < s2.length(); i++) {
            if (s1.charAt(i) != s2.charAt(i)) {
                // The first differing position decides the order
                return s1.charAt(i) - s2.charAt(i);
            }
        }
        // One string is a prefix of the other (or they are equal):
        // the result is negative when s1 is shorter, positive when
        // s1 is longer, and 0 when the lengths match.
        return s1.length() - s2.length();
    }
    

If the two Strings are equal, it returns 0; otherwise it returns a negative or positive value.


Angle answered 20/8, 2019 at 8:2 Comment(0)
H
0

You might also come across a task where you have to implement the lexicographical comparison "manually", without using the default compareTo() method of String.

The simple algorithm below is based on comparing the Unicode values of the chars at successive positions.

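public class Person implements Comparable<Person> {

    // Getters, constructor, variables ...

    @Override
    public int compareTo(Person otherPerson) {
        String name = getName();
        String otherName = otherPerson.getName();

        // Compare char by char up to the end of the shorter name
        for (int i = 0; i < name.length() && i < otherName.length(); i++) {
            if (name.charAt(i) > otherName.charAt(i)) {
                return 1;
            } else if (name.charAt(i) < otherName.charAt(i)) {
                return -1;
            }
        }
        // No difference found: the shorter name (a prefix) comes first
        return Integer.compare(name.length(), otherName.length());
    }
}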
Hoodlum answered 23/9, 2021 at 9:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.