String sorting issue in C#

I have a list like this:

    List<string> items = new List<string>();
    items.Add("-");
    items.Add(".");
    items.Add("a-");
    items.Add("a.");
    items.Add("a-a");
    items.Add("a.a");

    items.Sort();

    string output = string.Empty;
    foreach (string s in items)
    {
        output += s + Environment.NewLine;
    }

    MessageBox.Show(output);

The output comes back as:

-
.
a-
a.
a.a
a-a

whereas I am expecting:

-
.
a-
a.
a-a
a.a

Any idea why "a-a" does not come before "a.a", whereas "a-" does come before "a."?

Birdwatcher answered 20/2, 2012 at 0:58 Comment(0)
M
6

If you want the sort to be based on the numeric value of each character (its UTF-16 code unit) rather than on the rules defined by the current culture, you can sort with the ordinal comparer:

    items.Sort(StringComparer.Ordinal);

This makes the results consistent across all cultures. Keep in mind that ordinal order is not numeric order: "14" sorts before "9", which may or may not be what you're looking for.
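
As a quick check of what ordinal sorting does to the list from the question, here is a minimal sketch. The class and method names (`OrdinalSortDemo`, `SortOrdinal`) are made up for illustration. Since '-' is U+002D and '.' is U+002E, the hyphen always wins, so "a-a" ends up before both "a." and "a.a". Note that this is close to, but not exactly, the order listed as expected in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; the names are illustrative only.
public static class OrdinalSortDemo
{
    // Returns a new list sorted by UTF-16 code unit value, ignoring culture.
    public static List<string> SortOrdinal(IEnumerable<string> source)
    {
        var sorted = new List<string>(source);
        sorted.Sort(StringComparer.Ordinal);
        return sorted;
    }

    public static void Main()
    {
        var items = new List<string> { "-", ".", "a-", "a.", "a-a", "a.a" };

        // '-' (U+002D) is less than '.' (U+002E), so the hyphen sorts first
        // at every position where the two strings differ.
        Console.WriteLine(string.Join(" ", SortOrdinal(items)));
        // Prints: - . a- a-a a. a.a

        // Digits are compared character by character, not as numbers:
        // '1' (U+0031) is less than '9' (U+0039), so "14" sorts before "9".
        Console.WriteLine(string.Join(" ", SortOrdinal(new[] { "9", "14" })));
        // Prints: 14 9
    }
}
```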

Melanosis answered 20/2, 2012 at 1:21 Comment(1)
Thanks Jared. Could you tell me how I can sort if the data is in a column of a DataTable? DataTable dataTable = new DataTable(); dataTable.Columns.Add("Item", typeof (string)); dataRow = dataTable.NewRow(); dataRow["Item"] = "a-a"; dataTable.Rows.Add(dataRow); dataRow = dataTable.NewRow(); dataRow["Item"] = "a.a"; dataTable.Rows.Add(dataRow); DataRow[] rows = dataTable.Select("", "Item ASC"); - Birdwatcher
N
18

I suspect that in the last case "-" is treated in a different way due to culture-specific settings (perhaps as a "dash" as opposed to "minus" in the first strings). MSDN warns about this:

The comparison uses the current culture to obtain culture-specific information such as casing rules and the alphabetic order of individual characters. For example, a culture could specify that certain combinations of characters be treated as a single character, or uppercase and lowercase characters be compared in a particular way, or that the sorting order of a character depends on the characters that precede or follow it.

Also, from this MSDN page:

The .NET Framework uses three distinct ways of sorting: word sort, string sort, and ordinal sort. Word sort performs a culture-sensitive comparison of strings. Certain nonalphanumeric characters might have special weights assigned to them; for example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. String sort is similar to word sort, except that there are no special cases; therefore, all nonalphanumeric symbols come before all alphanumeric characters. Ordinal sort compares strings based on the Unicode values of each element of the string.

So the hyphen gets special treatment in the default (word sort) mode in order to make the ordering more "natural".

You can get "normal" ordinal sort if you specifically turn it on:

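     // Culture-sensitive comparison (word sort): the hyphen gets special weighting
     Console.WriteLine(string.Compare("a.", "a-"));                              // 1
     Console.WriteLine(string.Compare("a.a", "a-a"));                            // -1

     // Ordinal comparison: '.' (U+002E) is greater than '-' (U+002D) in both cases
     Console.WriteLine(string.Compare("a.", "a-", StringComparison.Ordinal));    // 1
     Console.WriteLine(string.Compare("a.a", "a-a", StringComparison.Ordinal));  // 1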

To sort the original collection using ordinal comparison use:

     items.Sort(StringComparer.Ordinal);
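
To see the three modes from the MSDN quote side by side, something like the following sketch can be used. It assumes the invariant culture, and the names `SortModeDemo` and `SortWith` are hypothetical. The word sort and string sort results depend on the runtime and platform, so no expected output is given for them; only the ordinal line is fixed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical demo class; the names are illustrative only.
public static class SortModeDemo
{
    // Sorts a copy of the input using the given culture and compare options.
    public static List<string> SortWith(
        IEnumerable<string> source, CultureInfo culture, CompareOptions options)
    {
        CompareInfo compareInfo = culture.CompareInfo;
        var sorted = new List<string>(source);
        sorted.Sort((x, y) => compareInfo.Compare(x, y, options));
        return sorted;
    }

    public static void Main()
    {
        var items = new[] { "-", ".", "a-", "a.", "a-a", "a.a" };
        CultureInfo culture = CultureInfo.InvariantCulture; // assumption: any culture could be used

        // Word sort: the default culture-sensitive mode (CompareOptions.None).
        Console.WriteLine("Word:    " + string.Join(" ", SortWith(items, culture, CompareOptions.None)));

        // String sort: culture-sensitive, but without special cases for symbols.
        Console.WriteLine("String:  " + string.Join(" ", SortWith(items, culture, CompareOptions.StringSort)));

        // Ordinal: compares UTF-16 code unit values only.
        Console.WriteLine("Ordinal: " + string.Join(" ", SortWith(items, culture, CompareOptions.Ordinal)));
    }
}
```

If string sort gives the order you want, passing `CompareOptions.StringSort` keeps the comparison culture-sensitive for letters while dropping the special hyphen handling.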
Nedi answered 20/2, 2012 at 1:20 Comment(4)
I think you cracked it, the word sort seems to be the issue here. - Prentiss
@ntziolis: Looks like this is the case indeed. - Nedi
How do I specify this ordinal comparer if the data is in a DataColumn of a DataTable? - Birdwatcher
@Satya: can you extract your data from the DataColumn into a list and then sort it? Or maybe you can start another question with your code sample? - Nedi
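
One way to follow the "extract and then sort" suggestion is sketched below. It is only an illustration: the class and method names are hypothetical, the "Item" column comes from the comment above, and it sorts the rows in memory with LINQ instead of using DataTable.Select.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical demo class; the names are illustrative only.
public static class DataTableSortDemo
{
    // Returns the table's rows ordered by a string column using ordinal comparison.
    // Non-string or DBNull values are treated as null, which sorts first.
    public static DataRow[] SortRowsOrdinal(DataTable table, string columnName)
    {
        return table.Rows
            .Cast<DataRow>()
            .OrderBy(row => row[columnName] as string, StringComparer.Ordinal)
            .ToArray();
    }

    public static void Main()
    {
        var dataTable = new DataTable();
        dataTable.Columns.Add("Item", typeof(string));
        dataTable.Rows.Add("a.a");
        dataTable.Rows.Add("a-a");

        foreach (DataRow row in SortRowsOrdinal(dataTable, "Item"))
        {
            Console.WriteLine(row["Item"]);
        }
        // Prints "a-a" and then "a.a", because '-' (U+002D) is less than '.' (U+002E).
    }
}
```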
P
4

The parameterless Sort method of List<string> relies on the default string comparison of the .NET Framework, which is culture-sensitive and follows the CultureInfo of the current thread.

The CultureInfo defines the alphabetical order of characters, and it seems that the default one uses an order different from what you would expect.

When sorting, you can specify a CultureInfo that you know matches your sorting requirements. A sample using the German culture:

    var sortCulture = new CultureInfo("de-DE");
    // List<string>.Sort expects a comparer, so wrap the culture in a StringComparer.
    items.Sort(StringComparer.Create(sortCulture, false));
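
To check that the parameterless Sort really follows the current culture, a small sketch like this compares it against an explicit culture comparer. The class and method names are hypothetical, and the comparison is expected to agree only when the explicit comparer is built from the same culture as the current thread.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical demo class; the names are illustrative only.
public static class DefaultComparerDemo
{
    // Sorts a copy of the input with the parameterless Sort (default comparison).
    public static List<string> SortDefault(IEnumerable<string> source)
    {
        var sorted = new List<string>(source);
        sorted.Sort();
        return sorted;
    }

    // Sorts a copy of the input with a case-sensitive comparer for the given culture.
    public static List<string> SortWithCulture(IEnumerable<string> source, CultureInfo culture)
    {
        var sorted = new List<string>(source);
        sorted.Sort(StringComparer.Create(culture, false));
        return sorted;
    }

    public static void Main()
    {
        var items = new[] { "-", ".", "a-", "a.", "a-a", "a.a" };

        // Both calls should use the same culture-sensitive rules.
        bool sameOrder = SortDefault(items)
            .SequenceEqual(SortWithCulture(items, CultureInfo.CurrentCulture));
        Console.WriteLine(sameOrder);
    }
}
```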

More info can be found here:
http://msdn.microsoft.com/en-us/library/b0zbh7b6.aspx
http://msdn.microsoft.com/de-de/library/system.stringcomparer.aspx

Prentiss answered 20/2, 2012 at 1:12 Comment(5)
What is not clear is that "-" (hyphen) comes before "." (dot) and "a-" before "a."; why not "a-a" before "a.a"? - Birdwatcher
Theoretically, the current culture might consider . and - to be the same order. The .Sort method is "unstable", which means that the order of equal items is not guaranteed. - Luculent
I tested on US English and got the same results as the OP. Even when testing using String.Compare, I never got 0 (equal). I either got -1 or 1, depending on which was first. So it is probably not a problem with the .Sort method. - Ize
I tried System.Threading.Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US"); items.Sort(); but the results haven't changed. - Birdwatcher
I think Yacoder has cracked the case in his answer; it's the word sort that introduces this special handling. - Prentiss
