How can I get TStringList to sort differently in Delphi

I have a simple TStringList. I do a TStringList.Sort on it.

Then I notice that the underscore "_" sorts before the capital letter "A". This was in contrast to a third party package that was sorting the same text and sorted _ after A.

According to the ANSI character set, A-Z are characters 65 - 90 and _ is 95. So it looks like the 3rd party package is using that order and TStringList.Sort isn't.

I drilled down into the guts of TStringList.Sort, and it sorts using AnsiCompareStr (case sensitive) or AnsiCompareText (case insensitive). I tried it both ways, setting my StringList's CaseSensitive property to True and then False. But in both cases, the "_" sorts first.

I just can't imagine that this is a bug in TStringList. So there must be something else here that I am not seeing. What might that be?

What I really need to know is how can I get my TStringList to sort so that it is in the same order as the other package.

For reference, I am using Delphi 2009 and I'm using Unicode strings in my program.


So the final answer here is to override CompareStrings and replace the Ansi compares with whatever you want (e.g. the non-locale CompareStr and CompareText), as follows:

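type
  // TStringList descendant that replaces the locale-aware Ansi compares
  // (needs Classes and SysUtils in the uses clause)
  TMyStringList = class(TStringList)
  protected
    function CompareStrings(const S1, S2: string): Integer; override;
  end;

function TMyStringList.CompareStrings(const S1, S2: string): Integer;
begin
  // CompareStr/CompareText compare ordinal character values and ignore the
  // user's locale, so '_' (95) sorts after 'A'..'Z' (65..90)
  if CaseSensitive then
    Result := CompareStr(S1, S2)
  else
    Result := CompareText(S1, S2);  // upper-cases only ASCII a..z
end;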
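A minimal console sketch that exercises TMyStringList, assuming Delphi 2009 or later; the program name and the sample strings are made up for illustration. Because Sort calls CustomSort(StringListCompareStrings), which in turn calls the virtual CompareStrings, the override is picked up and the strings come out in plain ordinal order, with _ after A and Z:

```pascal
program OrdinalSortDemo;  // hypothetical demo program

{$APPTYPE CONSOLE}

uses
  Classes, SysUtils;

type
  // Same descendant as above: ordinal compares instead of Ansi compares
  TMyStringList = class(TStringList)
  protected
    function CompareStrings(const S1, S2: string): Integer; override;
  end;

function TMyStringList.CompareStrings(const S1, S2: string): Integer;
begin
  if CaseSensitive then
    Result := CompareStr(S1, S2)
  else
    Result := CompareText(S1, S2);
end;

var
  List: TMyStringList;
  I: Integer;
begin
  List := TMyStringList.Create;
  try
    List.CaseSensitive := True;
    List.Add('_b');
    List.Add('Zebra');
    List.Add('Apple');
    List.Sort;  // Sort -> CustomSort -> CompareStrings (the override)
    for I := 0 to List.Count - 1 do
      Writeln(List[I]);  // Apple, Zebra, _b (one per line)
  finally
    List.Free;
  end;
end.
```

Overriding CompareStrings rather than only calling CustomSort has the advantage that Find and IndexOf on a sorted list go through the same method, so lookups stay consistent with the new order.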
Didymous answered 1/2, 2010 at 6:34 Comment(5)
Windows also sorts the _ before the A, so TStringList is at least consistent with the OS. – Nappie
Getting results you don't expect doesn't mean it's a bug. It is not a bug; it is designed this way to correctly support the user's (or the OS's, on the user's behalf) choice of sorting order. – Magnien
You're writing this question with the presumption that there is a correct way to sort nonalphabetic characters. Where do underscore words appear in your dictionary? – Hydrophilous
@Rob: I don't need the correct way. I really only need a consistent way, so that my program will use the same order for both TStringList and my 3rd party package. – Didymous
You mentioned in several places in your question that you want the correct or proper sort order. I'll edit your question to change that, since it's not really what you need. – Hydrophilous

Define "correctly".
i18n sorting totally depends on your locale.
So I totally agree with PA that this is not a bug: the default Sort behaviour works as designed to allow i18n to work properly.

As Gerry mentions, TStringList.Sort uses AnsiCompareStr and AnsiCompareText (I'll explain in a few lines how it does that).

But TStringList is flexible: it contains Sort, CustomSort and CompareStrings, which are all virtual (so you can override them in a descendant class).
Furthermore, when you call CustomSort, you can plug in your own Compare function.

At the end of this answer is a Compare function that does what you want:

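  • Case sensitive (when the list's CaseSensitive is False, it falls back to CompareText, which ignores only ASCII case)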
  • Not using any locale
  • Just compare the ordinal value of the characters of the strings

CustomSort is defined like this:

procedure TStringList.CustomSort(Compare: TStringListSortCompare);
begin
  if not Sorted and (FCount > 1) then
  begin
    Changing;
    QuickSort(0, FCount - 1, Compare);
    Changed;
  end;
end;

By default, the Sort method has a very simple implementation, passing a default Compare function called StringListCompareStrings:

procedure TStringList.Sort;
begin
  CustomSort(StringListCompareStrings);
end;

So, if you write your own Compare function compatible with TStringListSortCompare, you can define your own sorting.
TStringListSortCompare is defined as a plain (non-method) function type taking the TStringList and two indexes referring to the items you want to compare:

type
  TStringListSortCompare = function(List: TStringList; Index1, Index2: Integer): Integer;

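You can use StringListCompareStrings as a guideline for implementing your own (note that it reads the private FList field, which is only possible inside the Classes unit; in your own code, use List[Index] instead):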

function StringListCompareStrings(List: TStringList; Index1, Index2: Integer): Integer;
begin
  Result := List.CompareStrings(List.FList^[Index1].FString,
                                List.FList^[Index2].FString);
end;

So, by default, TStringList.Sort defers to TStringList.CompareStrings:

function TStringList.CompareStrings(const S1, S2: string): Integer;
begin
  if CaseSensitive then
    Result := AnsiCompareStr(S1, S2)
  else
    Result := AnsiCompareText(S1, S2);
end;

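These in turn call the underlying Windows API function CompareString with the default user locale LOCALE_USER_DEFAULT. CompareString returns 1, 2 or 3 (CSTR_LESS_THAN, CSTR_EQUAL, CSTR_GREATER_THAN), so subtracting 2 maps the result to -1, 0 or 1: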

function AnsiCompareStr(const S1, S2: string): Integer;
begin
  Result := CompareString(LOCALE_USER_DEFAULT, 0, PChar(S1), Length(S1),
    PChar(S2), Length(S2)) - 2;
end;

function AnsiCompareText(const S1, S2: string): Integer;
begin
  Result := CompareString(LOCALE_USER_DEFAULT, NORM_IGNORECASE, PChar(S1),
    Length(S1), PChar(S2), Length(S2)) - 2;
end;
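For contrast, the non-Ansi CompareStr and CompareText from SysUtils do not call CompareString at all; they compare the ordinal values of the characters (CompareText first upper-cases only ASCII a..z). A small console sketch, assuming Delphi 2009 or later (the program name is made up), of what that means for the characters from the question:

```pascal
program OrdinalCompareDemo;  // hypothetical demo program

{$APPTYPE CONSOLE}

uses
  SysUtils;

begin
  // Ordinal values quoted in the question: A = 65, Z = 90, _ = 95
  Writeln(Ord('A'), ' ', Ord('Z'), ' ', Ord('_'));  // 65 90 95

  // CompareStr compares those ordinal values directly, so 'A' comes first
  if CompareStr('A', '_') < 0 then
    Writeln('CompareStr: A before _');

  // CompareText upper-cases a..z first, so 'a' is compared as 'A' (65)
  if CompareText('a', '_') < 0 then
    Writeln('CompareText: a before _');
end.
```

Note that with a case-sensitive CompareStr, lowercase letters (97..122) still sort after '_', because no upper-casing takes place.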

Finally, the Compare function you need. Again, the limitations:

  • Case sensitive, unless the list's CaseSensitive is False (then CompareText, which ignores only ASCII case)
  • Not using any locale
  • Just compare the ordinal value of the characters of the strings

This is the code:

function StringListCompareStringsByOrdinalCharacterValue(List: TStringList; Index1, Index2: Integer): Integer;
var
  First: string;
  Second: string;
begin
  First := List[Index1];
  Second := List[Index2];
  if List.CaseSensitive then
    Result := CompareStr(First, Second)
  else
    Result := CompareText(First, Second);
end;
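A usage sketch for plugging this function into CustomSort, assuming Delphi 2009 or later; the program name and sample strings are made up for illustration. Per the CustomSort source quoted above, nothing happens while Sorted is True, so the list is left unsorted (the default) before the call:

```pascal
program CustomSortDemo;  // hypothetical demo program

{$APPTYPE CONSOLE}

uses
  Classes, SysUtils;

// Same comparison as the function above: ordinal values, no locale
function StringListCompareStringsByOrdinalCharacterValue(List: TStringList;
  Index1, Index2: Integer): Integer;
begin
  if List.CaseSensitive then
    Result := CompareStr(List[Index1], List[Index2])
  else
    Result := CompareText(List[Index1], List[Index2]);
end;

var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.CaseSensitive := True;
    List.Add('_b');
    List.Add('Zebra');
    List.Add('Apple');
    // CustomSort only sorts while Sorted is False (the default)
    List.CustomSort(StringListCompareStringsByOrdinalCharacterValue);
    Writeln(List.CommaText);  // Apple,Zebra,_b
  finally
    List.Free;
  end;
end.
```

Keep in mind that CustomSort only affects that one sort: setting Sorted := True afterwards calls Sort again, which uses CompareStrings (locale-aware unless overridden, as in the TMyStringList from the question). For a list that must stay sorted, overriding CompareStrings is the more consistent choice.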

Delphi isn't closed; quite the opposite: it often has a really flexible architecture.
It often just takes a bit of digging to see where you can hook into that flexibility.

--jeroen

Lout answered 1/2, 2010 at 6:34 Comment(3)
Very nice! I knew about this, but had not seen it all described in one place before. – Lilialiliaceous
TStringList sorting is strictly wrong because it does not conform to the ordering induced by the built-in operators ('<' and so on), and this deviation from the default ordering isn't even documented. The fact that this was probably intended as a cheap substitute for i18n doesn't make it any better. Upvote for providing the workaround that restores correct sorting... – Handal
Flexible also for ascending or descending order: function CompareSalary3(List: TStringList; Index1, Index2: Integer): Integer; begin { ascending: Result := CompareStr(List[Index1], List[Index2]); } { descending: just negate the result } Result := -CompareStr(List[Index1], List[Index2]); end; – Kippy

AnsiCompareStr / AnsiCompareText take more than the character number into account. They take the user's locale into account, so "e" will sort along with "é", "ê", etc.

To make it sort in ASCII order, use a custom compare function as described here.

Hooper answered 1/2, 2010 at 7:18 Comment(0)

AnsiCompareStr (CompareString with LOCALE_USER_DEFAULT) has a fault: it treats accented characters as equal to their unaccented counterparts:

e1 é1 e2 é2

The correct order (for Czech, for example) is:

e1 e2 é1 é2

Does anybody know how to avoid this error in ordering?


Edit, 11.2.2010: I must apologize; the described behavior is fully in accordance with linguistic rules. Although I think it is silly and "bad", it is not an error in the API function.

Explorer in Windows XP uses so-called intuitive filename ordering, which gives better results, but it can't be used programmatically.

Hortatory answered 10/2, 2010 at 14:28 Comment(0)
