Swap strings without reference counting
Asked Answered
U

1

7

In QuickSort, a lot of time is spent on the swap temp:=var[i]; var[i]:=var[j]; var[j]:=temp. When the vars are integer I time 140 msec for a large random array. When the vars are string, the time is 750 msec. It appears to me that much of the difference is caused by the need to update the reference counts in all three assignments. But is this necessary? After all, the reference counts for var[i] and var[j] will be the same before and after these three assignments. Would the following code corrupt things? (not that it solves a speed problem, but out of interest):

    // P : Pstring;
    move(values[i],P,sizeOf(Pstring));
    move(values[j],values[i],sizeOf(Pstring));
    move(P,values[i],sizeOf(Pstring));

There is no temp variable. Only the two pointers to the strings are interchanged. And if this is oké, is there a Delphi function to swap 2 pointers?

Ulcerate answered 9/1, 2021 at 22:8 Comment(9)
Faster to do it with a Pointer(...) cast rather than calling Move. That's the idiomatic way to evade ref counting.Unable
@David - The length and reference count are stored at a negative offset from that pointer, unless things have changed in recent versions of the compiler that I missed. Swapping the pointers won't move that data, unless I'm missing something here. (If things have changed and I missed it, can you point me to the docs so I can update my knowledge?)Putrefy
In a first pass, create an array of pointers to records in an array of records each containing a pointer to string data, string index. Then sort that array of pointer moving pointers around. When sort is done, walk the array of records to reorder the original array on strings and delete the array of records.Bernal
@Ken Swapping the pointers doesn't move the metadata. But that's fine because these are reference types. Swapping the pointers is sufficient. This is a well known optimization used by libraries such as Spring4d.Unable
@Bernal I expect that will be slower than the inplace sort with ref counting.Unable
@KenWhite: A variable of type string is simply a pointer to a string heap object. This object contains the actual string plus a terminating null at non-negative offsets and metadata (codepage, reference count, and length) at negative offset.Panamerican
@David: Got it. That makes sense, I think.Putrefy
@DavidHeffernan Depends on the number of strings. The OP talk about "large" array.Bernal
@Bernal I know. I still expect your idea to be slower.Unable
U
7

What you are proposing is a well known and valid optimisation. Rather than calling the Move function it is better to perform direct assignments using casts to avoid reference counting code being generated.

var
  temp: Pointer;
.... 
temp := Pointer(var[i]);
Pointer(var[i]) := Pointer(var[j]);
Pointer(var[j]) := temp;

In order for this to work you need to be confident that no exceptions will be raised during the swap process. Simple assignments of valid memory will not lead to exceptions, so this concern can be readily dismissed.

Unable answered 10/1, 2021 at 9:18 Comment(2)
This is exactly how we swap strings in mORMot. ;)Casserole
Thank you all. I followed @DavidHeffernan 's suggestion. I test an array of 10M sorted strings, of which 100k have been randomly modified in place. Sorting the array again takes 540 msec, compared to 5600 msec for Tarray.sort. A QuickSort call will only be made when there is no structure to the data. I time then 5650 msec with the optimised code vs 12000 msec for Tarray.sort.Ulcerate

© 2022 - 2024 — McMap. All rights reserved.