Fast way to split a string into fixed-length parts in Delphi
Asked Answered
N

3

6

I need to split a string to a TStringList with fixed-length sub-strings.

Currently I use:

procedure StrToStringList(ASource: string; AList: TStrings; AFixedLen: Integer);
begin
    Assert(Assigned(AList));
    while Length(ASource) > AFixedLen do
    begin
        AList.Add(LeftStr(ASource, AFixedLen));
        Delete(ASource, 1, AFixedLen);
    end;
    AList.Add(ASource);
end;

This works, but seems to be slow. Any better / faster idea?

Edited: Profiling of the results:

The speed gain is quite impressive. Here are the results of my (subjective) profiling.

Data size: 290KB, FixedLen: 100:

  • Original code: 58 ms
  • Heffernan: 1 ms
  • Deltics: 1 ms

Data size: 2805KB, FixedLen: 100:

  • Original code: 5803 ms
  • Heffernan: 5 ms
  • Deltics: 4 ms
Niemi answered 11/8, 2015 at 13:26 Comment(2)
Obviously, Delete call is unnecessary.Validate
It's Rud-y, not Rud-i. I bet you could see that difference?Recommendation
T
6

There are some immediately obvious optimisations possible in your code:

  1. Do not modify the source string, simply extract the required substrings. You can then make the input string parameter const, allowing the compiler to further optimise calls made to the procedure

  2. Since you are dealing with fixed lengths and an input string of known length, then you can pre-calculate the required capacity of the string list and avoid memory re-allocations as the list is added to.

Here is how I would go about this:

procedure StrToStringList(const aSource: String;
                          const aList: TStrings;
                          const aFixedLen: Integer);
var
  idx: Integer;
  srcLen: Integer;
begin
  aList.Capacity := (Length(aSource) div aFixedLen) + 1;

  idx    := 1;
  srcLen := Length(aSource);

  while idx <= srcLen do
  begin
    aList.Add(Copy(aSource, idx, aFixedLen));
    Inc(idx, aFixedLen);
  end;
end;

The capacity calculation here may result in an excess capacity of 1 where the input string divides exactly by the fixed length, but this is a negligible overhead imho and acceptable where the goal is optimal performance (the alternative is a conditional branch to modify or calculate the capacity differently to cater for what is likely to be the minority of cases).

Trumpetweed answered 11/8, 2015 at 20:47 Comment(1)
Accepted, this code seems to be slightly faster than the other one.Niemi
O
7

I think it is wasteful to modify the input string. Avoid this like this:

var
  Remaining: Integer;
  StartIndex: Integer;
begin
  Remaining := Length(ASource);
  StartIndex := 1;
  while Remaining > 0 do
  begin
    AList.Add(Copy(ASource, StartIndex, AFixedLen));
    inc(StartIndex, AFixedLen);
    dec(Remaining, AFixedLen);
  end;
end;

This will reduce the amount of heap allocation, and the copying.

However, I would not be surprised if you observed little gain in performance. In order to perform any serious optimisation we'd likely need to see some example input data.

Ora answered 11/8, 2015 at 13:34 Comment(3)
Yes, your solution is faster. Thanks. This is about preprocessing different chunks of COBOL data that must be read in. Serious optimization would certainly avoid splitting the data into StringLists but at the moment I am looking for ways to optimize without rewriting the underlying structure.Niemi
If the items are fixed length, then string lists may not be the best approach. Character arrays would likely be faster. I'm sure you can do a lot better than the code here, but I don't think we've got enough detail to be more specific.Ora
After some profiling I am impressed how fast that TStringList-based approach actually is. The COBOL data is splitted into datasets by the function here. Afterwards dataset by dataset is parsed from the stringlist. No need for further optimization on this front at the moment.Niemi
T
6

There are some immediately obvious optimisations possible in your code:

  1. Do not modify the source string, simply extract the required substrings. You can then make the input string parameter const, allowing the compiler to further optimise calls made to the procedure

  2. Since you are dealing with fixed lengths and an input string of known length, then you can pre-calculate the required capacity of the string list and avoid memory re-allocations as the list is added to.

Here is how I would go about this:

procedure StrToStringList(const aSource: String;
                          const aList: TStrings;
                          const aFixedLen: Integer);
var
  idx: Integer;
  srcLen: Integer;
begin
  aList.Capacity := (Length(aSource) div aFixedLen) + 1;

  idx    := 1;
  srcLen := Length(aSource);

  while idx <= srcLen do
  begin
    aList.Add(Copy(aSource, idx, aFixedLen));
    Inc(idx, aFixedLen);
  end;
end;

The capacity calculation here may result in an excess capacity of 1 where the input string divides exactly by the fixed length, but this is a negligible overhead imho and acceptable where the goal is optimal performance (the alternative is a conditional branch to modify or calculate the capacity differently to cater for what is likely to be the minority of cases).

Trumpetweed answered 11/8, 2015 at 20:47 Comment(1)
Accepted, this code seems to be slightly faster than the other one.Niemi
S
2

Don't use delete inside the loop. It causes the whole string to move. Use an integer based index variable starting at 1 and increase it each time by AFixedLen after you have used copy to extract the substring until you reached the end.

Stagecraft answered 11/8, 2015 at 13:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.