StringReplace alternatives to improve performance
Asked Answered
S

8

12

I am using StringReplace to replace &gt and &lt by the char itself in a generated XML like this:

StringReplace(xml.Text,'>','>',[rfReplaceAll]) ;
StringReplace(xml.Text,'&lt;','<',[rfReplaceAll]) ;

The thing is it takes way tooo long to replace every occurence of &gt.

Do you purpose any better idea to make it faster?

Schock answered 26/9, 2008 at 14:22 Comment(2)
Can you get as any feedback on original code spped vs. Jorge's code vs. FastStrings?Sahara
The problem with StringReplace is when you have many occurrences that need to be replaced. For that case you should write your own version similar to what Gabr posted. Actually the problem is not calling StringReplace 2 times but having it to handle dozens of replacements.Otolaryngology
C
7

Try FastStrings.pas from Peter Morris.

Camaraderie answered 26/9, 2008 at 15:11 Comment(3)
FastStrings are great but the unit doesn't work anymore in Delphi 2009, which could be a problem (if not now then when (if) doing an upgrade).Sahara
Faststrings is leaps and bounds faster than the normal Delphi StringReplace function. I really hope Peter releases a new version for Delphi 2009.Cynic
the project was discontinuedBathometer
R
9

If you're using Delphi 2009, this operation is about 3 times faster with TStringBuilder than with ReplaceString. It's Unicode safe, too.

I used the text from http://www.CodeGear.com with all occurrences of "<" and ">" changed to "&lt;" and "&gt;" as my starting point.

Including string assignments and creating/freeing objects, these took about 25ms and 75ms respectively on my system:

function TForm1.TestStringBuilder(const aString: string): string;
var
  sb: TStringBuilder;
begin
  StartTimer;
  sb := TStringBuilder.Create;
  sb.Append(aString);
  sb.Replace('&gt;', '>');
  sb.Replace('&lt;', '<');
  Result := sb.ToString();
  FreeAndNil(sb);
  StopTimer;
end;

function TForm1.TestStringReplace(const aString: string): string;
begin
  StartTimer;
  Result := StringReplace(aString,'&gt;','>',[rfReplaceAll]) ;
  Result := StringReplace(Result,'&lt;','<',[rfReplaceAll]) ;
  StopTimer;
end;
Rickettsia answered 26/9, 2008 at 23:44 Comment(0)
C
7

Try FastStrings.pas from Peter Morris.

Camaraderie answered 26/9, 2008 at 15:11 Comment(3)
FastStrings are great but the unit doesn't work anymore in Delphi 2009, which could be a problem (if not now then when (if) doing an upgrade).Sahara
Faststrings is leaps and bounds faster than the normal Delphi StringReplace function. I really hope Peter releases a new version for Delphi 2009.Cynic
the project was discontinuedBathometer
P
6

You should definitely look at the Fastcode project pages: http://fastcode.sourceforge.net/

They ran a challenge for a faster StringReplace (Ansi StringReplace challenge), and the 'winner' was 14 times faster than the Delphi RTL.

Several of the fastcode functions have been included within Delphi itself in recent versions (D2007 on, I think), so the performance improvement may vary dramatically depending on which Delphi version you are using.

As mentioned before, you should really be looking at a Unicode-based solution if you're serious about processing XML.

Panga answered 28/9, 2008 at 10:1 Comment(0)
R
3

The problem is that you are iterating the entire string size twice (one for replacing &gt; by > and another one to replace &lt; by <).

You should iterate with a for and simply check ahead whenever you find a & for a gt; or lt; and do the immediate replace and then skipping 3 characters ((g|l)t;). This way it can do that in proportional time to the size of the string xml.Text.


A simple C# example as I do not know Delphi but should do for you to get the general idea.

String s = "&lt;xml&gt;test&lt;/xml&gt;";
char[] input = s.ToCharArray();
char[] res = new char[s.Length];
int j = 0;
for (int i = 0, count = input.Length; i < count; ++i)
{
    if (input[i] == '&')
    {
        if (i < count - 3)
        {
            if (input[i + 1] == 'l' || input[i + 1] == 'g')
            {
                if (input[i + 2] == 't' && input[i + 3] == ';')
                {
                    res[j++] = input[i + 1] == 'l' ? '<' : '>';
                    i += 3;
                    continue;
                }
            }
        }
    }

    res[j++] = input[i];
}
Console.WriteLine(new string(res, 0, j));

This outputs:

<xml>test</xml>
Revis answered 26/9, 2008 at 14:28 Comment(1)
The problem is not iterating the string twice but having it to handle many replacements. There is no problem calling it twice or more for different strings to replace if you implement it different from how it's done in the RTL.Otolaryngology
R
3

When you are dealing with a multiline text files, you can get some performance by processing it line by line. This approach reduced time in about 90% to replaces on >1MB xml file.

procedure ReplaceMultilineString(xml: TStrings);
var
  i: Integer;
  line: String;
begin
  for i:=0 to xml.Count-1 do
  begin
    line := xml[i];
    line := StringReplace(line, '&gt;', '>', [rfReplaceAll]);
    line := StringReplace(line, '&lt;', '<', [rfReplaceAll]);
    xml[i] := line;
  end;
end;
Ru answered 1/2, 2017 at 13:37 Comment(0)
S
2

Untested conversion of the C# code written by Jorge Ferreira.

function ReplaceLtGt(const s: string): string;
var
  inPtr, outPtr: integer;
begin
  SetLength(Result, Length(s));
  inPtr := 1;
  outPtr := 1;
  while inPtr <= Length(s) do begin
    if (s[inPtr] = '&') and ((inPtr + 3) <= Length(s)) and
       (s[inPtr+1] in ['l', 'g']) and (s[inPtr+2] = 't') and
       (s[inPtr+3] = ';') then
    begin
      if s[inPtr+1] = 'l' then
        Result[outPtr] :=  '<'
      else
        Result[outPtr] := '>';
      Inc(inPtr, 3);
    end
    else begin
      Result[outPtr] := Result[inPtr];
      Inc(inPtr);
    end;
    Inc(outPtr);
  end;
  SetLength(Result, outPtr - 1);
end;
Sahara answered 26/9, 2008 at 14:55 Comment(2)
No, since string is not unicode but ansi. It would be if you use WideString instead of string. And it would be in Delphi 2009 or Delphi for .Net since both use unicode string by default.Oolite
Who is Jorge Ferreira? Is it possible that he has renamed himself to Smink now?Writhen
A
2

Systools (Turbopower, now open source) has a ReplaceStringAllL function that does all of them in a string.

Autobiography answered 26/9, 2008 at 15:20 Comment(0)
K
0

it's work like charm so fast trust it

    Function NewStringReplace(const S, OldPattern, NewPattern: string;  Flags: TReplaceFlags): string;
var
  OldPat,Srch: string; // Srch and Oldp can contain uppercase versions of S,OldPattern
  PatLength,NewPatLength,P,i,PatCount,PrevP: Integer;
  c,d: pchar;
begin
  PatLength:=Length(OldPattern);
  if PatLength=0 then begin
    Result:=S;
    exit;
  end;

  if rfIgnoreCase in Flags then begin
    Srch:=AnsiUpperCase(S);
    OldPat:=AnsiUpperCase(OldPattern);
  end else begin
    Srch:=S;
    OldPat:=OldPattern;
  end;

  PatLength:=Length(OldPat);
  if Length(NewPattern)=PatLength then begin
    //Result length will not change
    Result:=S;
    P:=1;
    repeat
      P:=PosEx(OldPat,Srch,P);
      if P>0 then begin
        for i:=1 to PatLength do
          Result[P+i-1]:=NewPattern[i];
        if not (rfReplaceAll in Flags) then exit;
        inc(P,PatLength);
      end;
    until p=0;
  end else begin
    //Different pattern length -> Result length will change
    //To avoid creating a lot of temporary strings, we count how many
    //replacements we're going to make.
    P:=1; PatCount:=0;
    repeat
      P:=PosEx(OldPat,Srch,P);
      if P>0 then begin
        inc(P,PatLength);
        inc(PatCount);
        if not (rfReplaceAll in Flags) then break;
      end;
    until p=0;
    if PatCount=0 then begin
      Result:=S;
      exit;
    end;
    NewPatLength:=Length(NewPattern);
    SetLength(Result,Length(S)+PatCount*(NewPatLength-PatLength));
    P:=1; PrevP:=0;
    c:=pchar(Result); d:=pchar(S);
    repeat
      P:=PosEx(OldPat,Srch,P);
      if P>0 then begin
        for i:=PrevP+1 to P-1 do begin
          c^:=d^;
          inc(c); inc(d);
        end;
        for i:=1 to NewPatLength do begin
          c^:=NewPattern[i];
          inc(c);
        end;
        if not (rfReplaceAll in Flags) then exit;
        inc(P,PatLength);
        inc(d,PatLength);
        PrevP:=P-1;
      end else begin
        for i:=PrevP+1 to Length(S) do begin
          c^:=d^;
          inc(c); inc(d);
        end;
      end;
    until p=0;
  end;
end;
Krasner answered 7/10, 2018 at 6:54 Comment(2)
Sure that this is faster than the built in StringReplace function? In particular I don't think the implementation of the rfIgnoreCase flag is particularly efficient. Did you do any timing?Straightlaced
yes i'm sure this is pretty faster than builtin function i have test in with 4MB String in my case case sensitive is not importantKrasner

© 2022 - 2024 — McMap. All rights reserved.