When and Why Should I Use TStringBuilder?

I converted my program from Delphi 4 to Delphi 2009 a year ago, mainly to make the jump to Unicode, but also to gain the benefits of all those years of Delphi improvements.

My code, of course, is therefore all legacy code. It uses short strings that have now conveniently all become long Unicode strings, and I've changed all the old ANSI functions to their new equivalents.

But with Delphi 2009, they introduced the TStringBuilder class, presumably modelled after the StringBuilder class of .NET.

My program does a lot of string handling and manipulation, and can load hundreds of megabytes of large strings into memory at once to work with.

I don't know a lot about Delphi's implementation of TStringBuilder, but I heard that some of its operations are faster than using the default string operations.

My question is whether or not it is worthwhile for me to go through the effort and convert my standard strings to use the TStringBuilder class. What would I gain and lose from doing that?


Thank you for your answers and for leading me to my conclusion, which is not to bother unless .NET compatibility is required.

In his blog post on Delphi 2009 string performance, Jolyon Smith states:

But it looks to me as if TStringBuilder is there primarily as a .NET compatibility fixture, rather than to provide any real benefit to developers of Win32 applications, with the possible exception of developers wishing or needing to single-source a Win32/.NET codebase where string handling performance isn’t a concern.

Publish answered 18/10, 2009 at 19:13 Comment(5)
Just curious: did you notice a change in string performance when you went from D4 short strings to D2009 Unicode strings? – Chesser
I didn't time strings directly, but my upgrade resulted in a 25% performance improvement, probably due to FastMM and other optimizations built into the new version. External ANSI files had to be encoded to Unicode, which takes double the space and adds a major overhead for very large files, reversing the performance improvement seen for small files. Reading the files in very large blocks reduced the burden. Overall, I feel my program is probably about as fast as before, but with the great benefit of Unicode. – Publish
@lkessler: It's sad you still repeat that misinformation about your "having to encode external ANSI files to Unicode". If you had switched them to UTF-8 (which is a valid Unicode encoding) you wouldn't have doubled your file size, and you would have lost nothing. On the contrary, unless on a fast SSD, the decrease in I/O would probably have been much more important than the increase in CPU cycles for string recoding, giving you a nice performance boost. – Diggins
Lachlan: Also see Jan Goyvaerts' article "Speed Benefits of Using The Native Win32 String Type": micro-isv.asia/2008/09/… – Publish
Mghie: As I know you know, UTF-8 is not native to Windows. Conversions need to be done each time you process a UTF-8 string, so the tradeoff is space versus processing speed. But now that you mention it, when I get back to my input handling I will try loading the files into memory as UTF-8 and compare the overall processing speed. If UTF-8 is not too much of a processing overhead, I'll leave it that way. And I do thank you for pointing this out again, because you may have helped me greatly. – Publish

To the best of my knowledge, TStringBuilder was introduced just for some parity with .NET and Java; it seems to be more of a tick-the-box feature than any major advance.

Consensus seems to be that TStringBuilder is faster in some operations but slower in others.

Your program sounds like an interesting one for a before/after TStringBuilder comparison, but I wouldn't do it other than as an academic exercise.
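If you do want to run such a comparison, a minimal console harness might look like the sketch below. The `Iterations` count and the `'xxx'` payload are arbitrary assumptions, `Now`/`MilliSecondsBetween` only give coarse timings, and results will vary with the Delphi version and memory manager, so no numbers are claimed here.

```pascal
program ConcatVsBuilder;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} { lets the sketch also compile with Free Pascal }
{$APPTYPE CONSOLE}

uses
  SysUtils, DateUtils;

const
  Iterations = 1000000; { arbitrary; scale it to your own workload }

{ Classic idiom: grow the string with + }
function BuildWithConcat(Count: Integer): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Count do
    Result := Result + 'xxx';
end;

{ Same result through TStringBuilder, converted once at the end }
function BuildWithBuilder(Count: Integer): string;
var
  SB: TStringBuilder;
  i: Integer;
begin
  SB := TStringBuilder.Create;
  try
    for i := 1 to Count do
      SB.Append('xxx');
    Result := SB.ToString;
  finally
    SB.Free;
  end;
end;

var
  Start: TDateTime;
  A, B: string;
begin
  Start := Now;
  A := BuildWithConcat(Iterations);
  Writeln('Concatenation:  ', MilliSecondsBetween(Now, Start), ' ms');

  Start := Now;
  B := BuildWithBuilder(Iterations);
  Writeln('TStringBuilder: ', MilliSecondsBetween(Now, Start), ' ms');

  if A <> B then
    Writeln('Results differ!');
end.
```

Run it with sizes and payloads that resemble your real data; a loop of a few dozen appends says little either way.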

Chesser answered 18/10, 2009 at 19:28 Comment(0)

Basically, I use these idioms for building strings. The most important differences are:

For complex build patterns, the first makes my code a lot cleaner; the second does so only when I am adding lines, and it often involves many Format calls.

The third makes my code cleaner when format patterns are important.

I use the last one only when the expression is very simple.
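The code samples for these idioms did not survive in this copy of the answer. Judging from the comparisons in it (TStringBuilder vs. TStringList, Format patterns, very simple expressions), they appear to be TStringBuilder, TStringList, Format and plain concatenation. The sketch below shows those four producing the same two-line text under that assumption; the function names are hypothetical.

```pascal
program BuildIdioms;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} { lets the sketch also compile with Free Pascal }
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

{ 1. TStringBuilder: append piece by piece, convert once at the end }
function WithBuilder(const Name: string; Count: Integer): string;
var
  SB: TStringBuilder;
begin
  SB := TStringBuilder.Create;
  try
    SB.Append('Name: ').Append(Name).AppendLine;
    SB.Append('Count: ').Append(Count).AppendLine;
    Result := SB.ToString;
  finally
    SB.Free;
  end;
end;

{ 2. TStringList: natural when the result is a list of lines }
function WithStringList(const Name: string; Count: Integer): string;
var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SL.Add('Name: ' + Name);
    SL.Add('Count: ' + IntToStr(Count));
    Result := SL.Text; { every line, including the last, ends with sLineBreak }
  finally
    SL.Free;
  end;
end;

{ 3. Format: the whole layout is visible in one pattern }
function WithFormat(const Name: string; Count: Integer): string;
begin
  Result := Format('Name: %s%sCount: %d%s', [Name, sLineBreak, Count, sLineBreak]);
end;

{ 4. Plain concatenation: fine for short, simple expressions }
function WithConcat(const Name: string; Count: Integer): string;
begin
  Result := 'Name: ' + Name + sLineBreak + 'Count: ' + IntToStr(Count) + sLineBreak;
end;

begin
  Write(WithBuilder('Bob', 3));
  Write(WithStringList('Bob', 3));
  Write(WithFormat('Bob', 3));
  Write(WithConcat('Bob', 3));
end.
```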

A few more differences between the first two idioms:

  • TStringBuilder has many overloads for Append, and also has AppendLine (with only two overloads) if you want to add lines the way TStringList.Add does.
  • TStringBuilder reallocates its underlying buffer with an over-allocation scheme (it reserves more capacity than it currently needs), which means that with large buffers and frequent appends it can be a lot faster than TStringList.
  • To get the TStringBuilder content, you have to call the ToString method, which builds a new string from the buffer and can slow things down.
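The points in this list can be sketched as follows: the builder is created with a capacity hint so that fewer growth steps are needed (the 16-characters-per-line estimate is an arbitrary assumption), lines are added with the AppendLine(string) overload, and ToString is called exactly once at the end, since it typically copies the builder's buffer into a fresh string. `BuildReport` is a hypothetical name.

```pascal
program BuilderCapacity;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} { lets the sketch also compile with Free Pascal }
{$APPTYPE CONSOLE}

uses
  SysUtils;

function BuildReport(LineCount: Integer): string;
var
  SB: TStringBuilder;
  i: Integer;
begin
  { Capacity hint: a rough guess of 16 chars per line }
  SB := TStringBuilder.Create(LineCount * 16);
  try
    for i := 1 to LineCount do
      SB.AppendLine('Line ' + IntToStr(i)); { appends the text plus sLineBreak }
    { Convert once, after the loop, not on every iteration }
    Result := SB.ToString;
  finally
    SB.Free;
  end;
end;

begin
  Write(BuildReport(3));
end.
```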

So speed is not the most important factor when choosing a string-appending idiom; readable code is.

Hawn answered 19/10, 2009 at 9:29 Comment(0)

I tried to improve an old routine that was parsing a 1.5 GB text file. The routine was pretty naive: it built a string like this: s:= s+ buff[i];

So I thought that TStringBuilder would bring a significant speed improvement. It turned out to be 114% slower.


So I built my own StringBuilder, which is 184.82 times (yes, 184!) faster than the classic s:= s+ chr (tested on a 4 MB string) and even faster than TStringBuilder.

Tests:

Classic s:= s + c
Time: 8502 ms

procedure TfrmTester.btnClassicClick(Sender: TObject);
VAR
   s: string;
   FileBody: string;
   c: Cardinal;
   i: Integer;
begin
 FileBody:= ReadFile(File4MB);   { ReadFile and File4MB: the author's own file-loading helper and test file, not shown }
 c:= GetTickCount;
 for i:= 1 to Length(FileBody) DO
  s:= s+ FileBody[i];
 Log.Lines.Add('Time: '+ IntToStr(GetTickCount-c) + 'ms');     // 8502 ms
end;

Prebuffered

Times:
     BuffSize= 10000;       // 10k  buffer = 406 ms
     BuffSize= 100000;      // 100k buffer = 140 ms
     BuffSize= 1000000;     // 1M   buffer = 46 ms

Code:

procedure TfrmTester.btnBufferedClick(Sender: TObject);
VAR
   s: string;
   FileBody: string;
   c: Cardinal;
   CurBuffLen, marker, i: Integer;   { BuffSize is declared globally (not shown); see the timings above }
begin
 FileBody:= ReadFile(File4MB);
 c:= GetTickCount;

 marker:= 1;
 CurBuffLen:= 0;
 for i:= 1 to Length(FileBody) DO
  begin
   if i > CurBuffLen then
    begin
     SetLength(s, CurBuffLen+ BuffSize);
     CurBuffLen:= Length(s)
    end;
   s[marker]:= FileBody[i];
   Inc(marker);
  end;

 SetLength(s, marker-1); { Cut down the preallocated buffer that we haven't used }
 Log.Lines.Add('Time: '+ IntToStr(GetTickCount-c) + 'ms');
 if s <> FileBody
 then Log.Lines.Add('FAILED!');
end;

Prebuffered, as class

Times:
 BuffSize= 10000;       // 10k  buffer = 437 ms
 BuffSize= 100000;      // 100k buffer = 187 ms
 BuffSize= 1000000;     // 1M   buffer = 78 ms

Code:

procedure TfrmTester.btnBuffClassClick(Sender: TObject);
VAR
   StringBuff: TCStringBuff;
   s: string;
   FileBody: string;
   c: Cardinal;
   i: Integer;
begin
 FileBody:= ReadFile(File4MB);
 c:= GetTickCount;

 StringBuff:= TCStringBuff.Create(BuffSize);
 TRY
   for i:= 1 to Length(FileBody) DO
    StringBuff.AddChar(filebody[i]);
   s:= StringBuff.GetResult;
 FINALLY
  FreeAndNil(StringBuff);
 END;

 Log.Lines.Add('Time: '+ IntToStr(GetTickCount-c) + 'ms');
 if s <> FileBody
 then Log.Lines.Add('FAILED!');
end;

And this is the class:

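type
  { Class declaration reconstructed from the fields the methods below use }
  TCStringBuff = class
  private
    s: string;           { the preallocated buffer }
    BuffSize: Integer;   { how much the buffer grows each time it fills up }
    CurBuffLen: Integer; { currently allocated length of s }
    marker: Integer;     { position where the next char is written }
    inp: Integer;        { chars added + 1; always equal to marker }
  public
    constructor Create(aBuffSize: Integer= 10000);
    function GetResult: string;
    procedure AddChar(Ch: Char);
  end;

{ TCStringBuff }

constructor TCStringBuff.Create(aBuffSize: Integer= 10000);
begin
 inherited Create;
 BuffSize:= aBuffSize;
 marker:= 1;
 CurBuffLen:= 0;
 inp:= 1;
end;

function TCStringBuff.GetResult: string;
begin
 SetLength(s, marker-1);                    { Cut down the preallocated buffer that we haven't used }
 Result:= s;
 s:= '';         { Free memory }
end;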

procedure TCStringBuff.AddChar(Ch: Char);
begin
 if inp > CurBuffLen then
  begin
   SetLength(s, CurBuffLen+ BuffSize);
   CurBuffLen:= Length(s)
  end;

 s[marker]:= Ch;
 Inc(marker);
 Inc(inp);
end;
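One limitation of TCStringBuff is that it grows by a fixed BuffSize, so a string of length n still causes about n / BuffSize reallocations, and each one may copy everything collected so far; that is consistent with the larger buffers being faster in the timings above. A common alternative, shown here as a hypothetical sketch and not the author's code, is to double the buffer whenever it fills up, so the number of reallocations grows only logarithmically with n. `TGrowingStringBuff` and `Rebuild` are illustrative names.

```pascal
program GrowingBuffer;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} { lets the sketch also compile with Free Pascal }
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  { Hypothetical variant of TCStringBuff with geometric (doubling) growth }
  TGrowingStringBuff = class
  private
    FBuf: string;   { preallocated storage }
    FLen: Integer;  { number of chars actually used }
  public
    constructor Create(InitialCapacity: Integer);
    procedure AddChar(Ch: Char);
    function GetResult: string;
  end;

constructor TGrowingStringBuff.Create(InitialCapacity: Integer);
begin
  inherited Create;
  if InitialCapacity < 1 then
    InitialCapacity := 1;
  SetLength(FBuf, InitialCapacity);
  FLen := 0;
end;

procedure TGrowingStringBuff.AddChar(Ch: Char);
begin
  if FLen = Length(FBuf) then
    SetLength(FBuf, 2 * Length(FBuf)); { full: double the capacity }
  Inc(FLen);
  FBuf[FLen] := Ch;                    { strings are 1-based }
end;

function TGrowingStringBuff.GetResult: string;
begin
  Result := Copy(FBuf, 1, FLen);       { drop the unused tail }
end;

{ Copies Source char by char through the buffer, like the test above }
function Rebuild(const Source: string): string;
var
  Buff: TGrowingStringBuff;
  i: Integer;
begin
  Buff := TGrowingStringBuff.Create(16);
  try
    for i := 1 to Length(Source) do
      Buff.AddChar(Source[i]);
    Result := Buff.GetResult;
  finally
    Buff.Free;
  end;
end;

var
  Source: string;
begin
  Source := StringOfChar('a', 1000000);
  Writeln(Rebuild(Source) = Source);
end.
```

Doubling trades some unused memory (up to half the buffer) for far fewer reallocations; if you know the final size in advance, pass it as the initial capacity instead.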

Conclusion:

Stop using s:= s + c if you have large (over 10K) strings. It may pay off even with small strings if you do it often (for example, a function that does some string processing on a small string but is called very frequently).
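The prebuffering idea is simplest when an upper bound on the result length is known in advance: allocate once, fill by index, and trim once at the end. A minimal sketch, using a hypothetical RemoveChar function that strips every occurrence of one character (the result can never be longer than the input):

```pascal
program RemoveCharDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} { lets the sketch also compile with Free Pascal }
{$APPTYPE CONSOLE}

{ Hypothetical helper: returns Source without any occurrence of Unwanted }
function RemoveChar(const Source: string; Unwanted: Char): string;
var
  i, n: Integer;
begin
  SetLength(Result, Length(Source)); { upper bound: nothing is removed }
  n := 0;
  for i := 1 to Length(Source) do
    if Source[i] <> Unwanted then
    begin
      Inc(n);
      Result[n] := Source[i];
    end;
  SetLength(Result, n);              { trim to the real length, once }
end;

begin
  Writeln(RemoveChar('a b c', ' ')); { prints abc }
end.
```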


PS: You may also want to see this: https://www.delphitools.info/2013/10/30/efficient-string-building-in-delphi/2/

Chelyabinsk answered 7/8, 2015 at 12:56 Comment(1)
Thanks for adding this late answer. That's an excellent article you link to. – Publish

TStringBuilder was introduced solely to provide a source-code-compatible mechanism for applications to perform string handling in both Delphi and Delphi.NET. You sacrifice some speed in Delphi for some potentially significant benefits in Delphi.NET.

The StringBuilder concept in .NET addresses performance issues with the string implementation on that platform, issues that the Delphi (native code) platform simply does not have.
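The difference shows in how Delphi's native string type behaves: it is reference-counted and copy-on-write, so an assignment only shares the buffer, and the first write through a shared reference gives that variable its own copy. A string with a single reference can therefore be modified in place, or grown by reallocating its buffer, instead of being rebuilt as a new immutable object. A small illustration (ChangeFirstChar is a hypothetical helper):

```pascal
program CopyOnWrite;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} { lets the sketch also compile with Free Pascal }
{$APPTYPE CONSOLE}

{ Hypothetical helper: returns S with its first character replaced }
function ChangeFirstChar(const S: string; Ch: Char): string;
begin
  Result := S;       { no copy yet: Result and S share one buffer }
  if Result <> '' then
    Result[1] := Ch; { the write first makes Result a private copy }
end;

var
  Original, Changed: string;
begin
  Original := 'abc';
  Changed := ChangeFirstChar(Original, 'X');
  Writeln(Original); { prints abc: the shared buffer was not modified }
  Writeln(Changed);  { prints Xbc }
end.
```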

If you are not writing code that needs to be compiled for both native code and Delphi.NET then there is simply no reason to use TStringBuilder.

Oriole answered 18/10, 2009 at 19:54 Comment(4)
It's not true that it was introduced solely for source code compatibility. That was part of it, but other strong reasons are that it is a powerful class to use and that some people prefer its ability to do the fluent coding pattern. Bottom line: use it if you want, don't use it if you don't want. – Ichnite
Genuinely curious: "Powerful" how? Why? As for fluent? Yes, well, as you say, use it if you want, don't use it if you don't want (or if you like to hold on to your sanity when debugging <grin>). – Oriole
@Oriole - It may be true that TStringBuilder was introduced solely for .NET compatibility, but I'd guess that was about 95% of the reason ;-) What are the odds that TStringBuilder would have been introduced if the Delphi.NET requirement had never existed? – Madalynmadam
@Madalynmadam Mike Lischke's VirtualTrees had introduced a TStringBuilder years before Embarcadero did. It was created to speed up operations with Unicode strings. The only support came from Windows' BSTRs, which Delphi exposed as WideString. Windows provides no way to realloc a BSTR (BSTRs are not reference counted; SysReAllocString creates a second string and does a copy). In this case, the Delphi TStringBuilder was a necessity. Beyond that, the ability to set a Capacity on a string is useful if you're going to be adding a lot of stuff. Plus, it is a nice syntax. – Abomasum
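The fluent style and the typed Append overloads mentioned in these comments look like this in practice. DescribeOrder is a hypothetical example: each Append returns the builder itself, which is what makes chaining possible, and Append(Integer) does the number conversion that IntToStr does in the concatenation version shown alongside.

```pascal
program FluentBuilder;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} { lets the sketch also compile with Free Pascal }
{$APPTYPE CONSOLE}

uses
  SysUtils;

{ Fluent TStringBuilder: every Append returns the same builder }
function DescribeOrder(const Item: string; Qty: Integer): string;
var
  SB: TStringBuilder;
begin
  SB := TStringBuilder.Create;
  try
    Result := SB.Append(Qty).Append(' x ').Append(Item).Append('.').ToString;
  finally
    SB.Free;
  end;
end;

{ The same text with ordinary concatenation and IntToStr }
function DescribeOrderConcat(const Item: string; Qty: Integer): string;
begin
  Result := IntToStr(Qty) + ' x ' + Item + '.';
end;

begin
  Writeln(DescribeOrder('apple', 3));       { prints 3 x apple. }
  Writeln(DescribeOrderConcat('apple', 3)); { prints 3 x apple. }
end.
```

Which of the two reads better is largely a matter of taste, as the comments above suggest.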

According to Marco Cantu, not for speed, but you might get cleaner code and better code compatibility with .NET. Here (and some corrections here) is another speed test in which TStringBuilder is not faster.

Tiffa answered 18/10, 2009 at 19:28 Comment(4)
Thanks for the links. They helped. But I can't agree with you that StringBuilder gives cleaner code. I definitely prefer s := s + s2; to SB.Append(s2); – Publish
From the Marco Cantu article, I believe he meant when adding various data types. Still not a huge difference, of course. – Rutger
This is Cantu's string concat test: "for i := 1 to 15 do s := s + 'xxx';" That's not much of a test; the loop should be bigger. I would bet that TStringBuilder would win in that case. Unfortunately I can't test it because I don't have D2009. Generally speaking, that pattern is slow in most languages because the strings have to be continually reallocated and copied. – Alberic
But I could be wrong. I just did a quick test with big loops, and it didn't crawl to a halt as I was expecting. I guess Delphi must modify the strings in place and allocate extra space each time. For languages with immutable strings, the loop would get slower and slower. – Alberic

TStringBuilder is basically just a me-too feature, like LachlanG said. It's needed in .NET because CLR strings are immutable, but Delphi doesn't have that problem so it doesn't really require a string builder as a workaround.

Drabeck answered 18/10, 2009 at 19:52 Comment(0)
