Appending UnicodeString to WideString in Delphi

I'm curious about what happens with this piece of code in Delphi 2010:

function foo: WideString;
var
  i: Integer;
  myUnicodeString: UnicodeString;
begin
  Result := '';  // start from an empty string before appending
  for i := 1 to 1000 do
  begin
    myUnicodeString := ... something ...;

    Result := Result + myUnicodeString;  // This is where I'm interested
  end;
end;

How many string conversions are involved, and are any particularly bad performance-wise?

I know the function should just return a UnicodeString instead, but I've seen this anti-pattern in the VCL streaming code, and want to understand the process.

Ajani answered 15/8, 2013 at 13:41 Comment(4)
Did you try to look that up in the debugger's CPU window? - Aplomb
@OnTheFly: It's actually part of a C++ Builder project, and for some reason BCB2010 doesn't like setting breakpoints in the VCL code... I'll try stepping through some more. - Ajani
If you don't have Delphi to study the generated code for your test case, I can post a disassembly, but I'm really unsure how to present it in a useful form... - Aplomb
@Roddy: Check the linker option to use debug DCUs. (Project->Options->Delphi Compiler->Use Debug DCUs). This will allow you to step into the Delphi VCL and RTL from Builder. - Serotherapy

To answer your question about what the code is actually doing, this statement:

result := result + myUnicodeString;

does the following:

  1. calls System._UStrFromWStr() to convert Result to a temp UnicodeString

  2. calls System._UStrCat() to concatenate myUnicodeString onto the temp

  3. calls System._WStrFromUStr() to convert the temp to a WideString and assign it back to Result.
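The three steps above can be spelled out in source form. This is only a sketch of what the compiler does implicitly: the helper name AppendViaTemp is hypothetical, and the exact RTL helpers emitted may vary between compiler versions.

```pascal
program ConcatSteps;
{$APPTYPE CONSOLE}

// Hypothetical helper: writes out by hand the conversions that the
// statement "Result := Result + myUnicodeString" performs implicitly.
procedure AppendViaTemp(var Dest: WideString; const Src: UnicodeString);
var
  Temp: UnicodeString;
begin
  Temp := Dest;        // step 1: WideString -> UnicodeString copy
  Temp := Temp + Src;  // step 2: UnicodeString concatenation
  Dest := Temp;        // step 3: UnicodeString -> WideString copy
end;

var
  W: WideString;
begin
  W := 'abc';
  AppendViaTemp(W, 'def');
  Writeln(W);
end.
```

Each pass through the loop in the question therefore copies the whole accumulated string twice, in addition to the concatenation itself.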

There is a System._WStrCat() function for concatenating a WideString onto a WideString (and System._UStrCat() for UnicodeString). If CodeGear/Embarcadero had been smarter about it, they could have implemented a System._WStrCat() overload that takes a UnicodeString as input and a WideString as output (and vice versa for concatenating a WideString onto a UnicodeString). That way, no temp UnicodeString conversions would be needed anymore. Both WideString and UnicodeString are encoded as UTF-16 (well mostly, but I won't get into that here), so concatenating them together is just a matter of a single allocation and move, just like when concatenating two UnicodeStrings or two WideStrings together.
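As a rough illustration of that idea, a user-level helper can append UnicodeString data straight into a WideString without the temp. This is a sketch under assumptions, not the RTL overload described above: AppendDirect is a hypothetical name, and it relies on both types holding UTF-16 code units.

```pascal
program DirectAppend;
{$APPTYPE CONSOLE}

// Hypothetical helper: grows the WideString once and copies the
// UnicodeString characters directly onto the end of it.
procedure AppendDirect(var Dest: WideString; const Src: UnicodeString);
var
  OldLen, SrcLen: Integer;
begin
  SrcLen := Length(Src);
  if SrcLen = 0 then
    Exit;  // nothing to append
  OldLen := Length(Dest);
  SetLength(Dest, OldLen + SrcLen);  // one resize of the WideString
  // Copy the raw UTF-16 code units; the count is in bytes.
  Move(Pointer(Src)^, Dest[OldLen + 1], SrcLen * SizeOf(WideChar));
end;

var
  W: WideString;
begin
  W := 'abc';
  AppendDirect(W, 'def');
  Writeln(W);
end.
```

This still resizes the WideString on every call, so it removes the round trip through a temp UnicodeString but not the cost of repeated growth.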

Calenture answered 15/8, 2013 at 18:22 Comment(1)
Thanks Remy. That explains lots! - Ajani

The performance is poor. There's no need for any encoding conversions since everything is UTF-16 encoded. However, WideString is a wrapper around the COM BSTR type which performs worse than native UnicodeString.

Naturally you should prefer to do all your work with the native types, either UnicodeString or TStringBuilder, and convert to WideString at the last possible moment.

That is generally a good policy. You don't want to use WideString internally since it's purely an interop type. So only convert to (and from) WideString at the interop boundary.
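A minimal sketch of that policy, assuming the function from the question: build the text with TStringBuilder and convert to WideString once, at the end. NextPiece is a hypothetical stand-in for the "... something ..." in the question.

```pascal
program BuildThenConvert;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical stand-in for the "... something ..." in the question.
function NextPiece(Index: Integer): UnicodeString;
begin
  Result := IntToStr(Index) + ';';
end;

function Foo: WideString;
var
  Builder: TStringBuilder;
  I: Integer;
begin
  Builder := TStringBuilder.Create;
  try
    // All appends happen on the native string builder.
    for I := 1 to 1000 do
      Builder.Append(NextPiece(I));
    // The only UnicodeString -> WideString conversion.
    Result := Builder.ToString;
  finally
    Builder.Free;
  end;
end;

begin
  Writeln(Length(Foo));
end.
```

The WideString is touched exactly once, so the BSTR allocation cost no longer scales with the number of loop iterations.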

Unwritten answered 15/8, 2013 at 15:40 Comment(11)
Thanks. I'm particularly curious as to whether the string concat (wide := wide + uni) happens in the unicode or wide domain. If it's unicode, then two conversions are involved. (wide->uni, concat, uni->wide) - Ajani
I don't know that specific detail off the top of my head. But it's trivially easy to work out with a debugger. Something I don't have right now. Either way, you don't want it. Do all your work with TStringBuilder and convert to WideString as late as possible. All those concatenations would be bad even with pure native strings. Heavy on the heap. - Unwritten
It's not actually my code. It's in classes.pas in D2010 (CombineWideString). I had an issue where a form with a single 4MB string property (yes, but there's a good reason!) took 2.5 minutes(!) to load when using text DFMs, and under a second with binary. - Ajani
You can't do anything about it then. You know the performance is rubbish. Just don't put the string in the dfm file. - Unwritten
Actually, you can do something about it. Modify Classes.pas (the offending code is in the implementation section) and add the modified file to your project so it will override the RTL's default code. This only works if the project is not using runtime packages, though. - Calenture
@Remy You can't do anything easily. The comment about text dfms makes it clear this is a design time issue. - Unwritten
@DavidHeffernan: actually, it is not a design-time issue at all. He's using DFMs to stream custom data files at runtime. See this discussion. So my earlier comment still stands. Edit Classes.pas to fix the performance problem (it was natively fixed by Embarcadero in XE3) and add the edited file to the project. - Calenture
@Remy Fair enough. I was lacking that background info. Obviously, with standard .dfm files associated with forms and data modules, then text .dfm is only design time. - Unwritten
@DavidHeffernan, Thanks for your input - I should have linked to my forum post. IIRC, if you use text DFMs they're linked into the final exe as text RCDATA resources, so it's potentially an issue just with that. - Ajani
@Ajani I don't think so. Text DFM setting is just for the source. The linked executable always has binary DFMs. - Unwritten
You're right, as usual. XN Resource editor automatically displays them as though they were text, which caught me out... - Ajani
