Why does this string have a reference count of 4? (Delphi 2007)
This is a very Delphi-specific question (maybe even Delphi 2007-specific). I am currently writing a simple StringPool class for interning strings. As a good little coder I also added unit tests and found something that baffled me.

This is the code for interning:

function TStringPool.Intern(const _s: string): string;
var
  Idx: Integer;
begin
  if FList.Find(_s, Idx) then
    Result := FList[Idx]
  else begin
    Result := _s;
    if FMakeStringsUnique then
      UniqueString(Result);
    FList.Add(Result);
  end;
end;

Nothing really fancy: FList is a sorted TStringList, so all the code does is look up the string in the list and, if it is already there, return the existing string. If it is not yet in the list, it first calls UniqueString (when FMakeStringsUnique is set) to ensure a reference count of 1 and then adds the string to the list. (I checked the reference count of Result and it is 3 after 'hallo' has been added twice, as expected.)

Now to the testing code:

procedure TestStringPool.TestUnique;
var
  s1: string;
  s2: string;
begin
  s1 := FPool.Intern('hallo');
  CheckEquals(2, GetStringReferenceCount(s1));
  s2 := s1;
  CheckEquals(3, GetStringReferenceCount(s1));
  CheckEquals(3, GetStringReferenceCount(s2));
  UniqueString(s2);
  CheckEquals(1, GetStringReferenceCount(s2));
  s2 := FPool.Intern(s2);
  CheckEquals(Integer(Pointer(s1)), Integer(Pointer(s2)));
  CheckEquals(3, GetStringReferenceCount(s2));
end;

This adds the string 'hallo' to the string pool twice and checks the string's reference count and also that s1 and s2 indeed point to the same string descriptor.

Every CheckEquals works as expected except the last. It fails with the error "expected: <3> but was: <4>".

So, why is the reference count 4 here? I would have expected 3:

  • s1
  • s2
  • and another one in the StringList

This is Delphi 2007 and the strings are therefore AnsiStrings.

Oh yes, the function GetStringReferenceCount is implemented as:

function GetStringReferenceCount(const _s: AnsiString): integer;
var
  ptr: PLongWord;
begin
  ptr := Pointer(_s);
  if ptr = nil then begin
    // special case: Empty strings are represented by NIL pointers
    Result := MaxInt;
  end else begin
    // The string descriptor contains the following two longwords:
    // Offset -1: Length
    // Offset -2: Reference count
    Dec(Ptr, 2);
    Result := ptr^;
  end;
end;

In the debugger the same can be evaluated as:

plongword(integer(pointer(s2))-8)^

Just to add to the answer from Serg (which seems to be 100% correct):

If I replace

s2 := FPool.Intern(s2);

with

s3 := FPool.Intern(s2);
s2 := '';

and then check the reference count of s3 (and s1), it is 3 as expected. The phenomenon occurs only because the result of FPool.Intern(s2) is assigned back to s2 (s2 is both a parameter and the destination of the function result). Delphi introduces a hidden string variable to receive the result.

Also, if I change the function to a procedure:

procedure TStringPool.Intern(var _s: string);

the reference count is 3 as expected because no hidden variable is required.
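
For illustration, here is a minimal sketch of what the var-parameter variant might look like as a complete console program. It is not the dzlib implementation: the class skeleton, the unconditional UniqueString call in the "not found" branch, and the small driver at the end are assumptions made only to keep the example self-contained.

```pascal
program StringPoolVarSketch;
{$APPTYPE CONSOLE}

uses
  Classes;

type
  // Hypothetical, stripped-down pool; only the var-parameter Intern is shown.
  TStringPool = class
  private
    FList: TStringList;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Intern(var _s: string);
  end;

constructor TStringPool.Create;
begin
  inherited Create;
  FList := TStringList.Create;
  FList.Sorted := True; // Find only works on a sorted list
end;

destructor TStringPool.Destroy;
begin
  FList.Free;
  inherited;
end;

procedure TStringPool.Intern(var _s: string);
var
  Idx: Integer;
begin
  if FList.Find(_s, Idx) then
    // Already pooled: let the caller's variable reference the pooled string.
    _s := FList[Idx]
  else begin
    // Assumption: always force a heap copy so that a string literal
    // and the pooled string end up sharing one descriptor.
    UniqueString(_s);
    FList.Add(_s);
  end;
end;

var
  Pool: TStringPool;
  s1, s2: string;
begin
  Pool := TStringPool.Create;
  try
    s1 := 'hallo';
    Pool.Intern(s1);
    s2 := s1;
    UniqueString(s2); // s2 now has its own copy
    Pool.Intern(s2);  // s2 is redirected to the pooled string
    // Both variables should now share the pooled descriptor.
    WriteLn(Pointer(s1) = Pointer(s2));
  finally
    Pool.Free;
  end;
end.
```

Because the caller's variable is passed by reference and updated in place, there is no function result for the compiler to route through a temporary.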


In case anybody is interested in this TStringPool implementation: It's open source under the MPL and available as part of dzlib, which in turn is part of dzchart:

https://sourceforge.net/p/dzlib/code/HEAD/tree/dzlib/trunk/src/u_dzStringPool.pas

But as said above: It's not exactly rocket science. ;-)

Turgid answered 26/6, 2011 at 9:51 Comment(13)
Could you check the ref count for S1 at the end of the TestUnique as well? I am curious what its ref count is at that point. (Bodily)
Surely you can use debug DCUs. (Overload)
+1 for taking no nonsense from guessers. (Gangrene)
@david: I had already tried debug DCUs but that didn't give me any insight. (Turgid)
@Marjan: Ref count for S1 is also 4, as it should be, because both variables point to the same string descriptor: CheckEquals(Integer(Pointer(s1)), Integer(Pointer(s2))); (Turgid)
Why are you interested in making the strings unique? (Overload)
@David Probably the reason is to guarantee that the strings are in heap, not in read-only memory. (Mention)
@Serg But the compiler would handle that for you if you attempted to modify one of the entries in the list. I can't help feeling that OP is making much more of this than is needed. (Overload)
Removed the extraneous "only answer if you are ____" part from the body. If an answer is wrong, downvote it. (Valene)
@sixlettervariables: I don't quite understand why you removed this. I am not interested in guesses by people who don't know the innards of string reference counting. But since I already got my answer, I don't really care. (Turgid)
@David: I assume you are referring to the if FMakeStringsUnique then UniqueString(Result); part of the source. This is just for some internal reasons in my program. It was possible that some function changed the original string in place after it was added to the StringPool. That must not affect the strings in the pool. This is no longer allowed in my changed version because that works with a var parameter. I just forgot to remove that option from the class. (Turgid)
@dummzeuch: this is SO, anybody can answer, you don't get to pick and choose. If the answer is not correct, downvote it. (Valene)
@sixlettervariables: I wanted to ask people to answer only if they know what they are talking about. That does not prevent anybody from answering anyway, so what's the problem? (Turgid)

Test this:

function RefCount(const _s: AnsiString): integer;
var
  ptr: PLongWord;
begin
  ptr := Pointer(_s);
  Dec(Ptr, 2);
  Result := ptr^;
end;

function Add(const S: string): string;
begin
  Result:= S;
end;

procedure TForm9.Button1Click(Sender: TObject);
var
  s1: string;
  s2: string;

begin
  s1:= 'Hello';
  UniqueString(s1);
  s2:= s1;
  ShowMessage(Format('%d', [RefCount(s1)]));   // 2
  s2:= Add(s1);
  ShowMessage(Format('%d', [RefCount(s1)]));   // 2
  s1:= Add(s1);
  ShowMessage(Format('%d', [RefCount(s1)]));   // 3
end;

If you write s1 := Add(s1), the compiler creates a hidden local string variable to receive the function result, and that variable accounts for the extra reference count. It is nothing to worry about.
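
As a rough picture of that hidden variable, the sketch below spells out by hand what the compiler is described as doing. It is an approximation, not actual compiler output: AddExplicit and Hidden are hypothetical names, and the reference count lookup assumes the 32-bit Delphi 2007 AnsiString layout used elsewhere on this page.

```pascal
program HiddenTempSketch;
{$APPTYPE CONSOLE}

// Reads the reference count stored two longwords before the string data
// (assumption: 32-bit Delphi 2007 AnsiString layout).
function RefCount(const S: AnsiString): Integer;
var
  P: PLongWord;
begin
  P := Pointer(S);
  if P = nil then
    Result := 0 // empty strings are nil pointers and have no descriptor
  else begin
    Dec(P, 2);
    Result := P^;
  end;
end;

// Hypothetical stand-in for "function Add(const S: string): string":
// the function result is modelled as an explicit var parameter.
procedure AddExplicit(const S: AnsiString; var Dest: AnsiString);
begin
  Dest := S;
end;

var
  s1: AnsiString;
  Hidden: AnsiString; // plays the role of the compiler's hidden temporary
begin
  s1 := 'Hello';
  UniqueString(s1);        // make sure s1 is a heap string
  AddExplicit(s1, Hidden); // the result lands in the temporary first ...
  s1 := Hidden;            // ... and is then assigned to s1
  // Hidden still references the string here, so the count includes it.
  WriteLn(RefCount(s1));
end.
```

The temporary keeps its reference after the assignment, so the count includes it in addition to the visible variables.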

Mention answered 26/6, 2011 at 12:34 Comment(2)
So this is basically an artifact of Delphi function results being passed as VAR parameters? I knew of a similar effect with interfaces (which is sometimes used to save on try..finally constructs) but somehow didn't think it also applied to strings. (Turgid)
@Turgid - I think that is true. The compiler just can't pass s1 as var in s1:= Add(s1), so it creates a hidden variable, passes it as var and assigns it to s1 (incrementing ref count). (Mention)
