TRTTIContext multi-thread issue
Asked Answered
N

2

12

Everything I've read indicates that TRTTIContext is thread-safe.

However, TRTTIContext.FindType seems to fail (returns nil) occasionally when multithreading. Using a TCriticalSection around it fixes the issue. Note that I'm using XE6, and the issue doesn't seem to exist in XE. Edit: Seems to exist in all Delphi editions that have the new RTTI units.

I've worked up a test project you can use to see for yourself. Create a new VCL project, drop a TMemo and a TButton, replace unit1 with below, and assign the Form1.OnCreate, Form1.OnDestroy and Button1.OnClick events. The key CS is the GRTTIBlock in TTestThread.Execute. Currently disabled, I get between 3 and 5 failures when I run with 200 threads. Enabling the GRTTIBlock CS removes the failures.

unit Unit1;

interface

uses
  Winapi.Windows, Winapi.Messages, System.SysUtils, System.Variants, System.Classes, Vcl.Graphics,
  Vcl.Controls, Vcl.Forms, Vcl.Dialogs, Vcl.StdCtrls, SyncObjs, Contnrs, RTTI;

type
  TTestThread = class(TThread)
  private
    FFailed: Boolean;
    FRan: Boolean;
    FId: Integer;
  protected
    procedure Execute; override;
  public
    property Failed: Boolean read FFailed;
    property Ran: Boolean read FRan;
    property Id: Integer read FId write FId;
  end;

  TForm1 = class(TForm)
    Memo1: TMemo;
    Button1: TButton;
    procedure Button1Click(Sender: TObject);
    procedure FormCreate(Sender: TObject);
    procedure FormDestroy(Sender: TObject);
  private
    FThreadBlock: TCriticalSection;
    FMaxThreadCount: Integer;
    FThreadCount: Integer;
    FRanCount: Integer;
    FFailureCount: Integer;
    procedure Log(AStr: String);
    procedure ThreadFinished(Sender: TObject);
    procedure LaunchThreads;
  end;

var
  Form1: TForm1;

implementation

var
  GRTTIBlock: TCriticalSection;

{$R *.dfm}

{ TTestThread }

procedure TTestThread.Execute;
var
  ctx : TRTTIContext;
begin
//  GRTTIBlock.Acquire;
  try
    FFailed := not Assigned(ctx.FindType('Unit1.TForm1'));
    FRan := True;
  finally
//    GRTTIBlock.Release;
  end;
end;

{ TForm1 }

procedure TForm1.Button1Click(Sender: TObject);
begin
  Randomize;
  LaunchThreads;
  Log(Format('Threads: %d, Ran: %d, Failures: %d',
    [FMaxThreadCount, FRanCount, FFailureCount]));
end;

procedure TForm1.FormCreate(Sender: TObject);
begin
  FThreadBlock := TCriticalSection.Create;
end;

procedure TForm1.FormDestroy(Sender: TObject);
begin
  FThreadBlock.Free;
end;

procedure TForm1.Log(AStr: String);
begin
  Memo1.Lines.Add(AStr);
end;

procedure TForm1.ThreadFinished(Sender: TObject);
var
  tt : TTestThread;
begin
  tt := TTestThread(Sender);
  Log(Format('Thread %d finished', [tt.Id]));
  FThreadBlock.Acquire;
  try
    Dec(FThreadCount);
    if tt.Failed then
      Inc(FFailureCount);
    if tt.Ran then
      Inc(FRanCount);
  finally
    FThreadBlock.Release;
  end;
end;

procedure TForm1.LaunchThreads;
var
  c : Integer;
  ol : TObjectList;
  t : TTestThread;
begin
  FRanCount := 0;
  FFailureCount := 0;
  FMaxThreadCount := 200;
  ol := TObjectList.Create(False);
  try
    // get all the thread objects created and ready
    for c := 1 to FMaxThreadCount do
    begin
      t := TTestThread.Create(True);
      t.FreeOnTerminate := True;
      t.OnTerminate := ThreadFinished;
      t.Id := c;
      ol.Add(t);
    end;
    FThreadCount := FMaxThreadCount;
    // start them all up
    for c := 0 to ol.Count - 1 do
    begin
      TTestThread(ol[c]).Start;
      Log(Format('Thread %d started', [TTestThread(ol[c]).Id]));
    end;
    repeat
      Application.ProcessMessages;
      FThreadBlock.Acquire;
      try
        if FThreadCount <= 0 then
          Break;
      finally
        FThreadBlock.Release;
      end;
    until False;
  finally
    ol.Free;
  end;
end;

initialization
  GRTTIBlock := TCriticalSection.Create;

finalization
  GRTTIBlock.Free;

end.
Neuroticism answered 8/12, 2014 at 22:56 Comment(21)
Personally I think it is preferable to have a single global context variable. It seems that FThreadBlock serves no purpose. You would benefit from TList<T> here rather than TObjectList.Angeliaangelic
ctx is created from a pool. It has a reference counting mechanism and its creation and destruction is guarded by AtomicCmpExchange calls. You are heavily stressing the system in this test. BTW, FThreadBlock is always used in the main thread, and not needed.Hyozo
A global ctx gives no errors. Could be a weak spot (bug) in TRTTIContext. (I'm testing on XE7, upd 1).Hyozo
In code for TRTTIPool.GetPackageList there is an odd remark: ` // You're playing with fire if you try to race package addition and removal // in particular against RTTI operations. Please don't do that.` This is happening in all calls to FindType.Hyozo
I can only conclude that the use of a local TRTTIContext record variable in a thread is not safe. You should clean up the code as suggested by @David and file a QC report. Your workaround is ok but even simpler is to use a global (in scope) TRTTIContext record. I have seen a similar report that class function TJSONUnMarshal.ObjectInstance(Ctx: TRTTIContext) is not safe in a multithreaded environment, but I don't know if that is reported in QC.Hyozo
@LU RD / David ... fThreadBlock not needed ? Seriously ? VCL TThread gotcha #1: TThread.OnTerminate is performed in the context of the terminated thread (it is called from ThreadProc()). Thus fThreadBlock is absolutely needed and is doing something essential: ensuring thread safety for the thread count and failed/success counters.Chromosphere
@LU RD - Cheers for checking. In my production application, 3 concurrent ajax requests (to a TIdHTTPServer) was enough to occasionally get failures. 200 threads in my test project was to simply guarantee failure on a single run.Neuroticism
@Deltics, from the Classes unit: procedure TThread.DoTerminate; begin if Assigned(FOnTerminate) then Synchronize(CallOnTerminate); end;Neuroticism
Ooops. VCL TThread Gotcha #2: Gotcha #1 is wrong. My bad. :) Gotcha #3 of course is: Terminate() is Synchronised but the destructor is not (curse my failing memory). :) (Gotcha #0 being: Synchronize is fatally flawed in some versions and shouldn't really be relied on).Chromosphere
@Deltics: TThread.Terminate() has no synchronization at all. All it does is directly assigns a value to the TThread.Terminated property. TThread.DoTerminate() is called in the context of the worker thread, whereas TThread.OnTerminate is called in the context of the main thread (unless you override DoTerminate() to call OnTerminate directly). The thread destructor is called in whether thread context calls Free/Destroy() on the thread object, which is the worker thread itself when FreeOnTerminate is true, otherwise it is a different thread context.Oquinn
I guess a global context solves the problem right?Angeliaangelic
FYI, tested this on Delphi XE, no problems wwhatsoever, seems like a bug to me.Selfless
@Griffyn, yes this bug seems to be triggered often. A normal use case for RTTI is in serializing libraries, often used in threads. I don't like to think of Delphi in the context of playing with fire (comment not tied to FireMonkey, which is another beast). If you want me to file a QC, just let me know.Hyozo
@Selfless The error also occurs on XE. And a global context does not solve the problem because it's not related to the pool creation but to some race condition in the first package/type retrieval. So only a fake call to FindType would solve that. Also the comment is about racing module loading/unloading against RTTI querying. (we had that problem in our software where one thread was loading/unloading some dlls and another one was doing RTTI querying which failed because the GetPackageList method triggered some changes. It could also be a race condition inside TRealPackage.FindType.Cartridge
@StefanGlienke: I could not reproduce the problem on XE, do you mean the problem occurs whith runtime packages?Selfless
@Selfless Has nothing to do with runtime packages. I am talking about TRttiPackage which is the class for rtti inside one module (exe, dll, bpl). Remove the Log lines for thread started/finished and run it outside the debugger. This will make the error occur more frequently.Cartridge
@StefanGlienke, yes removing log calls makes the global context fail as well.Hyozo
This is bad, got to check some projects now :( @StefanGlienke is there a fix for this mess?Selfless
@Selfless Use a global context and force a call to MakeTypeLookupTable during initializationAngeliaangelic
This issue is still present in Delphi 10 SeattleBanuelos
Fixed in Delphi 10.3 RioHyozo
C
15

I think I found the problem. It is inside TRealPackage.FindType and MakeTypeLookupTable.

MakeTypeLookupTable checks for FNameToType being assigned. If not it runs DoMake. This one is protected with TMonitor and checks FNameToType being assigned again after entering.

So far so good. But then happens the mistake as inside DoMake FNameToType gets assigned causing other threads to happily pass MakeTypeLookupTable and return to FindType which then does return false in FNameToType.TryGetValue and returns nil.

Fix :

Since FNameToType is used outside of the locked DoMake as indicator that execution can continue it should not be assigned inside DoMake until it's properly filled up.

Edit: Reported as https://quality.embarcadero.com/browse/RSP-9815

Most recently (as of 2019-Nov) marked as Fixed in Delphi 10.3 Rio.

Cartridge answered 9/12, 2014 at 7:52 Comment(3)
You could make a note in your answer that this affects all Delphi versions since D2010 and that a global context variable also fails.Hyozo
@Stefan I think your bug report could be improved. I think it not implausible that the engineers will not understand it, and may not apply a fix. I suggest that you add a steps section that includes a reproduction of the fault. Code presented in the question here shows how to do that. What's more, I'd also encourage you to include a link to this question, and an excerpt from TRealPackage.MakeTypeLookupTable that makes it clear what the issue is. You describe the issue, but including code gives more power. Finally, I'm a bit scared about double checked locking on ARM.Angeliaangelic
It is fixed in 10.3 rio.Borghese
A
11

As Stefan explains, the problem is down to a faulty implementation of the double checked locking pattern. I'd like to expand on his answer and try to make it more clear what is wrong.

The erroneous code looks like this:

procedure TRealPackage.MakeTypeLookupTable;

  procedure DoMake;
  begin
    TMonitor.Enter(Flock);
    try
      if FNameToType <> nil then // presumes double-checked locking ok
        Exit;

      FNameToType := TDictionary<string,PTypeInfo>.Create;
      // .... code removed from snippet that populates FNameToType
    finally
      TMonitor.Exit(Flock);
    end;
  end;

begin
  if FNameToType <> nil then
    Exit;
  DoMake;
end;

The fault is that the code that populates the shared resource FNameToType is executed after FNameToType has been assigned. That code which populates the shared resource needs to execute before FNameToType is assigned.

Consider two threads, A and B. They are the first threads to call MakeTypeLookupTable. Thread A arrives first, finds that FNameToType is nil and calls DoMake. Thread A acquires the lock and reaches the code that assigns FNameToType. Now, before thread A manages to run any more code, thread B arrives in MakeTypeLookupTable. It tests FNameToType and finds that it is not nil, and so returns immediately. The calling code then makes use of FNameToType. However, FNameToType is not yet in a fit state to use. It has not been populated because thread A has not yet returned.

The most obvious fix from Embarcadero's side looks like this:

procedure DoMake;
var
  LNameToType: TDictionary<string,PTypeInfo>;
begin
  TMonitor.Enter(Flock);
  try
    if FNameToType <> nil then // presumes double-checked locking ok
      Exit;

    LNameToType := TDictionary<string,PTypeInfo>.Create;
    // .... populate LNameToType
    FNameToType := LNameToType;
  finally
    TMonitor.Exit(Flock);
  end;
end;

However, take note of the comment that says presumes double-checked locking ok. Well, double checked locking is fine when the machine has a strong enough memory model. So it's all good on x86 and x64. But ARM has a relatively weak memory model. So I have strong doubts as to whether or not this fix is enough on ARM. Indeed I do wonder where else in the RTL that Embarcadero have used double checked locking.

If TRealPackage had been declared in the interface section of the code then it would be easy enough to patch TRealPackage.MakeTypeLookupTable to apply the change above. However, that is not the case. So in order to apply a work around I suggest the following:

  1. Use a single global RTTI context for all your RTTI code.
  2. In the initialization stage of your program, make a call on that context that in turn forces a call to TRealPackage.MakeTypeLookupTable. Because initialization happens single threaded you avoid the race condition.

Declare the global context like this, say:

var
  ctx: TRttiContext;

And force the call to TRealPackage.MakeTypeLookupTable like this:

ctx.FindType('');

So long as all your RTTI code goes via this single shared context then you cannot fall foul of this race.

Angeliaangelic answered 9/12, 2014 at 9:15 Comment(4)
Yes that fixes the problem! Going to commit some patches now ;)Selfless
What about local declared context variables in the RTL? Seems it affects Rest/Soap/JSON.Hyozo
@LURD Sure. For RTTI context that is out of your control there's not a lot you can do. The designers really screwed this up. They should have enforced a single shared global instance of the context. Why they imagines that we'd all need our own instances of something that is fixed at compile time is beyond me.Angeliaangelic
There is only one shared global instance of the TRttiPool which is what we actually need. So as long as you early enough cause it to load you are fine.Cartridge

© 2022 - 2025 — McMap. All rights reserved.