Why does the following code using IOmniThreadPool cause access violations?
Asked Answered
R

2

10

In our Delphi XE4 application we are using an OmniThreadPool with MaxExecuting=4 to improve the efficiency of a certain calculation. Unfortunately we are having trouble with intermittent access violations (see for example the following MadExcept bug report http://ec2-72-44-42-247.compute-1.amazonaws.com/BugReport.txt). I was able to construct the following example which demonstrates the problem. After running the following console application, an access violation in System.SyncObjs.TCriticalSection.Acquire usually occurs within a minute or so. Can anybody tell me what I am doing wrong in the following code, or show me another way of achieving the desired result?

program OmniPoolCrashTest;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils,
  DSiWin32, GpLists,
  OtlSync, OtlThreadPool, OtlTaskControl, OtlComm, OtlTask;

const
  cTimeToWaitForException = 10 * 60 * 1000;  // program exits if no exception after 10 minutes
  MSG_CALLEE_FINISHED = 113; // our custom Omni message ID
  cMaxAllowedParallelCallees = 4; // enforced via thread pool
  cCalleeDuration = 10; // 10 miliseconds
  cCallerRepetitionInterval = 200; // 200 milliseconds
  cDefaultNumberOfCallers = 10; // 10 callers each issuing 1 call every 200 milliseconds

var
  gv_OmniThreadPool : IOmniThreadPool;

procedure OmniTaskProcedure_Callee(const task: IOmniTask);
begin
  Sleep(cCalleeDuration);
  task.Comm.Send(MSG_CALLEE_FINISHED);
end;

procedure PerformThreadPoolTest();
var
  OmniTaskControl : IOmniTaskControl;
begin
  OmniTaskControl := CreateTask(OmniTaskProcedure_Callee).Schedule(gv_OmniThreadPool);
  WaitForSingleObject(OmniTaskControl.Comm.NewMessageEvent, INFINITE);
end;

procedure OmniTaskProcedure_Caller(const task: IOmniTask);
begin
  while not task.Terminated do begin
    PerformThreadPoolTest();
    Sleep(cCallerRepetitionInterval);
  end;
end;

var
  CallerTasks : TGpInterfaceList<IOmniTaskControl>;
  i : integer;
begin
  gv_OmniThreadPool := CreateThreadPool('CalleeThreadPool');
  gv_OmniThreadPool.MaxExecuting := cMaxAllowedParallelCallees;
  CallerTasks := TGpInterfaceList<IOmniTaskControl>.Create();
  for i := 1 to StrToIntDef(ParamStr(1), cDefaultNumberOfCallers) do begin
    CallerTasks.Add( CreateTask(OmniTaskProcedure_Caller).Run() );
  end;
  Sleep(cTimeToWaitForException);
  for i := 0 to CallerTasks.Count-1 do begin
    CallerTasks[i].Terminate();
  end;
  CallerTasks.Free();
end.
Radiogram answered 28/5, 2014 at 9:1 Comment(3)
This looks like the same question as the other one. You really should only ask one. I suggest that you delete one of them. Can you simplify this further. Ideally there should be a single .dpr file with no UI. A console app.Utrecht
Thanks for the feedback! I have now deleted the other question and also simplified the example code.Radiogram
Excellent work. This really is a super SSCCE now. I'd upvote again if I was allowed to!!Utrecht
F
7

You have here an example of hard-to-find Task controller needs an owner problem. What happens is that the task controller sometimes gets destroyed before the task itself and that causes the task to access memory containing random data.

Problematic scenario goes like this ([T] marks task, [C] marks task controller):

  • [T] sends the message
  • [C] receives the message and exits
  • [C] is destroyed
  • new task [T1] and controller [C1] are created
  • [T] tries to exit; during that it accesses the shared memory area which was managed by [C] but was then destroyed and overwritten by the data belonging to [C1] or [T1]

In the Graymatter's workaround, OnTerminated creates an implicit owner for the task inside the OmniThreadLibrary which "solves" the problem.

The correct way to wait on the task to complete is to call taskControler.WaitFor.

procedure OmniTaskProcedure_Callee(const task: IOmniTask);
begin
  Sleep(cCalleeDuration);
end;

procedure PerformThreadPoolTest();
var
  OmniTaskControl : IOmniTaskControl;
begin
  OmniTaskControl := CreateTask(OmniTaskProcedure_Callee).Schedule(gv_OmniThreadPool);
  OmniTaskControl.WaitFor(INFINITE);
end;

I will look into replacing shared memory record with reference-counted solution which would prevent such problems (or at least make them easier to find).

Furfuran answered 29/5, 2014 at 10:39 Comment(2)
Many thanks for the detailed response! Actually I think part of the reason I implemented this the way I did was because I wasn't sure what would happen if I called TOmniTaskControl.Comm.Receive after the task was already finished (in the original application MSG_CALLEE_FINISHED has data attached to it). But it seems that calling TOmniTaskControl.Comm.Receive after TOmniTaskControl.WaitFor is completely unproblematic after all :) I have now implemented it this way in our application. Thanks again for the help.Radiogram
Communication channel stays alive as long as the task controller is alive and messages are waiting for you to read them.Furfuran
R
5

It looks like your termination message is causing the problem. Removing the message and the WaitForSingleObject stopped the AV. In my tests just adding a .OnTerminated(procedure begin end) before the .Schedule also did enough to change the flow and to stop the error. So the code in that case would look like this:

procedure PerformThreadPoolTest();
var
  OmniTaskControl : IOmniTaskControl;
begin
  OmniTaskControl := CreateTask(OmniTaskProcedure_Callee).OnTerminated(procedure begin end).Schedule(gv_OmniThreadPool);
  WaitForSingleObject(OmniTaskControl.Comm.NewMessageEvent, INFINITE);
end;

It looks to me like this might be the problem. otSharedInfo_ref has a property called MonitorLock. This is used to block changes to otSharedInfo_ref. If for some reason otSharedInfo_ref is freed while the acquire is waiting then you are likely to get some very weird behavior

The code as it stands looks like this:

procedure TOmniTask.InternalExecute(calledFromTerminate: boolean);
begin
  ...
    // with internal monitoring this will not be processed if the task controller owner is also shutting down
    sync := nil; // to remove the warning in the 'finally' clause below
    otSharedInfo_ref.MonitorLock.Acquire;
    try
      sync := otSharedInfo_ref.MonitorLock.SyncObj;
      if assigned(otSharedInfo_ref.Monitor) then
        otSharedInfo_ref.Monitor.Send(COmniTaskMsg_Terminated,
          integer(Int64Rec(UniqueID).Lo), integer(Int64Rec(UniqueID).Hi));
      otSharedInfo_ref := nil;
    finally sync.Release; end;
  ...
end; { TOmniTask.InternalExecute }

If otSharedInfo_ref.MonitorLock.Acquire is busy waiting and the object behind otSharedInfo_ref is freed then we end up in a very nasty place. Changing the code to this stopped the AV that was happening in InternalExecute:

procedure TOmniTask.InternalExecute(calledFromTerminate: boolean);
var
  ...
  monitorLock: TOmniCS;
  ...
begin
  ...
    // with internal monitoring this will not be processed if the task controller owner is also shutting down
    sync := nil; // to remove the warning in the 'finally' clause below
    monitorLock := otSharedInfo_ref.MonitorLock;
    monitorLock.Acquire;
    try
      sync := monitorLock.SyncObj;
      if assigned(otSharedInfo_ref) and assigned(otSharedInfo_ref.Monitor) then
        otSharedInfo_ref.Monitor.Send(COmniTaskMsg_Terminated,
          integer(Int64Rec(UniqueID).Lo), integer(Int64Rec(UniqueID).Hi));
      otSharedInfo_ref := nil;
    finally sync.Release; end;
  ...
end; { TOmniTask.InternalExecute }

I did start getting AV's in the OmniTaskProcedure_Callee method then on the "task.Comm.Send(MSG_CALLEE_FINISHED)" line so it's still not fixed but this should help others/Primoz to further identify what is going on. In the new error, task.Comm is often unassigned.

Runofthemill answered 28/5, 2014 at 22:34 Comment(1)
Thank you very much for the detailed response! Based on the testing I have done so far, your suggestion to add .OnTerminated(procedure begin end) seems to solve the problem in our application.Radiogram

© 2022 - 2024 — McMap. All rights reserved.