DCOM: How to close connection in server on client crash?

Asked 25/1, 2011 at 14:39 Answered 29/5, 2011 at 20:28

I have a rather old project: a DCOM client and server, both in C++/ATL, Windows only. Everything works fine: local and remote clients connect to the server and work simultaneously without any problem.

But when a remote client crashes, is killed from Task Manager or with the "taskkill" command, or its machine is powered off, I have a problem. My server does not know anything about the client crash and keeps trying to send new events to all clients (including the ones that have already crashed). As a result I get a pause (the server cannot deliver data to an already crashed client), and its duration is proportional to the number of crashed remote clients. After 5 crashed clients the pauses are so long that the server is effectively stopped.

I know about the DCOM "ping" mechanism (DCOM should disconnect clients that do not respond to the ping sent every 2 minutes, after 6 minutes of silence). And indeed, after 6 minutes of hanging I get a short period of normal work, but then the server goes back to the "paused" state.

What can I do about all of this? How can I make the DCOM "ping" work properly? If I implement my own "ping" code, is it possible to disconnect old DCOM client connections manually? How would I do it?

Strove asked 25/1, 2011 at 14:39

Have you considered sending the events from thread pool threads, to mitigate blocking to an extent? – Bojorquez 25/1, 2011 at 14:54

@bdonlan: That could be a solution, however this will complicate the server significantly - it will have to take care of those extra threads lifetime. – Sikkim 25/1, 2011 at 14:59

Not really: you can just use the built-in Win32 thread pool. If you're already using an MTA, it's pretty trivial to call QueueUserWorkItem. If you're in an STA you'd have to marshal the remote interface pointer into the MTA, but that's still not too difficult to do using CoMarshalInterThreadInterfaceInStream etc. – Bojorquez 25/1, 2011 at 15:1

In fact, perhaps I'll write this up as a proper answer :) – Bojorquez 25/1, 2011 at 15:1

@bdonlan: You definitely should write it as an answer. – Sikkim 25/1, 2011 at 15:4

@sharptooth, and now I have :) – Bojorquez 25/1, 2011 at 15:17

This is the fundamental Achilles' heel of DCOM, and of middleware in general. Abstracting the network away doesn't work well in practice. Rearchitecting to a loosely coupled system with a single point of failure is painful and expensive. Look at message queuing. – Belay 25/1, 2011 at 15:23

I'm not sure about the DCOM ping system, but one option for you would be to simply farm off the notifications to a separate thread pool. This will help mitigate the effect of having a small number of blocking clients - you'll start having problems when there are too many though, of course.

The easy way to do this is to use QueueUserWorkItem, which invokes the passed callback on the process's system thread pool. Assuming you're using an MTA, this is all you need to do:

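struct InfoStruct {
    IRemoteHost *pRemote;   // client's callback interface, AddRef'd by InvokeClient
    BSTR someData;          // private copy of the payload, freed by the worker
};

// Runs on a thread-pool thread and owns (frees) the InfoStruct.
static DWORD WINAPI InvokeClientAsync(LPVOID lpInfo) {
  CoInitializeEx(NULL, COINIT_MULTITHREADED);

  InfoStruct *is = static_cast<InfoStruct *>(lpInfo);
  // If the client is dead this call can block for a long time, but only
  // this pool thread waits; the server's main thread keeps running.
  HRESULT hr = is->pRemote->notify(is->someData);
  if (FAILED(hr)) {
    // e.g. remove this client from the server's list of subscribers
  }
  is->pRemote->Release();
  SysFreeString(is->someData);
  delete is;

  CoUninitialize();
  return 0;
}

void InvokeClient(IRemoteHost *pRemote, BSTR someData) {
  InfoStruct *is = new InfoStruct;
  is->pRemote = pRemote;
  pRemote->AddRef();   // keep the proxy alive until the worker is done with it

  is->someData = SysAllocString(someData);
  // WT_EXECUTELONGFUNCTION tells the pool the callback may block for a long
  // time, so it may create additional threads rather than queue behind it.
  if (!QueueUserWorkItem(InvokeClientAsync, is, WT_EXECUTELONGFUNCTION)) {
    // Queueing failed: undo everything the worker would have cleaned up.
    pRemote->Release();
    SysFreeString(is->someData);
    delete is;
  }
}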

If your main thread is in an STA, this is only slightly more complex; you just have to use CoMarshalInterThreadInterfaceInStream and CoGetInterfaceAndReleaseStream to pass the interface pointer between apartments:

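struct InfoStruct {
    IStream *pMarshalledRemote;  // marshalled IRemoteHost, consumed by the worker
    BSTR someData;               // private copy of the payload, freed by the worker
};

static DWORD WINAPI InvokeClientAsync(LPVOID lpInfo) {
  CoInitializeEx(NULL, COINIT_MULTITHREADED); // can be STA as well

  InfoStruct *is = static_cast<InfoStruct *>(lpInfo);
  IRemoteHost *pRemote = NULL;
  // Unmarshals a proxy into this thread's apartment and releases the stream,
  // so the stream must not be used again afterwards.
  HRESULT hr = CoGetInterfaceAndReleaseStream(is->pMarshalledRemote, __uuidof(IRemoteHost), (LPVOID *)&pRemote);
  if (SUCCEEDED(hr)) {
    hr = pRemote->notify(is->someData);
    if (FAILED(hr)) {
      // e.g. remove this client from the server's list of subscribers
    }
    pRemote->Release();
  }
  SysFreeString(is->someData);
  delete is;

  CoUninitialize();
  return 0;
}

void InvokeClient(IRemoteHost *pRemote, BSTR someData) {
  InfoStruct *is = new InfoStruct;
  // Must be called on the STA thread that owns pRemote.
  HRESULT hr = CoMarshalInterThreadInterfaceInStream(__uuidof(IRemoteHost), pRemote, &is->pMarshalledRemote);
  if (FAILED(hr)) {
    delete is;
    return;
  }

  is->someData = SysAllocString(someData);
  QueueUserWorkItem(InvokeClientAsync, is, WT_EXECUTELONGFUNCTION);
}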

Note that error checking has been largely elided for clarity; you will of course want to check the result of every call. In particular, you want to check for RPC_S_SERVER_UNAVAILABLE (which COM reports as the HRESULT 0x800706BA, i.e. HRESULT_FROM_WIN32(RPC_S_SERVER_UNAVAILABLE)) and other such network errors, and remove the offending clients.
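A minimal, portable sketch of the "is this client gone?" test. On Windows you would include <windows.h> and use the real HRESULT type and named constants; here the values are spelled out (as defined in winerror.h) so the snippet compiles anywhere. The helper names `HresultFromWin32` and `IsDeadClientError` are hypothetical, and the set of codes treated as "dead client" is an assumption, not an exhaustive list.

```cpp
#include <cstdint>

// Mirrors the HRESULT_FROM_WIN32 macro for positive Win32 error codes:
// severity bit set, FACILITY_WIN32 (7), low 16 bits = the Win32 code.
constexpr std::uint32_t HresultFromWin32(std::uint32_t code) {
    return 0x80070000u | (code & 0xFFFFu);
}

// Values as defined in winerror.h, hardcoded so this sketch builds off Windows.
constexpr std::uint32_t kRpcServerUnavailable = HresultFromWin32(1722); // RPC_S_SERVER_UNAVAILABLE
constexpr std::uint32_t kRpcCallFailed        = HresultFromWin32(1726); // RPC_S_CALL_FAILED
constexpr std::uint32_t kRpcDisconnected      = 0x80010108u;            // RPC_E_DISCONNECTED

// Returns true for failures that usually mean the client process or its
// machine is gone, so the server should stop sending it events.
// The chosen codes are an assumption; extend the list to match what you observe.
bool IsDeadClientError(std::uint32_t hr) {
    return hr == kRpcServerUnavailable ||
           hr == kRpcCallFailed ||
           hr == kRpcDisconnected;
}
```

In the worker callback this would be used as something like `if (FAILED(hr) && IsDeadClientError(static_cast<std::uint32_t>(hr)))`, followed by removing that client from the server's list.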

Some more sophisticated variations you may want to consider: ensuring only one request is in flight per client at a time (further reducing the impact of a stuck client), and caching the marshalled interface pointer in the MTA (if your main thread is an STA). Since I believe CoMarshalInterThreadInterfaceInStream may perform network requests, you'd ideally want to do that ahead of time, when you know the client is connected, rather than risk blocking your main thread.
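One way to get "only one request in flight per client" is an atomic busy flag per client, sketched below in portable C++. `ClientSlot`, `TryBeginNotify` and `EndNotify` are hypothetical names. This version simply drops (and counts) an event when the client is still busy, which may not be acceptable if every event matters; a per-client queue would be the alternative.

```cpp
#include <atomic>

// Hypothetical per-client bookkeeping kept by the server.
struct ClientSlot {
    std::atomic<bool> busy{false};  // true while a notification is in flight
    std::atomic<int> dropped{0};    // events skipped because the client was busy
};

// Called on the dispatching thread before queueing work for this client.
// Returns true if the caller now owns the slot and may queue the call.
bool TryBeginNotify(ClientSlot& slot) {
    bool expected = false;
    if (slot.busy.compare_exchange_strong(expected, true)) {
        return true;
    }
    slot.dropped.fetch_add(1);  // a previous call is still stuck or running
    return false;
}

// Called by the worker once the remote call returns, whether it succeeded or failed.
void EndNotify(ClientSlot& slot) {
    slot.busy.store(false);
}
```

With this in place a dead client can tie up at most one pool thread, instead of one thread per event sent to it.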

Bojorquez answered 25/1, 2011 at 15:15

One solution would be to eliminate events - make clients query the server for whether there's anything of interest.
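A sketch of the pull model, assuming the server keeps an in-memory event log with increasing sequence numbers (`Event`, `EventLog` and its methods are hypothetical). Each poll returns everything after the last sequence number the client saw, so a crashed client simply stops polling and can never block the server.

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

struct Event {
    std::uint64_t seq;  // 1, 2, 3, ... assigned by the server
    std::string data;   // event payload
};

class EventLog {
public:
    // Appends an event and returns its sequence number.
    std::uint64_t Publish(std::string data) {
        std::lock_guard<std::mutex> lock(mu_);
        events_.push_back(Event{++lastSeq_, std::move(data)});
        return lastSeq_;
    }

    // What a client's poll call would return: all events newer than afterSeq.
    std::vector<Event> EventsSince(std::uint64_t afterSeq) const {
        std::lock_guard<std::mutex> lock(mu_);
        std::vector<Event> result;
        for (const Event& e : events_) {
            if (e.seq > afterSeq) result.push_back(e);
        }
        return result;
    }

private:
    mutable std::mutex mu_;
    std::vector<Event> events_;  // unbounded here; a real server would trim old events
    std::uint64_t lastSeq_ = 0;
};
```

The client remembers the `seq` of the last event it received and passes it to the next poll, so one call can return a whole batch.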

Sikkim answered 25/1, 2011 at 14:48

That's really not possible. I have hundreds of events, and it is very important to notify clients about an event as fast as possible. Even if a client asks the server for new events every second, that is not fast enough. If a client asks the server 100 times per second, then 5 clients will completely hang the server, the network and the CPU. – Strove 26/1, 2011 at 9:21

@Ezh: Sounds reasonable, but have you actually tested it? How fast do a million "do nothing" calls execute? – Sikkim 26/1, 2011 at 9:24

Yes, I've tested it. A remote DCOM call is not "do nothing": it involves network communication, Windows security, marshaling, etc. That's some milliseconds per call, plus network and CPU load. – Strove 26/1, 2011 at 15:6

@Ezh: I meant "do nothing" in the server-side implementation, so that the call overhead on its own can be measured. So are you saying that one million calls take several milliseconds? – Sikkim 26/1, 2011 at 15:9

This article (technet.microsoft.com/en-us/library/cc722925.aspx) contains Microsoft's benchmark of DCOM method calls. In short: 2-3 ms for each remote call. – Strove 27/1, 2011 at 8:21

@Ezh: They quote some 15-year-old hardware. I guess you could spend half an hour running the test again; maybe it'll be much faster now. – Sikkim 27/1, 2011 at 8:25

Use DCOM to set up a notification named pipe. Pipes handle disconnection better, and the listener responds (almost) instantly to messages. For example: the server asks the client (over DCOM) for its pipe's name; the client responds with a name that includes its machine name; the client creates the named pipe and listens on it; the server opens the pipe either immediately or when needed.
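The handshake relies on the UNC form Windows uses for named pipes, \\ServerName\pipe\PipeName. Below is a tiny helper showing the string a client might report back; the `notify_` prefix, the client id and the function name are hypothetical. Presumably the client would create the pipe locally with CreateNamedPipe (using `.` as the machine name) and the server would open the remote name with CreateFile.

```cpp
#include <string>

// Builds a named-pipe path in the UNC form \\Machine\pipe\Name.
// The "notify_" prefix and clientId are hypothetical naming choices.
std::wstring MakeNotificationPipeName(const std::wstring& machine,
                                      const std::wstring& clientId) {
    return L"\\\\" + machine + L"\\pipe\\notify_" + clientId;
}
```

For instance, a client on machine `CLIENT01` with id `42` would report `\\CLIENT01\pipe\notify_42`.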

Emulation answered 27/2, 2011 at 21:45

You can implement your own ping mechanism, so that your clients call a ping method on the server from time to time. You already maintain some sort of container of your clients on the server side; in it, mark each client with the timestamp of its last ping. Then check whether a client is alive before sending events to it. You can customize the strategy for when to stop sending events, maybe based on elapsed time, the number of missed pings, the type of event or some other factors. You probably don't need to worry about deleting clients; that can wait until DCOM realizes that a particular client is dead. This scheme may not eliminate the issue completely, since a client may die just before an event needs to be sent, but you have full control over how many such clients can exist by tweaking the ping period: the shorter the period, the fewer dead clients, at the cost of more network traffic.
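A portable sketch of the timestamp bookkeeping described above. `ClientRegistry`, `OnPing` and `IsAlive` are hypothetical names, the integer client id stands in for whatever key the server already uses, and the "alive" rule (last ping within `maxMissed` ping periods) is just one of the possible strategies.

```cpp
#include <chrono>
#include <map>
#include <mutex>

class ClientRegistry {
public:
    using Clock = std::chrono::steady_clock;
    using ClientId = int;  // hypothetical key; a real server might use a cookie

    ClientRegistry(Clock::duration pingPeriod, int maxMissed)
        : pingPeriod_(pingPeriod), maxMissed_(maxMissed) {}

    // Called from the server's Ping() method implementation.
    void OnPing(ClientId id, Clock::time_point now = Clock::now()) {
        std::lock_guard<std::mutex> lock(mu_);
        lastPing_[id] = now;
    }

    // Called before sending an event: false means "skip this client".
    bool IsAlive(ClientId id, Clock::time_point now = Clock::now()) const {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = lastPing_.find(id);
        if (it == lastPing_.end()) return false;  // never pinged
        return now - it->second <= pingPeriod_ * maxMissed_;
    }

private:
    mutable std::mutex mu_;
    std::map<ClientId, Clock::time_point> lastPing_;
    Clock::duration pingPeriod_;
    int maxMissed_;
};
```

With a 10-second ping period and `maxMissed = 3`, a client that stops pinging is skipped after 30 seconds instead of the 6 minutes DCOM's own ping takes.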

Disunite answered 29/5, 2011 at 20:28
