C# first class continuation via C++ interop or some other way?
Asked Answered
P

5

31

We have a very high performance multitasking, near real-time C# application. This performance was achieved primarily by implementing cooperative multitasking in-house with a home grown scheduler. This is often called micro-threads. In this system all the tasks communicate with other tasks via queues.

The specific problem that we have seems to only be solvable via first class continuations which C# does not support.

Specifically the problem arises in 2 cases dealing with queues. Whenever any particular task performs some work before placing an item on a queue. What if the queue is full?

Conversely, a different task may do some work and then need to take an item off of a queue. What if that queue is empty?

We have solved this in 90% of the cases by linking queues to tasks to avoid tasks getting invoked if any of their outbound queues are full or inbound queue is empty.

Furthermore certain tasks were converted into state machines so they can handle if a queue is full/empty and continue without waiting.

The real problem arises in a few edge cases where it is impractical to do either of those solutions. The idea in that scenario would be to save the stack state at the point and switch to a different task so that it can do the work and subsequently retry the waiting task whenever it is able to continue.

In the past, we attempted to have the waiting task call back into the schedule (recursively) to allow the other tasks to and later retry the waiting task. However, that led to too many "deadlock" situations.

There was an example somewhere of a custom CLR host to make the .NET threads actually operate as "fibers" which essentially allows switching stack state between threads. But now I can't seem to find any sample code for that. Plus it seems that will take some significant complexity to get it right.

Does anyone have any other creative ideas how to switch between tasks efficiently and avoid the above problems?

Are there any other CLR hosts that offer this, commercial or otherwise? Is there any add-on native library that can offer some form of continuations for C#?

Preeminent answered 31/12, 2011 at 2:5 Comment(12)
Fiber mode CLR host sample -- you may be referring to Dino Viehland's blog series about this, which starts here: blogs.msdn.com/b/dinoviehland/archive/2004/08/16/215140.aspxBreathy
Nit: A deadlock is not an efficiency issue. It is a correctness issue. (Also, what is the difference between a full unbound queue and no more memory? State is state and must be stored somewhere.)Godroon
We're adding a form of continuations to C# 5. Though they are not exactly first-class call/cc-style continuations, they are morally equivalent. Check out the "async CTP" for a preview release of it. msdn.microsoft.com/en-us/vstudio/gg316360.aspx. Also see Stephen Toub's recent MSDN Magazine article on the async feature's performance characteristics.Valoniah
Eric, kind of you to reply! I have studied and considered async CTP already. Unfortunately, it includes internal scheduler which disallows sufficiently controlling which async tasks run and when. It won't work for us because our system includes both real time data streams and high bandwidth historical streams. So scheduling of tasks requires "finesse" to keep real time moving near real time and fill in bandwidth gaps with historical data.Preeminent
@Wayne, async CTP does include some schedulers, but you don't have to use them. When your write await something, something is given the continuation and it's up to it what to do with it. So, it seems to me you just have to implement your own awaitable type(s).Inescutcheon
okay. We'll look into that in the future when it's out of beta testing.Preeminent
Mono's Continuations seems to be just that: A CLI VM extension which allows for to 'restore' a task to a given state. They even have a Microthreading library with it's own Scheduler class.Refund
FYI we found a solution to all our particular difficulties by implementing an erlang-style task scheduler with a single central queue per core for message passing. It's no longer necessary to wait on a full queue and the central queue never gets full--we also added logic to manage the flow of data to eliminate flooding of queues. Thanks.Preeminent
I'm just curious to know, what kind of application is it that you're developing, that requires such high performance multitasking?Hakenkreuz
Something financial, as like as not :3Denaedenarius
It's financial trading systems for stocks, futures, forex. The largest performance requirements for are testing models of trading strategies on historical data of ticks. Ticks can represent every trade that happens which is a lot of data, every time the best bid and offer changes which is a great deal more data. And finally, all limit orders changes on the entire order book which is a fantastic amount of data. So the requirement is to process hundreds of thousands of data points per second. A million, if possible. We're at about 250,000 per second at present with plans to achieve 750,000.Preeminent
if you need such realtime performance, maybe you should question using C#. C++ should have been a better choice with less layers and closes to operating system. Off course in a big system, more performance intensive routines are written in C++ while other routines are implemented in either java or .NET. Even on Windows phone, where more applications require real time processing, they are implemented using C++ not .NETHendrik
P
1

Actually, we decided on a direction to go with this. We're using the Observer pattern with Message Passsing. We built a home grown library to handle all communication between "Agents" which are similar to an Erlang process. Later we will consider using AppDomains to even better separate Agents from each other. Design ideas were borrowed from the Erlang programming language which has extremely reliable mult-core and distributed processing.

Preeminent answered 15/2, 2012 at 15:8 Comment(3)
If you're curious about this, you can find other posts I put on stackoverflow specifically about Erlang style handling in C#. Those were done while seeking help to wrap my head around the pros and cons of Erlang message passing to leverage the design ideas there. Specifically, the problem of queue full is solved in Erlang and now borrowing that as well as the copy by value communication.Preeminent
Interestingly, the message passing engine we built now determines if the message being received will run on the same thread/core as the sender, then that totally eliminates locks. So contention only happen when absolutely necessary to jump to another core which merely induces an L1 cache miss which is 4 times cheaper on a quad core than a CAS operation since CAS stops all cores while L1 cache miss only affects the single core that reads the data that was written.Preeminent
For continuations, we simply use a state machine since that's how compilers do it anyway. Sadly, there isn't any syntactic sugar in C# for continuations. But a state machine servers the purpose nicely! Fortunately theres only a handful of places in the system where continuations/state machine are necessary.Preeminent
S
2

There is the C# 5 CTP, which performs a continuation-passing-style transformation over methods declared with the new async keyword, and continuation-passing based calls when using the await keyword.

This is not actually a new CLR feature but rather a set of directives for the compiler to perform the CPS transformation over your code and a handful of library routines for manipulating and scheduling continuations. Activation records for async methods are placed on the heap instead of the stack, so they're not tied to a specific thread.

Supernormal answered 15/2, 2012 at 12:40 Comment(3)
You may want to note that it is an extremely limited CPS transformation.Yungyunick
Lacks the power, performance, and flexibility of the full blown Observer pattern with Message Passing. Message passing seems to be the most elegant and efficient solution to mult-core and distributed programming.Preeminent
@Preeminent Agreed. I only bring it up because it's a first class language feature (though not first class at the runtime level, since it can't be applied to code that is not written and compiled for the feature).Supernormal
Y
1

Nope, not going to work. C# (and even IL) is too complex language to perform such transformations (CPS) in a general way. The best you can get is what C# 5 will offer. That said, you will probably not be able to break/resume with higher order loops/iterations, which is really want you want from general purpose reifiable continuations.

Yungyunick answered 15/2, 2012 at 12:45 Comment(0)
P
1

Fiber mode was removed from v2 of the CLR because of issues under stress, see:

To my knowledge fiber support has not yet bee re-added, although from reading the above articles it may be added again (however the fact that nothing has mentioned for 6-7 years on the topic makes me believe that its unlikely).

FYI fiber support was intended to be a way for existing applications that use fibers (such as SQL Server) to host the CLR in a way that allows them to maximise performance, not as a method to allow .Net applications to create hundereds of threads - in short fibers are not a magic bullet solution to your problem, however if you have an application that uses fibers an wishes to host the CLR then the managed hosting APIs do provide the means for the CLR to "work nicely" with your application. A good source of information on this would be the managed hosting API documentation, or to look into how SQL Server hosts the CLR, of which there are several highly informative articles around.

Also take a quick read of Threads, fibers, stacks and address space.

Potvaliant answered 15/2, 2012 at 13:24 Comment(0)
P
1

Actually, we decided on a direction to go with this. We're using the Observer pattern with Message Passsing. We built a home grown library to handle all communication between "Agents" which are similar to an Erlang process. Later we will consider using AppDomains to even better separate Agents from each other. Design ideas were borrowed from the Erlang programming language which has extremely reliable mult-core and distributed processing.

Preeminent answered 15/2, 2012 at 15:8 Comment(3)
If you're curious about this, you can find other posts I put on stackoverflow specifically about Erlang style handling in C#. Those were done while seeking help to wrap my head around the pros and cons of Erlang message passing to leverage the design ideas there. Specifically, the problem of queue full is solved in Erlang and now borrowing that as well as the copy by value communication.Preeminent
Interestingly, the message passing engine we built now determines if the message being received will run on the same thread/core as the sender, then that totally eliminates locks. So contention only happen when absolutely necessary to jump to another core which merely induces an L1 cache miss which is 4 times cheaper on a quad core than a CAS operation since CAS stops all cores while L1 cache miss only affects the single core that reads the data that was written.Preeminent
For continuations, we simply use a state machine since that's how compilers do it anyway. Sadly, there isn't any syntactic sugar in C# for continuations. But a state machine servers the purpose nicely! Fortunately theres only a handful of places in the system where continuations/state machine are necessary.Preeminent
N
0

The solution to your problem is to use lock-free algorithms allowing for system wide progress of at least one task. You need to use inline assembler that is CPU dependent to make sure that you atomic CAS (compare-and-swap). Wikipedia has an article as well as patterns described the the book by Douglas Schmidt called "Pattern-Oriented Software Architecture, Patterns for Concurrent and Networked Objects". It is not immediately clear to me how you will do that under the dotnet framework.

Other way of solving your problem is using the publish-subscriber pattern or possible thread pools.

Hope this was helpful?

Naara answered 31/1, 2012 at 21:22 Comment(1)
Thanks but testing of CAS is very slow also. For each CAS to be valid, all cores have to synchronize. Our strategy now uses ZERO locks, that includes zero spin locks like CAS. Instead different components communicate via message passing similar to erlang design. This allows true parallel processing without cache misses or CAS slow downs.Preeminent

© 2022 - 2024 — McMap. All rights reserved.