What C# tools exist for triggering, queueing, prioritizing dependent tasks

I have a C# service application which interacts with a database. It was recently migrated from .NET 2.0 to .NET 4.0 so there are plenty of new tools we could use.

I'm looking for pointers to programming approaches or tools/libraries to handle defining tasks, configuring which tasks they depend on, queueing, prioritizing, cancelling, etc.

There are various types of services:

  • Data (for retrieving and updating)
  • Calculation (populate some table with the results of a calculation on the data)
  • Reporting

These services often depend on one another and are triggered on demand; e.g., a Reporting task will probably contain code such as

if (IsSomeDependentCalculationRequired())
    PerformDependentCalculation();  // which may trigger further calculations
GenerateRequestedReport();

Also, any Data modification is likely to set the Required flag on some of the Calculation or Reporting services (so the report could be out of date before it has finished generating). The tasks vary in length from a few seconds to a couple of minutes and are performed within transactions.

This has worked OK up until now, but it is not scaling well. There are fundamental design problems and I am looking to rewrite this part of the code. For instance, if two users request the same report at similar times, the dependent tasks will be executed twice. Also, there's currently no way to cancel a task in progress, and it's hard to maintain the dependent tasks, etc.

I'm NOT looking for suggestions on how to implement a fix. Rather, I'm looking for pointers to the tools/libraries I would use for this sort of requirement if I were starting from scratch in .NET 4. Would this be a good candidate for Windows Workflow? Is this what Futures are for? Are there any other libraries I should look at, or books or blog posts I should read?

Edit: What about Rx (Reactive Extensions)?

Filings answered 3/2, 2012 at 16:44 Comment(3)
Partan: Based on a demo I watched of Workflow, it seems like a good match for your requirements, but since I haven't actually used it myself, I'm offering this as a comment and not an answer... for what it's worth.
Cavalla: It would probably help if you commented on the individual answers a little more. That way we can elaborate in the right direction.
Filings: What about Rx (Reactive Extensions)? Is that the best approach for my requirements?

I don't think your requirements fit into any of the built-in tools; they are too specific for that.

I'd recommend that you build a task-queueing infrastructure around a SQL database. Your tasks are fairly long-running (seconds to minutes), so you don't need particularly high throughput from the task scheduler, which means you are unlikely to hit performance hurdles. From a programming perspective it will actually be a pretty manageable task.

You should probably build a Windows service or some other process that continuously polls the database for new tasks or requests. This service can then enforce arbitrary rules on the requested tasks. For example, it can detect that a reporting task is already running and not schedule a new computation.
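A minimal sketch of one polling pass might look like this. It assumes a hypothetical `TaskRequest` row shape and uses an in-memory list as a stand-in for the SQL table; a real service would read and update the table inside a transaction and call this from a timer or loop. The only rule implemented is the one mentioned above: don't start a request while the same task is already running.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape; in the real system this would be a row in a task table.
public enum RequestStatus { Pending, Running, Done }

public class TaskRequest
{
    public int Id;
    public string TaskName;      // hypothetical task key, e.g. "MonthlyReport"
    public RequestStatus Status;
}

public class PollingScheduler
{
    // In-memory stand-in for the task table (assumption for illustration).
    private readonly List<TaskRequest> table;

    public PollingScheduler(List<TaskRequest> table)
    {
        this.table = table;
    }

    // One polling pass: start each pending request (oldest first) unless the
    // same task is already running or was started earlier in this pass.
    public List<TaskRequest> PollOnce()
    {
        var running = new HashSet<string>(
            table.Where(r => r.Status == RequestStatus.Running).Select(r => r.TaskName));
        var started = new List<TaskRequest>();

        var pending = table.Where(r => r.Status == RequestStatus.Pending)
                           .OrderBy(r => r.Id)
                           .ToList();
        foreach (var request in pending)
        {
            if (running.Contains(request.TaskName))
                continue;   // rule: never run two computations of the same task at once

            request.Status = RequestStatus.Running;   // a worker would now execute it
            running.Add(request.TaskName);
            started.Add(request);
        }
        return started;
    }
}
```

Requests skipped in one pass simply stay pending and are reconsidered on the next pass.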

My main point is that your requirements are so specific that you need C# code to encode them. You cannot make an existing tool fit your needs; you need the Turing completeness of a programming language to do this yourself.

Edit: You should probably separate a task request from a task execution. This allows multiple parties to request a refresh of some report while only one actual computation is running. Once this single computation has completed, all of its task requests are marked as completed. When a request is cancelled, the execution does not need to be cancelled; only when the last request is cancelled is the task execution cancelled as well.
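The request/execution split could be sketched roughly as follows. It assumes a hypothetical string task key (for example a report name plus its parameters) and keeps everything in memory; in a database-backed design the request counts would live in the request table. The shared execution is cancelled through a `CancellationTokenSource` (available since .NET 4.0), and only when the last attached request is withdrawn.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// One shared execution per task key; many requests can attach to it.
public class SharedExecution
{
    public readonly CancellationTokenSource Cancellation = new CancellationTokenSource();
    public int ActiveRequests;
}

public class RequestTracker
{
    private readonly Dictionary<string, SharedExecution> executions =
        new Dictionary<string, SharedExecution>();
    private readonly object gate = new object();

    // Attaches a request. Returns true if a new execution had to be started,
    // false if the request joined an execution that is already running.
    public bool Request(string taskKey, out CancellationToken token)
    {
        lock (gate)
        {
            SharedExecution execution;
            bool isNew = !executions.TryGetValue(taskKey, out execution);
            if (isNew)
            {
                execution = new SharedExecution();
                executions.Add(taskKey, execution);
            }
            execution.ActiveRequests++;
            token = execution.Cancellation.Token;
            return isNew;
        }
    }

    // Withdraws one request; the execution is cancelled only when none remain.
    public void CancelRequest(string taskKey)
    {
        lock (gate)
        {
            SharedExecution execution;
            if (!executions.TryGetValue(taskKey, out execution)) return;
            if (--execution.ActiveRequests == 0)
            {
                execution.Cancellation.Cancel();
                executions.Remove(taskKey);
            }
        }
    }

    // Called by the worker when the execution finishes. In a real system every
    // request attached to it would be marked as completed at this point.
    public void Complete(string taskKey)
    {
        lock (gate) { executions.Remove(taskKey); }
    }
}
```

The worker running the execution would pass the token into its calculation code and call `token.ThrowIfCancellationRequested()` at convenient points.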

Edit 2: I don't think workflows are the solution. Workflows usually operate independently of each other, but you don't want that: you want rules that span multiple tasks/workflows. With a workflow-based model you would be working against the system.

Edit 3: A few words about the TPL (Task Parallel Library), since you mentioned it ("Futures"). If you want some inspiration on how tasks could work together, how dependencies could be created and how tasks could be composed, look at the Task Parallel Library (in particular the Task and TaskFactory classes). You will find some nice design patterns there because it is very well designed. Here is how you model a sequence of tasks: you call Task.ContinueWith, which registers a continuation function as a new task. Here is how you model dependencies: TaskFactory.ContinueWhenAll(Task[], ...) creates a task that only runs when all its input tasks have completed (.NET 4.5 adds Task.WhenAll for the same purpose).
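As a small illustration of those two building blocks, here is a sketch where two placeholder calculations (they just return numbers) feed a report step. It only uses APIs that exist in .NET 4.0: `Task.Factory.StartNew`, `TaskFactory.ContinueWhenAll` and `Task.ContinueWith`.

```csharp
using System;
using System.Threading.Tasks;

public static class TplDependencies
{
    // Two calculations run independently; the report runs only after both complete.
    public static Task<string> BuildReport()
    {
        Task<int> calcA = Task.Factory.StartNew(() => 20);   // placeholder calculation
        Task<int> calcB = Task.Factory.StartNew(() => 22);   // placeholder calculation

        // Dependency: ContinueWhenAll starts its continuation once every antecedent has completed.
        Task<int> total = Task.Factory.ContinueWhenAll(
            new[] { calcA, calcB },
            done => done[0].Result + done[1].Result);

        // Sequence: ContinueWith registers the next step as a new task.
        return total.ContinueWith(t => "Report total: " + t.Result);
    }
}
```

Calling `TplDependencies.BuildReport().Result` blocks until the whole chain has run and yields `"Report total: 42"`.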

BUT: the TPL itself is probably not well suited for you because its tasks cannot be saved to disk. When you reboot your server or deploy new code, all in-flight tasks are lost along with the process. This is likely to be unacceptable. So use the TPL only as inspiration: learn from it what a "task/future" is and how tasks can be composed, then implement your own form of tasks.
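One way to read "implement your own form of tasks" is to describe each task purely as data, so it survives a restart. The sketch below is an assumption about how that could look, not a prescribed design: a hypothetical `TaskRecord` that stores the IDs of the tasks it depends on, plus a planner that picks the tasks whose dependencies have all completed. Loading and saving the records (e.g., to SQL Server) is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A task described purely as data, so it can be stored in a table and
// reloaded after a restart (unlike a live TPL Task object).
public class TaskRecord
{
    public int Id;
    public string Kind;                          // hypothetical, e.g. "Calculation" or "Report"
    public bool Completed;
    public List<int> DependsOn = new List<int>();
}

public static class DependencyPlanner
{
    // Returns the tasks that may start now: not yet completed, and every task
    // they depend on exists and has completed.
    public static List<TaskRecord> Runnable(IEnumerable<TaskRecord> records)
    {
        var all = records.ToDictionary(r => r.Id);
        return all.Values
            .Where(r => !r.Completed)
            .Where(r => r.DependsOn.All(id => all.ContainsKey(id) && all[id].Completed))
            .OrderBy(r => r.Id)
            .ToList();
    }
}
```

A polling service would call `Runnable` on each pass, start those tasks, and mark them completed in the table when they finish, which in turn unblocks their dependents.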

Does this help?

Cavalla answered 6/2, 2012 at 12:31 Comment(3)
Cavalla: I added a lot of stuff and said a few things about futures.
Filings: Very helpful, thanks. I would probably have wasted lots of time looking at WF without your comment, and I'll look at the Task Parallel Library as you suggest.
Filings: While I'm still not sure exactly which approach to use, this was the most helpful answer and deserves the bounty. I've played around with several of the suggestions and I'm leaning towards either the TPL or Rx.

I would try to use the state machine package Stateless to model the workflow. Using a package will provide a consistent way to advance the state of the workflow across the various services. Each of your services would hold an internal state machine implementation and expose methods for advancing it. Stateless will be responsible for triggering actions based on the state of the workflow, and it forces you to explicitly set up the various states the workflow can be in. This will be particularly useful for maintenance, and it will probably help you understand the domain better.
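Stateless is configured by calling `Configure(state)` and `Permit(trigger, destinationState)` on a `StateMachine<TState, TTrigger>` and is advanced with `Fire(trigger)`. To keep this sketch free of external packages, the hand-rolled class below imitates that style with an explicit transition table instead of using Stateless itself. The states and triggers are hypothetical, loosely modelled on the Required flag from the question, including a data change that arrives while a calculation is still running.

```csharp
using System;
using System.Collections.Generic;

public enum CalcState { Clean, Required, Running, RunningStale }
public enum CalcTrigger { DataChanged, Start, Completed }

// Explicit transition table: any (state, trigger) pair not listed is invalid.
public class CalculationStateMachine
{
    private static readonly Dictionary<Tuple<CalcState, CalcTrigger>, CalcState> Transitions =
        new Dictionary<Tuple<CalcState, CalcTrigger>, CalcState>
        {
            { Tuple.Create(CalcState.Clean, CalcTrigger.DataChanged), CalcState.Required },
            { Tuple.Create(CalcState.Required, CalcTrigger.DataChanged), CalcState.Required },
            { Tuple.Create(CalcState.Required, CalcTrigger.Start), CalcState.Running },
            // Data changed mid-run: the result will already be out of date.
            { Tuple.Create(CalcState.Running, CalcTrigger.DataChanged), CalcState.RunningStale },
            { Tuple.Create(CalcState.Running, CalcTrigger.Completed), CalcState.Clean },
            { Tuple.Create(CalcState.RunningStale, CalcTrigger.DataChanged), CalcState.RunningStale },
            { Tuple.Create(CalcState.RunningStale, CalcTrigger.Completed), CalcState.Required },
        };

    public CalcState State { get; private set; }

    public CalculationStateMachine()
    {
        State = CalcState.Clean;
    }

    public bool CanFire(CalcTrigger trigger)
    {
        return Transitions.ContainsKey(Tuple.Create(State, trigger));
    }

    public void Fire(CalcTrigger trigger)
    {
        CalcState next;
        if (!Transitions.TryGetValue(Tuple.Create(State, trigger), out next))
            throw new InvalidOperationException(
                string.Format("Trigger {0} is not permitted in state {1}", trigger, State));
        State = next;
    }
}
```

Because every legal transition is listed in one place, a question like "what happens if the data changes while a report is running?" has a single, explicit answer.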

Unbalance answered 6/2, 2012 at 12:42 Comment(1)
Filings: Great suggestion, just the sort of thing I was hoping for. I'll take a look at it.

If you want to solve this fundamental problem properly and in a scalable way, you should probably look at the SOA (service-oriented architecture) style. Your services will receive commands and generate events, which you can handle in order to react to facts that happen in your system.

And, yes, there are tools for it. For example, NServiceBus is a wonderful tool for building SOA systems.
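NServiceBus adds durable queues, retries and sagas on top of this idea, and its API is not shown here. As a dependency-free sketch of just the publish/subscribe part, assume a hypothetical `DataChanged` event and an in-process bus: the Data service publishes without knowing who listens, and a Calculation service subscribes in order to mark itself as required.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event published by the Data service after a modification.
public class DataChanged
{
    public string Table;
}

// Minimal in-process stand-in for a message bus (not NServiceBus): publishers
// don't know who, if anyone, is subscribed to a message type.
public class InProcessBus
{
    private readonly Dictionary<Type, List<Action<object>>> handlers =
        new Dictionary<Type, List<Action<object>>>();

    public void Subscribe<T>(Action<T> handler)
    {
        List<Action<object>> list;
        if (!handlers.TryGetValue(typeof(T), out list))
        {
            list = new List<Action<object>>();
            handlers.Add(typeof(T), list);
        }
        list.Add(message => handler((T)message));
    }

    public void Publish<T>(T message)
    {
        List<Action<object>> list;
        if (handlers.TryGetValue(typeof(T), out list))
            foreach (var handler in list)
                handler(message);
    }
}
```

A Calculation service would subscribe with something like `bus.Subscribe<DataChanged>(e => { if (e.Table == "Orders") MarkRequired(); })`, where `MarkRequired` is a hypothetical method of that service; the Data service never references it.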

Escapism answered 6/2, 2012 at 11:17 Comment(3)
Filings: In what way does NServiceBus help with queueing/triggering/prioritizing dependent tasks? I'm not looking for how to define a service; the application already has a service-oriented architecture (using RemObjects). I'm looking for how to define how different services depend on each other and how to execute multiple requests in an optimal manner.
Escapism: NServiceBus does not help with prioritizing/triggering tasks; SOA does, and NServiceBus is a good platform to build SOA on top of. In SOA, services do not talk to each other and have neither dependencies on nor knowledge of each other. They publish events to which other services may (or may not) subscribe. Your example of generating reports probably looks like a saga that could be triggered by some events and could manage such a process.
Cavalla: This answer does not help with the requirements. I don't see how SOA supports the notion of tasks. SOA has different architectural goals than the OP has. Also, web services are an RPC mechanism; they do not solve any particular problem beyond that.

You can use a SQL Server Agent job to run SQL queries at timed intervals. Otherwise, it looks like you will have to write the application yourself: a long-running program that checks the time and does something. I don't think there are clear-cut tools out there that do what you are trying to do. Use a C# application or a WCF service; data automation can be done in SQL itself.

Dylan answered 9/2, 2012 at 14:1 Comment(0)

If I understand you right, you want to cache the generated reports and not do the work again. As other commenters have pointed out, this can be solved elegantly with multiple producer/consumer queues and some caches. First you enqueue your report request. Based on the report generation parameters, you can first check the cache for a previously generated report and simply return it if one is available. If changes in the database make a report obsolete, you need to make sure the cache is invalidated reliably.

Now, if the report has not been generated yet, you need to schedule it for generation. The report scheduler needs to check whether the same report is already being generated. If so, register an event to be notified when it is completed and return the report once it is finished. Make sure that you do not access the data via the caching layer, since that could produce races (the report is generated, the data is changed, and the finished report is immediately discarded by the cache, leaving nothing for you to return).
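A rough in-memory sketch of the cache check plus in-flight deduplication, under some assumptions: reports are identified by a hypothetical string key built from the generation parameters, and staleness is tracked with a single data-version counter that `OnDataChanged` bumps. Waiters get the result from the generation task itself rather than from the cache, which sidesteps the race described above; a report whose data changed during generation is still returned, but it is cached as stale and regenerated on the next request.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class ReportCache
{
    private class Entry
    {
        public long Version;     // data version the report was generated from
        public string Content;
    }

    private readonly object gate = new object();
    private readonly Dictionary<string, Entry> cache = new Dictionary<string, Entry>();
    private readonly Dictionary<string, Task<string>> inFlight = new Dictionary<string, Task<string>>();
    private long dataVersion;

    // Call on every data modification: all cached reports become stale.
    public void OnDataChanged()
    {
        lock (gate) { dataVersion++; }
    }

    public Task<string> GetReport(string key, Func<string> generate)
    {
        lock (gate)
        {
            // 1. A fresh cached report: return it immediately (.NET 4.0 has no Task.FromResult).
            Entry entry;
            if (cache.TryGetValue(key, out entry) && entry.Version == dataVersion)
            {
                var done = new TaskCompletionSource<string>();
                done.SetResult(entry.Content);
                return done.Task;
            }

            // 2. The same report is already being generated: share that task.
            Task<string> running;
            if (inFlight.TryGetValue(key, out running))
                return running;

            // 3. Otherwise start a new generation, remembering the data version it saw.
            long versionAtStart = dataVersion;
            Task<string> task = Task.Factory.StartNew(() =>
            {
                try
                {
                    string content = generate();
                    lock (gate) { cache[key] = new Entry { Version = versionAtStart, Content = content }; }
                    return content;
                }
                finally
                {
                    // Runs only after GetReport releases the lock, so the entry exists by then.
                    lock (gate) { inFlight.Remove(key); }
                }
            });
            inFlight[key] = task;
            return task;
        }
    }
}
```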

Alternatively, if you do want to avoid returning outdated reports, you can make the caching layer your main data provider, which will keep generating reports until one is finished before the data changes again. But be aware that if your database changes constantly, you might enter an endless loop here, constantly generating invalid reports, if report generation takes longer than the average time between two changes to your database.
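To keep that endless loop from happening, the retry can be bounded. This sketch assumes a hypothetical `currentVersion` delegate that returns a number which changes whenever the data changes (for example a counter updated on every modification, or something derived from SQL Server `rowversion` columns); a report counts as fresh if the version did not move while it was being generated.

```csharp
using System;

public static class FreshReport
{
    // Regenerates until a report is produced without any data change during
    // generation, giving up after maxAttempts. Returns false if the data kept
    // changing; 'report' then holds the last (outdated) attempt, or null if none ran.
    public static bool TryGenerateFresh(Func<long> currentVersion, Func<string> generate,
                                        int maxAttempts, out string report)
    {
        report = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            long before = currentVersion();
            report = generate();
            if (currentVersion() == before)
                return true;   // no data change during generation: the report is current
        }
        return false;
    }
}
```

What to do when it gives up (return the outdated report with a warning, or fail the request) is exactly the kind of domain decision the next paragraph talks about.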

As you can see, you have plenty of options here before even talking about .NET, the TPL or SQL Server. First you need to set goals for how fast, scalable and reliable your system should be; then you need to choose the appropriate architecture for your particular problem domain, as described above. I cannot do that for you because I do not have your full domain know-how about what is acceptable and what is not.

The tricky part is the handover between different queues with the proper reliability and correctness guarantees. Depending on your specific report generation needs, you can put this logic into the cloud, or use a single thread by putting all work into the proper queues and processing them concurrently, one by one, or something in between.
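For the in-memory end of that spectrum, `BlockingCollection<T>` (new in .NET 4.0) gives a simple handover between stages. The sketch below is a two-stage pipeline with a single worker; report generation is a placeholder string, and nothing here is durable, so a crash loses whatever is still queued, which is exactly the reliability question raised above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ReportPipeline
{
    // Stage 1 queue: report requests. Stage 2 queue: generated reports.
    // Each handover goes through a BlockingCollection, so producer and consumer
    // of each stage can run on different threads.
    public static List<string> Run(IEnumerable<string> requests)
    {
        var requestQueue = new BlockingCollection<string>(boundedCapacity: 10);
        var resultQueue = new BlockingCollection<string>();

        // Worker: takes requests, "generates" reports (placeholder), hands them on.
        Task worker = Task.Factory.StartNew(() =>
        {
            try
            {
                foreach (string request in requestQueue.GetConsumingEnumerable())
                    resultQueue.Add("report for " + request);
            }
            finally
            {
                resultQueue.CompleteAdding();   // lets the consumer's loop finish
            }
        });

        foreach (string request in requests)
            requestQueue.Add(request);          // blocks if 10 requests are already waiting
        requestQueue.CompleteAdding();          // lets the worker's loop finish

        var results = new List<string>(resultQueue.GetConsumingEnumerable());
        worker.Wait();
        return results;
    }
}
```

With one worker the results come out in request order; adding more workers on the same `requestQueue` would process them concurrently at the cost of that ordering.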

The TPL and SQL Server can certainly help there, but they are only tools. If they are used wrongly due to insufficient experience with one or the other, it might turn out that a different approach (such as using only in-memory queues and persisting reports in the file system) is better suited to your problem.

From my current understanding, I would not misuse SQL Server as a cache, but if you want a database I would use something like RavenDB or RaptorDB, which look stable and much more lightweight than a full-blown SQL Server.

But if you already have SQL Server running, then go ahead and use it.

Nealy answered 12/2, 2012 at 12:24 Comment(0)

I am not sure if I understood you correctly, but you might want to have a look at JAMS Scheduler: http://www.jamsscheduler.com/. It's not free, but it is a very good system for scheduling dependent tasks and reporting. I used it successfully at my previous company. It's written in .NET and there is a .NET API for it, so you can write your own apps that communicate with JAMS. They also have very good support and are eager to implement new features.

Radices answered 13/2, 2012 at 8:50 Comment(0)
