What could be rate limiting CPU cycles on my C# WCF Service?

Something very strange started happening on our production servers a day or two ago with a WCF service we run there: something seems to be rate limiting the process's CPU usage to the number of CPU cycles available on a single core, even though the load is spread across all cores (the process is not pinning one core at 100% usage).

[Image: CPU graph]

The service is mostly just a CRUD (create, read, update, delete) service, with the exception of a few long-running service calls (they can take up to 20 minutes). Each of these long-running calls kicks off a simple Thread and returns void, so as not to make the client application wait or hold up the WCF connection:

// WCF Service Side
[OperationBehavior]
public void StartLongRunningProcess()
{
    Thread workerThread = new Thread(DoWork);
    workerThread.Start();
}

private void DoWork()
{
    // Call SQL Stored proc
    // Write the 100k+ records to new excel spreadsheet
    // return (which kills off this thread)
}

Before the above call is kicked off, the service responds as it should, fetching data for the front-end quickly.

When you kick off the long-running process and the CPU usage goes to 100 / CPUCores percent, the front-end responses get slower and slower, and after a few minutes the service won't accept any more WCF connections.

What I think is happening is that the long-running process is using all the CPU cycles the OS allows it, because something is rate limiting it, and WCF never gets a chance to accept the incoming connection, never mind execute the request.

At some point I started wondering if the cluster our virtual servers run on was somehow doing this, but then we managed to reproduce the problem on our development machines, with the client communicating with the service over the loopback address, so the hardware firewalls are not interfering with the network traffic either.

While testing this inside Visual Studio, I managed to start 4 of these long-running processes and confirmed with the debugger that all 4 were executing simultaneously on different threads (by checking Thread.CurrentThread.ManagedThreadId), but they were still only using 100 / CPUCores percent of the CPU in total.

On the production server, CPU usage doesn't go over 25% (4 cores). When we doubled the CPU cores to 8, it didn't go over 12.5%.

Our development machines have 8 cores, and also won't go over 12.5% CPU usage.
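The "100 / CPUCores" figure can be cross-checked from inside the process by dividing the CPU time consumed by the wall-clock time elapsed; a result near 1.0 means roughly one core's worth of cycles. The following is only a minimal sketch: the `CpuProbe` class and `MeasureCoresUsed` method are hypothetical names introduced for illustration, and `Process.TotalProcessorTime` has coarse resolution, so the workload should run for at least a few hundred milliseconds.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of the service described in the question.
static class CpuProbe
{
    // Returns the average number of cores the current process kept busy
    // while `workload` ran: CPU time consumed divided by wall-clock time.
    public static double MeasureCoresUsed(Action workload)
    {
        Process self = Process.GetCurrentProcess();
        TimeSpan cpuBefore = self.TotalProcessorTime;
        Stopwatch wall = Stopwatch.StartNew();

        workload();

        wall.Stop();
        self.Refresh(); // discard any cached process info before reading again
        TimeSpan cpuUsed = self.TotalProcessorTime - cpuBefore;

        return cpuUsed.TotalMilliseconds / wall.Elapsed.TotalMilliseconds;
    }
}
```

On an 8-core machine, 12.5% total CPU usage would correspond to a value of about 1.0 here, and 25% on a 4-core machine would as well.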

Other things worth mentioning about the service

  • It's a Windows Service
  • It's running inside a TopShelf host
  • The problem didn't start after a deployment (of our service anyway)
  • Production server is running Windows Server 2008 R2 Datacenter
  • Dev Machines are running Windows 7 Enterprise

Things that we have checked, double checked, and tried:

  • Changing the process' priority up to High from Normal
  • Checked that the processor affinity for the process is not limiting to a specific core
  • The [ServiceBehavior] Attribute is set to ConcurrencyMode = ConcurrencyMode.Multiple
  • Incoming WCF Service calls are executing on different threads
  • Removed TopShelf from the equation by hosting the WCF service in a plain console application
  • Set the WCF Service throttling values: <serviceThrottling maxConcurrentCalls="1000" maxConcurrentInstances="1000" maxConcurrentSessions="1000" />

Any ideas on what could be causing this?

Ponce answered 25/11, 2015 at 12:51 Comment(11)
I wonder if your website is running out of threads for the app pool. - Chretien
It's not a website; it's a back-end service for an internal application. - Ponce
Wow, you are creating a thread each time for this? Why not use the ThreadPool? It will manage the threads internally, and more wisely than your team. - Thisbee
Yes, that call doesn't get hit very often, maybe once or twice a day. Thread pools are not magic bullets; they have their place, and this is not one of them. - Ponce
A limit is very unlikely. Since you are writing to a database, you are probably database bound. Adding more load to the DB does not make it go faster, which is why adding threads does not help. Put on lots of load, then pause the debugger. You will see all but one thread stopped in a database call most of the time. - Rathe
You can also post a screenshot of the Parallel Stacks window when many such jobs are running (like 10). That helps a lot in judging what the app is doing at the time. - Rathe
I'm just wondering, have you inherited from ContextBoundObject and/or used the SynchronizationAttribute anywhere? - Patino
@Ponce I see, interesting... That could have explained some things (at least in my world ;-) ) - Patino
@Ponce Are you still interested in the question? - Rathe
@Ponce Did you find the cause? Can you please give more information about the process that runs in DoWork()? Are there any resources shared between different threads? Any locking mechanisms? Have you profiled your service? Do the waits happen all over the thread code or in certain places? - Menides
I agree with @Rathe that it's probably database bound. It could also be IO bound while communicating with the database. What is your network topology? Is there a lot of latency between the service and the SQL Server? Is your SQL Server multi-processor (i.e., not running Express Edition and not bound to a specific CPU)? Does the stored procedure compile, or does it run as a script? All of these things can limit the performance of the thread. - Tying

There must be a shared resource that only allows a single thread to access it at a time. This would effectively only allow one thread at a time to run, and create exactly the situation you have.
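To illustrate the effect, here is a minimal sketch (not the asker's code) in which several threads do CPU-bound work in short slices while holding one shared lock. The names `LockContentionDemo`, `SharedGate`, `SpinFor` and `Run` are hypothetical. Because only one thread can hold the lock at a time, total CPU usage should stay at roughly one core's worth, while the scheduler is still free to spread that work across different cores.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical demo; SharedGate stands in for whatever resource is being serialized.
static class LockContentionDemo
{
    private static readonly object SharedGate = new object();

    // Busy-waits (burns CPU) for the given number of Stopwatch ticks.
    private static void SpinFor(long ticks)
    {
        long until = Stopwatch.GetTimestamp() + ticks;
        while (Stopwatch.GetTimestamp() < until) { }
    }

    // Runs `threadCount` CPU-bound workers for `duration` and returns
    // process CPU time divided by wall-clock time (cores' worth of usage).
    public static double Run(int threadCount, TimeSpan duration, bool useLock)
    {
        long sliceTicks = Stopwatch.Frequency / 1000; // about 1 ms of work per slice
        Process self = Process.GetCurrentProcess();
        TimeSpan cpuBefore = self.TotalProcessorTime;
        Stopwatch wall = Stopwatch.StartNew();
        long deadline = Stopwatch.GetTimestamp()
                        + (long)(duration.TotalSeconds * Stopwatch.Frequency);

        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            workers[i] = new Thread(() =>
            {
                while (Stopwatch.GetTimestamp() < deadline)
                {
                    if (useLock)
                    {
                        // Only one worker at a time can do its slice of work.
                        lock (SharedGate) { SpinFor(sliceTicks); }
                    }
                    else
                    {
                        SpinFor(sliceTicks);
                    }
                }
            });
            workers[i].Start();
        }
        foreach (Thread worker in workers) worker.Join();

        wall.Stop();
        self.Refresh();
        TimeSpan cpuUsed = self.TotalProcessorTime - cpuBefore;
        return cpuUsed.TotalMilliseconds / wall.Elapsed.TotalMilliseconds;
    }
}
```

Calling `LockContentionDemo.Run(4, TimeSpan.FromSeconds(2), true)` and then the same with `useLock` set to `false` on a multi-core machine should show the difference: the locked run is expected to land near 1.0, the unlocked run closer to the thread count (capped by the number of cores).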

Processor affinity masks are the only way to limit a process to a single CPU, and if you did this you would see one CPU pinned and all the others idle (which is not your situation).

We use a tool called LeanSentry that is very good at identifying these kinds of problems. It will attach itself to IIS as a debugger and capture stack dumps of all executing processes, then tell you if most of your threads are blocked in the same spot. There is a free trial that would be long enough for you to figure this out.

Jerilynjeritah answered 5/12, 2015 at 1:24 Comment(0)

The CPU usage looks like a lock on a table in the SQL database to me. I would use SQL Server Management Studio to analyze the statements and see if that confirms it.

Also, you indicated that you call a stored procedure; you might want to take a look at that as well.

This all just looks like a database issue to me.
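One way to test that from the service side would be to time each phase of DoWork separately and see whether the stored procedure call dominates. This is only a sketch: `PhaseTimer` is a hypothetical helper, and the method names in the usage comment are placeholders for whatever DoWork actually calls.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for timing the individual steps of a long-running job.
static class PhaseTimer
{
    // Runs `phase`, writes its wall-clock duration to the console under
    // `name`, and returns that duration.
    public static TimeSpan Time(string name, Action phase)
    {
        Stopwatch sw = Stopwatch.StartNew();
        phase();
        sw.Stop();
        Console.WriteLine("{0} took {1:F0} ms", name, sw.Elapsed.TotalMilliseconds);
        return sw.Elapsed;
    }
}

// Possible usage inside DoWork (placeholder method names):
//   PhaseTimer.Time("stored proc", CallStoredProc);
//   PhaseTimer.Time("write spreadsheet", WriteSpreadsheet);
```

If the stored procedure phase grows as more jobs run concurrently while the spreadsheet phase stays flat, that would point at the database; the opposite pattern would point at something in the service process itself.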

Dalesman answered 5/12, 2015 at 16:13 Comment(0)
