.NET code slower on AMD Opteron CPU

I have encountered a situation where simple .NET Fibonacci code is slower on a particular set of servers, and the only obvious difference is the CPU:

- AMD Opteron Processor 6276 - 11 secs
- Intel Xeon CPU E7-4850 - 7 secs

The code is compiled for x86 and uses .NET Framework 4.0.

- Clock speeds of the two CPUs are similar, and PassMark benchmarks in fact give higher scores to the AMD.
- I have tried this on other AMD servers in the farm, and the times are slower there too.
- Even my local i7 machine runs the code faster.

Fibonacci code:

using System;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        const int ITERATIONS = 10000;
        const int FIBONACCI = 100000;

        var watch = new Stopwatch();
        watch.Start();


        DoFibonnacci(ITERATIONS, FIBONACCI);

        watch.Stop();

        Console.WriteLine("Total fibonacci time: {0}ms", watch.ElapsedMilliseconds);
        Console.ReadLine();
    }

    private static void DoFibonnacci(int ITERATIONS, int FIBONACCI)
    {
        for (int i = 0; i < ITERATIONS; i++)
        {
            Fibonacci(FIBONACCI);
        }
    }

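    // Note: the int result overflows (and silently wraps) long before x = 100000;
    // only the loop's running time matters for this benchmark.
    private static int Fibonacci(int x)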
    {
        var previousValue = -1;
        var currentResult = 1;

        for (var i = 0; i <= x; ++i)
        {
            var sum = currentResult + previousValue;
            previousValue = currentResult;
            currentResult = sum;
        }

        return currentResult;
    }

}

Any ideas on what may be going on?

Coakley answered 24/9, 2013 at 9:26 Comment(4)
Other than checking your power management I think Iain is correct, this is more of a SO question. - Derange
Same default CPU affinity settings on both machines? And... why 32-bit? - Maraca
@MathiasR.Jessen - that's interesting because I set the CPU affinity to a single CPU and I get 7 seconds now. - Coakley
I'm no low-latency software or hardware expert, but I suppose that pinning a routine with a large number of similar consecutive calculations to a single CPU might benefit greatly from branch prediction, instead of dispatching them to a new core each cycle. - Maraca

As we've established in the comments, you can work around this performance hit by pinning the process to a specific processor on the AMD Opteron machines.

Prompted by this not-really-on-topic question, I decided to have a look at possible scenarios where single-core pinning would make such a difference (from 11 to 7 seconds seems a bit extreme).

The most plausible answer is not that revolutionary:

The AMD Opteron series employs HyperTransport in a so-called NUMA architecture, instead of a traditional FSB as you would find on Intel's SMP CPUs (Xeon 4850 included).

My guess is that this symptom stems from the fact that the individual nodes in a NUMA architecture each have their own cache, as opposed to the Intel CPU, in which the processor cache is shared.

In other words, when consecutive computations shift between nodes on the Opteron, the cache contents are effectively lost, whereas balancing between processors in an SMP architecture like the Xeon 4850 has no such impact since the cache is shared.
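To check whether a given server actually exposes more than one NUMA node to Windows, you can ask the OS. The following is only a minimal sketch, assuming Windows and the Win32 functions GetNumaHighestNodeNumber and GetNumaNodeProcessorMask from kernel32.dll, called through P/Invoke. The class name NumaTopology is made up for illustration, and the output has not been checked on the machines in the question.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of the original question or answer.
static class NumaTopology
{
    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool GetNumaHighestNodeNumber(out uint highestNodeNumber);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool GetNumaNodeProcessorMask(byte node, out ulong processorMask);

    static void Main()
    {
        uint highestNode;
        if (!GetNumaHighestNodeNumber(out highestNode))
            throw new Win32Exception(Marshal.GetLastWin32Error());

        Console.WriteLine("Logical processors:       {0}", Environment.ProcessorCount);
        Console.WriteLine("Highest NUMA node number: {0}", highestNode);

        // Print the processor mask of every node the OS reports.
        // A mask of 0 means the node has no processors in the current processor group.
        for (uint node = 0; node <= highestNode; node++)
        {
            ulong mask;
            if (GetNumaNodeProcessorMask((byte)node, out mask))
                Console.WriteLine("Node {0}: processor mask 0x{1:X}", node, mask);
        }
    }
}
```

A highest node number of 0 means Windows sees a single node, in which case the NUMA explanation would not apply to that machine.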

Setting affinity in .NET is pretty easy; just pick a processor (let's take the first one for simplicity):

static void Main(string[] args)
{
    Console.WriteLine(Environment.ProcessorCount);
    Console.Read();

    // An affinity mask of 0x0001 makes sure the process is always pinned to processor 0
    Process thisProcess = Process.GetCurrentProcess();
    thisProcess.ProcessorAffinity = (IntPtr)0x0001; 

    const int ITERATIONS = 10000;
    const int FIBONACCI = 100000;

    var watch = new Stopwatch();
    watch.Start();


    DoFibonnacci(ITERATIONS, FIBONACCI);

    watch.Stop();

    Console.WriteLine("Total fibonacci time: {0}ms", watch.ElapsedMilliseconds);
    Console.ReadLine();
}

Although I'm pretty sure this is not very smart in a NUMA environment.
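A possibly gentler variant of the snippet above is to restrict the process to all processors of one NUMA node rather than to a single core. This is a rough sketch under assumptions, not a tested fix: it assumes Windows, the Win32 function GetNumaNodeProcessorMask from kernel32.dll, and a machine with at most 64 logical processors (a single processor group). The names NodeAffinity and PinToNode are invented for illustration.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of the original answer.
static class NodeAffinity
{
    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool GetNumaNodeProcessorMask(byte node, out ulong processorMask);

    // Restricts the current process to the processors that belong to one NUMA node.
    public static void PinToNode(byte node)
    {
        ulong mask;
        if (!GetNumaNodeProcessorMask(node, out mask))
            throw new Win32Exception(Marshal.GetLastWin32Error());

        // In a 32-bit (x86) process IntPtr is only 32 bits wide,
        // so only the low 32 bits of the mask can be expressed.
        if (IntPtr.Size == 4)
            mask &= 0xFFFFFFFFUL;

        if (mask == 0)
            throw new InvalidOperationException("No usable processors found for this node.");

        IntPtr affinity = IntPtr.Size == 4
            ? new IntPtr(unchecked((int)(uint)mask))
            : new IntPtr(unchecked((long)mask));

        Process.GetCurrentProcess().ProcessorAffinity = affinity;
    }
}
```

With this in place, the `thisProcess.ProcessorAffinity = (IntPtr)0x0001;` line could be swapped for `NodeAffinity.PinToNode(0);`. Whether this recovers the 7-second timing on the Opteron servers is an open question; it only keeps the scheduler from moving the thread across nodes.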

Windows Server 2008 R2 has some cool native NUMA functionality, and I found a promising CodePlex project with a .NET wrapper for this as well: http://multiproc.codeplex.com/

I'm nowhere near qualified to teach you how to utilize this technology, but this should point you in the right direction.

Maraca answered 24/9, 2013 at 20:50 Comment(4)
I will take a look at this, but you just blew my mind with stuff I didn't know about, so now I really will have to read up on your response. - Coakley
So many years later, and this is still a problem, even with .NET 5. - Lian
@Coakley I am having a similar problem, did you manage to fix this issue? - Acrodont
@Acrodont at the time, no. We effectively just used Intel servers in the end. - Coakley
