Correlation of two arrays in C#
Y

6

27

Given two arrays of double values, I want to compute the correlation coefficient (a single double value, just like the CORREL function in MS Excel). Is there a simple one-line solution in C#?

I already discovered a math library called Meta Numerics. According to this SO question, it should do the job. Here are the docs for the Meta Numerics correlation method, which I don't understand.

Could somebody please provide a simple code snippet or an example of how to use the library?

Note: In the end, I was forced to use one of the custom implementations. But if anyone reading this question knows a good, well-documented C# math library/framework for this, please don't hesitate to post a link in an answer.

Yvor answered 3/7, 2013 at 12:20 Comment(2)
This might also help: codeproject.com/Articles/8750/A-computational-statistics-class, and here is code for the correlation coefficient: functionx.com/vcsharp/applications/lcc.htm - Hobart
There is a library from ta-lib.org which has a "CORREL" function. It is very easy to use and gives the same result as Excel. It returns an array of results instead of a single value as Excel does. - Choli
E
38

You can keep the values in separate lists at matching indices and use a simple Zip (FitResult comes from the Meta Numerics library):

var fitResult = new FitResult();
var values1 = new List<int>();
var values2 = new List<int>();

var correls = values1.Zip(values2, (v1, v2) =>
                                       fitResult.CorrelationCoefficient(v1, v2));

A second way is to write your own custom implementation (mine isn't optimized for speed):

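// Requires: using System; using System.Linq;
// Computes the Pearson correlation coefficient of two equal-length arrays.
public double ComputeCoeff(double[] values1, double[] values2)
{
    if (values1.Length != values2.Length)
        throw new ArgumentException("values must be the same length");

    var avg1 = values1.Average();
    var avg2 = values2.Average();

    // Numerator: sum of products of deviations from the means.
    var sum1 = values1.Zip(values2, (x1, y1) => (x1 - avg1) * (y1 - avg2)).Sum();

    // Denominator terms: sums of squared deviations from the means.
    var sumSqr1 = values1.Sum(x => Math.Pow((x - avg1), 2.0));
    var sumSqr2 = values2.Sum(y => Math.Pow((y - avg2), 2.0));

    var result = sum1 / Math.Sqrt(sumSqr1 * sumSqr2);

    return result;
}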

Usage:

var values1 = new List<double> { 3, 2, 4, 5, 6 };
var values2 = new List<double> { 9, 7, 12, 15, 17 };

var result = ComputeCoeff(values1.ToArray(), values2.ToArray());
// 0.997054485501581

Debug.Assert(result.ToString("F6") == "0.997054");

Another way is to use the Excel function directly:

var values1 = new List<double> { 3, 2, 4, 5, 6 };
var values2 = new List<double> { 9, 7, 12, 15, 17 };

// Make sure to add a reference to Microsoft.Office.Interop.Excel.dll
// and import the namespace: using Microsoft.Office.Interop.Excel;

var application = new Application();

var worksheetFunction = application.WorksheetFunction;

var result = worksheetFunction.Correl(values1.ToArray(), values2.ToArray());

Console.Write(result); // 0.997054485501581

application.Quit(); // close the Excel instance started above
Exsert answered 3/7, 2013 at 12:25 Comment(9)
+1 Thanks for providing a code sample and clarifying how the library works! The problem is that it only works for arrays of ints, not doubles. Not your fault of course, but I can't mark this as answered. - Yvor
Yeah, I didn't see that the parameters were of type int. If you need to work with doubles then you may need to write your own extension method for it. - Exsert
If you look at the source for the class you'll see that it's using a matrix to compute the correlation coefficient, so you can probably mimic it. - Exsert
Thank you for your effort, I greatly appreciate it! I was thinking about custom code and the Excel API too, but it somehow seemed like too much work for such a common task :) - Yvor
I'm glad you found my examples helpful! The Excel API is a little crufty, but it works. - Exsert
Where did the CorrelationCoefficient function come from in the first example? - Staton
Never mind, found it in the MetaNumerics library: meta-numerics.net/documentation/html/… - Staton
OK. Let me know if the post is missing anything. - Exsert
Thank you very much for the answer. How can this be used to calculate the similarity of two given vectors, or the distance between them? - Arcuate
F
31

Math.NET Numerics is a well-documented math library that contains a Correlation class. It calculates Pearson and Spearman ranked correlations: http://numerics.mathdotnet.com/api/MathNet.Numerics.Statistics/Correlation.htm

The library is available under the very liberal MIT/X11 license. Using it to calculate a correlation coefficient is as easy as follows:

using MathNet.Numerics.Statistics;

...

double correlation = Correlation.Pearson(arrayOfValues1, arrayOfValues2);

Good luck!

Farmhand answered 3/12, 2014 at 16:45 Comment(2)
Thanks for the link! This might actually be the best library so far; the method usage really couldn't be any easier :-) - Yvor
As an update, version 3.5 of Math.NET Numerics added a method to their Correlation class to calculate weighted Pearson correlation. - Farmhand
I
9

To calculate the Pearson product-moment correlation coefficient

http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

you can use this simple code:

  public static Double Correlation(Double[] Xs, Double[] Ys) {
    Double sumX = 0;
    Double sumX2 = 0;
    Double sumY = 0;
    Double sumY2 = 0;
    Double sumXY = 0;

    int n = Xs.Length < Ys.Length ? Xs.Length : Ys.Length;

    for (int i = 0; i < n; ++i) {
      Double x = Xs[i];
      Double y = Ys[i];

      sumX += x;
      sumX2 += x * x;
      sumY += y;
      sumY2 += y * y;
      sumXY += x * y;
    }

    Double stdX = Math.Sqrt(sumX2 / n - sumX * sumX / n / n);
    Double stdY = Math.Sqrt(sumY2 / n - sumY * sumY / n / n);
    Double covariance = (sumXY / n - sumX * sumY / n / n);

    return covariance / stdX / stdY; 
  }
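
For reference, the quantities in the code above can be read as the population form of the Pearson coefficient. The notation is an assumption added here for illustration: n is the common length, and x_i, y_i are the elements of Xs and Ys.

```latex
r = \frac{\operatorname{cov}(X,Y)}{\sigma_X \, \sigma_Y},
\qquad
\operatorname{cov}(X,Y) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \frac{1}{n^2}\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i,
\qquad
\sigma_X = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \frac{1}{n^2}\Bigl(\sum_{i=1}^{n} x_i\Bigr)^2}
```

\sigma_Y is defined the same way with y_i in place of x_i. The factors of n cancel in the ratio, so dividing by n - 1 instead of n would give the same r.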
Idioblast answered 3/7, 2013 at 12:50 Comment(2)
Hi Dmitry, can you please tell me: if all values in the arrays are the same, the function returns NaN. Do I have to check whether they are equal and return 1, or will NaN always mean 1? Thanks! Example: dotnetfiddle.net/eiYgtd - Plangent
@Tico Fortes: if all values in the array are the same, you effectively have just one point, with no variation at all. With no variation, the correlation is not defined (0/0); in that case it can be any value in the [-1..1] range, or NaN (Not a Number) as I've implemented. - Idioblast
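
The undefined case discussed in these comments can be made explicit in code. The following is a minimal sketch, not the answer author's code: the names CorrelationSketch and PearsonOrNaN are hypothetical, and returning NaN for mismatched lengths, fewer than two points, or zero variance is an assumed policy (throwing an exception would be equally reasonable).

```csharp
using System;

public static class CorrelationSketch
{
    // Hypothetical helper: Pearson correlation that returns NaN explicitly
    // whenever the coefficient is undefined (assumed policy, see lead-in).
    public static double PearsonOrNaN(double[] xs, double[] ys)
    {
        if (xs == null || ys == null || xs.Length != ys.Length || xs.Length < 2)
            return double.NaN;

        int n = xs.Length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) { meanX += xs[i]; meanY += ys[i]; }
        meanX /= n;
        meanY /= n;

        // Accumulate deviations from the means.
        double sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < n; i++)
        {
            double dx = xs[i] - meanX;
            double dy = ys[i] - meanY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }

        // A constant array has zero variance, so the ratio would be 0/0.
        if (sumXX == 0 || sumYY == 0)
            return double.NaN;

        return sumXY / Math.Sqrt(sumXX * sumYY);
    }

    public static void Main()
    {
        double[] constant = { 5, 5, 5, 5 };
        double[] varying = { 1, 2, 3, 4 };
        Console.WriteLine(double.IsNaN(PearsonOrNaN(constant, varying))); // True
        Console.WriteLine(PearsonOrNaN(varying, new double[] { 2, 4, 6, 8 })); // 1
    }
}
```

Callers can then test the result with double.IsNaN and decide what an undefined correlation should mean for their data. Note that the exact comparison with zero only catches constant arrays whose mean is computed without rounding error; for values such as 0.1 a small tolerance may be needed instead.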
B
6

If you don't want to use a third party library, you can use the method from this post (posting code here for backup).

public double Correlation(double[] array1, double[] array2)
{
    double[] array_xy = new double[array1.Length];
    double[] array_xp2 = new double[array1.Length];
    double[] array_yp2 = new double[array1.Length];
    for (int i = 0; i < array1.Length; i++)
        array_xy[i] = array1[i] * array2[i];
    for (int i = 0; i < array1.Length; i++)
        array_xp2[i] = Math.Pow(array1[i], 2.0);
    for (int i = 0; i < array1.Length; i++)
        array_yp2[i] = Math.Pow(array2[i], 2.0);
    double sum_x = 0;
    double sum_y = 0;
    foreach (double n in array1)
        sum_x += n;
    foreach (double n in array2)
        sum_y += n;
    double sum_xy = 0;
    foreach (double n in array_xy)
        sum_xy += n;
    double sum_xpow2 = 0;
    foreach (double n in array_xp2)
        sum_xpow2 += n;
    double sum_ypow2 = 0;
    foreach (double n in array_yp2)
        sum_ypow2 += n;
    double Ex2 = Math.Pow(sum_x, 2.00);
    double Ey2 = Math.Pow(sum_y, 2.00);

    return (array1.Length * sum_xy - sum_x * sum_y) /
           Math.Sqrt((array1.Length * sum_xpow2 - Ex2) * (array1.Length * sum_ypow2 - Ey2));
}
Barranca answered 3/7, 2013 at 12:24 Comment(0)
P
2

In my tests, both @Dmitry Bychenko's and @keyboardP's code postings above resulted in generally the same correlations as Microsoft Excel over a handful of manual tests I did, and did not need any external libraries.

For example, one run (data listed at the bottom) produced:

@Dmitry Bychenko: -0.00418479432051121

@keyboardP: -0.00418479432051131

MS Excel: -0.004184794

Here is a test harness:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace TestCorrel {
    class Program {

        static void Main(string[] args) {

            Random rand = new Random(DateTime.Now.Millisecond);

            List<double> x = new List<double>();
            List<double> y = new List<double>();

            for (int i = 0; i < 100; i++) {

                x.Add(rand.Next(1000) * rand.NextDouble());
                y.Add(rand.Next(1000) * rand.NextDouble());

                Console.WriteLine(x[i] + "," + y[i]);
            }

            Console.WriteLine("Correl1: " + Correl1(x, y));
            Console.WriteLine("Correl2: " + Correl2(x, y));
        }

        public static double Correl1(List<double> x, List<double> y) {

            //https://mcmap.net/q/495861/-correlation-of-two-arrays-in-c
            if (x.Count != y.Count)
                return (double.NaN); //throw new ArgumentException("values must be the same length");

            double sumX = 0;
            double sumX2 = 0;
            double sumY = 0;
            double sumY2 = 0;
            double sumXY = 0;

            int n = x.Count < y.Count ? x.Count : y.Count;

            for (int i = 0; i < n; ++i) {

                Double xval = x[i];
                Double yval = y[i];

                sumX += xval;
                sumX2 += xval * xval;
                sumY += yval;
                sumY2 += yval * yval;
                sumXY += xval * yval;
            }

            Double stdX = Math.Sqrt(sumX2 / n - sumX * sumX / n / n);
            Double stdY = Math.Sqrt(sumY2 / n - sumY * sumY / n / n);
            Double covariance = (sumXY / n - sumX * sumY / n / n);

            return covariance / stdX / stdY;
        }

        public static double Correl2(List<double> x, List<double> y) {

            double[] array_xy = new double[x.Count];
            double[] array_xp2 = new double[x.Count];
            double[] array_yp2 = new double[x.Count];

            for (int i = 0; i < x.Count; i++)
                array_xy[i] = x[i] * y[i];
            for (int i = 0; i < x.Count; i++)
                array_xp2[i] = Math.Pow(x[i], 2.0);
            for (int i = 0; i < x.Count; i++)
                array_yp2[i] = Math.Pow(y[i], 2.0);
            double sum_x = 0;
            double sum_y = 0;
            foreach (double n in x)
                sum_x += n;
            foreach (double n in y)
                sum_y += n;
            double sum_xy = 0;
            foreach (double n in array_xy)
                sum_xy += n;
            double sum_xpow2 = 0;
            foreach (double n in array_xp2)
                sum_xpow2 += n;
            double sum_ypow2 = 0;
            foreach (double n in array_yp2)
                sum_ypow2 += n;
            double Ex2 = Math.Pow(sum_x, 2.00);
            double Ey2 = Math.Pow(sum_y, 2.00);

            double Correl = 
            (x.Count * sum_xy - sum_x * sum_y) /
            Math.Sqrt((x.Count * sum_xpow2 - Ex2) * (x.Count * sum_ypow2 - Ey2));

            return (Correl);
        }
    }
}

Data for the example numbers above:

287.688269702572,225.610842817282
618.9313498167,177.955550192835
25.7778882802361,27.6549569366756
140.847984766051,714.618547504125
438.618761728806,533.48764902702
481.347431274758,214.381256273194
21.6406916848573,393.559209519792
135.30397563209,158.419851317732
334.314685154853,814.275162949821
764.614904770914,50.1435267264692
42.8179292282173,47.8631582287434
237.216836650491,370.488416981179
388.849658539449,134.961087643151
305.903013161804,441.926902444068
10.6625048679591,369.567569480076
36.9316453891488,24.8947204607049
2.10067253471383,491.941975629861
7.94887068492774,573.037801189831
341.738006353722,653.497146697015
98.8424873439793,475.215988045193
272.248712629196,36.1088809138671
122.336823399801,169.158256422336
9.32281673202422,631.076001565473
201.118425176068,803.724831627554
415.514343714115,64.248651454341
227.791637123,230.512133914284
25.3438658925443,396.854282886188
596.238994411304,72.543763144195
230.239735877253,933.983901697669
796.060099040186,689.952468971234
9.30882684202344,269.22063744125
16.5005430148451,8.96549091859045
536.324005148524,358.829873788557
519.694526420764,17.3212184707267
552.628357889423,12.5541588051962
210.516099897454,388.57537739937
141.341571405689,268.082028986924
503.880356335491,753.447006912645
515.494990213539,444.451280259737
973.8670776076,168.922799013985
85.7111146094795,36.3784999169309
37.2147129193017,108.040356312432
504.590177939548,50.3934166889607
482.821039277511,888.984586256083
5.52549206350255,156.717087003271
405.833169031345,394.099059180868
459.249365587835,11.68776424494
429.421127440604,314.216759666901
126.908422469584,331.907062556551
62.1416232716952,3.19765723645578
4.16058817699579,604.04046284223
484.262182311277,220.177370167886
58.6774453314382,339.09660232677
463.482149892246,199.181594849183
344.128297473829,268.531428258182
0.883430369609702,209.346384477963
77.9462970131758,255.221325168955
583.629439312792,235.557751925922
358.409186083083,376.046612200349
81.2148325150902,10.7696774717279
53.7315618049966,274.171515094196
111.284646992239,130.174321939319
317.280491961763,338.077288461885
177.454564264722,7.53587801919127
69.2239431670047,233.693477620228
823.419546454875,0.111916855029723
23.7174749401014,200.989081544331
44.9598299125022,102.633862571155
74.1602278468945,292.485449988155
130.11182449251,23.4682153367755
243.088760058903,335.807090202722
13.3974915991526,436.983231269281
73.3900805168739,252.352352472186
592.144630201228,92.3395205570103
57.7306153447044,47.1416798900541
522.649018382024,584.427794722108
15.3662010204821,60.1693953262499
16.8335716728277,851.401980430541
33.9869734449251,0.930781653584345
116.66608504982,146.126050951949
92.8896130355492,711.765618208687
317.91980889529,322.186540377413
44.8574470732629,209.275617858058
751.201537871362,37.935519233316
161.817758424588,2.83156183493862
531.64078452142,79.1750782491523
114.803219681048,283.106988439852
123.472725123853,154.125248027558
89.9276725453919,63.4626924192825
105.623296753328,111.234188702067
435.72981759707,23.7058234576629
259.324810619152,69.3535200857341
719.885234421531,381.086239833891
24.2674900099018,198.408173349876
57.7761600361095,146.52277489124
77.4594609157459,710.746080866431
636.671781979814,538.894185951396
56.6035279932448,58.2563265684323
485.16099039333,427.849954283261
91.9552873247095,576.92944263617
Panga answered 16/11, 2017 at 20:49 Comment(0)
C
-1
Public Function Correlation(ByRef array1() As Double, ByRef array2() As Double) As Double
    'see https://mcmap.net/q/495861/-correlation-of-two-arrays-in-c

    'the "Pearson correlation coefficient" calculated here still has to be squared to obtain R-squared, see
    'https://en.wikipedia.org/wiki/Coefficient_of_determination


    Dim array_xy(array1.Length - 1) As Double
    Dim array_xp2(array1.Length - 1) As Double
    Dim array_yp2(array1.Length - 1) As Double

    Dim i As Integer
    For i = 0 To array1.Length - 1
        array_xy(i) = array1(i) * array2(i)
    Next i
    For i = 0 To array1.Length - 1
        array_xp2(i) = Math.Pow(array1(i), 2.0)
    Next i
    For i = 0 To array1.Length - 1
        array_yp2(i) = Math.Pow(array2(i), 2.0)
    Next i


    Dim sum_x As Double = 0
    Dim sum_y As Double = 0
    Dim EinDouble As Double

    For Each EinDouble In array1
        sum_x += EinDouble
    Next
    For Each EinDouble In array2
        sum_y += EinDouble
    Next

    Dim sum_xy As Double = 0
    For Each EinDouble In array_xy
        sum_xy += EinDouble
    Next

    Dim sum_xpow2 As Double = 0
    For Each EinDouble In array_xp2
        sum_xpow2 += EinDouble
    Next

    Dim sum_ypow2 As Double = 0
    For Each EinDouble In array_yp2
        sum_ypow2 += EinDouble
    Next

    Dim Ex2 As Double = Math.Pow(sum_x, 2.0)
    Dim Ey2 As Double = Math.Pow(sum_y, 2.0)

    Dim ReturnWert As Double
    ReturnWert = (array1.Length * sum_xy - sum_x * sum_y) / Math.Sqrt((array1.Length * sum_xpow2 - Ex2) * (array1.Length * sum_ypow2 - Ey2))
    Correlation = ReturnWert
End Function
Counterwork answered 20/4, 2020 at 17:41 Comment(2)
This is the code from Contango, translated into VB.NET. It gives the same result as Excel's Correl function. - Counterwork
Wrong language. - Atalya
