Efficient Multiple Linear Regression in C# / .Net

Does anyone know of an efficient way to do multiple linear regression in C#, where the number of simultaneous equations may be in the thousands (with 3 or 4 different inputs)? After reading this article on multiple linear regression, I tried implementing it with a matrix equation:

Matrix y = new Matrix(
    new double[,]{{745},
                  {895},
                  {442},
                  {440},
                  {1598}});

Matrix x = new Matrix(
     new double[,]{{1, 36, 66},
                 {1, 37, 68},
                 {1, 47, 64},
                 {1, 32, 53},
                 {1, 1, 101}});

Matrix b = (x.Transpose() * x).Inverse() * x.Transpose() * y;

for (int i = 0; i < b.Rows; i++)
{
  Trace.WriteLine("INFO: " + b[i, 0].ToDouble());
}

However, it does not scale well to thousands of equations due to the matrix inversion operation. I could call R and use that, but I was hoping for a pure .NET solution that scales to these large sets.

Any suggestions?

EDIT #1:

I have settled on using R for the time being. Using statconn (downloaded here), I have found this method to be both fast and relatively easy to use. Here is a small snippet; it really isn't much code at all to use the R statconn library (note: this is not all the code!).

_StatConn.EvaluateNoReturn(string.Format("output <- lm({0})", equation));
object intercept = _StatConn.Evaluate("coefficients(output)['(Intercept)']");
parameters[0] = (double)intercept;
for (int i = 0; i < xColCount; i++)
{
  object parameter = _StatConn.Evaluate(string.Format("coefficients(output)['x{0}']", i));
  parameters[i + 1] = (double)parameter;
}
Ceilometer answered 26/5, 2010 at 5:50 Comment(2)
Do you mean to make the matrix operations run quicker? I don't think that will be the best approach, I think the best approach will be to use a non-matrix style approach (or something that avoids the inverse).Ceilometer
I've had success with codeproject.com/KB/recipes/LinReg.aspx Very easy to use and open source!Cathexis

For the record, I recently found the ALGLIB library which, whilst not having much documentation, has some very useful functions, such as linear regression, which is one of the things I was after.

Sample code (this is old and unverified, just a basic example of how I was using it). I was using the linear regression on time series with 3 entries (called 3min/2min/1min) and then the finishing value (Final).

public void Foo(List<Sample> samples)
{
  int nAttributes = 3; // 3min, 2min, 1min
  int nSamples = samples.Count;
  double[,] tsData = new double[nSamples, nAttributes];
  double[] resultData = new double[nSamples];

  for (int i = 0; i < samples.Count; i++)
  {
    tsData[i, 0] = samples[i].Tminus1min;
    tsData[i, 1] = samples[i].Tminus2min;
    tsData[i, 2] = samples[i].Tminus3min;

    resultData[i] = samples[i].Final;
  }

  double[] weights = null;
  int fitResult = 0;
  alglib.lsfit.lsfitreport rep = new alglib.lsfit.lsfitreport();
  alglib.lsfit.lsfitlinear(resultData, tsData, nSamples, nAttributes, ref fitResult, ref weights, rep);

  Dictionary<string, double> labelsAndWeights = new Dictionary<string, double>();
  labelsAndWeights.Add("1min", weights[0]);
  labelsAndWeights.Add("2min", weights[1]);
  labelsAndWeights.Add("3min", weights[2]);
}
Ceilometer answered 19/10, 2010 at 22:51 Comment(3)
Nice suggestion. Any code examples you would be willing to post?Lornalorne
See edit for some sample code, I hope it still works (you will need a reference to alglib)Ceilometer
Nice. How would you incorporate an unknown constant variable into this sample?Wivina

The size of the matrix being inverted does NOT grow with the number of simultaneous equations (samples). x.Transpose() * x is a square matrix whose dimension is the number of independent variables, i.e. the number of columns of x (3 in the question's example, regardless of how many rows there are).
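
To illustrate, here is a minimal sketch (my own, not taken from any library mentioned in this thread) that accumulates x.Transpose() * x and x.Transpose() * y in a single pass over the samples and then solves the small square system directly, so nothing of size n x n is ever built or inverted. The names NormalEquationsSketch and Fit are hypothetical, and each row is assumed to already contain a leading 1 if an intercept is wanted.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class NormalEquationsSketch
{
    // rows[i] holds the p inputs of sample i; y[i] is its observed output.
    public static double[] Fit(double[][] rows, double[] y)
    {
        int p = rows[0].Length;
        var a = new double[p, p + 1]; // augmented matrix [X'X | X'y]

        // Single pass over the samples: O(n * p^2) time, O(p^2) memory.
        for (int i = 0; i < rows.Length; i++)
        {
            double[] r = rows[i];
            for (int j = 0; j < p; j++)
            {
                for (int k = 0; k < p; k++) a[j, k] += r[j] * r[k];
                a[j, p] += r[j] * y[i];
            }
        }

        // Gaussian elimination with partial pivoting on the p x p system.
        for (int col = 0; col < p; col++)
        {
            int pivot = col;
            for (int row = col + 1; row < p; row++)
                if (Math.Abs(a[row, col]) > Math.Abs(a[pivot, col])) pivot = row;

            // Crude absolute tolerance; an assumption that suits a sketch only.
            if (Math.Abs(a[pivot, col]) < 1e-12)
                throw new InvalidOperationException("X'X is singular or nearly so.");

            if (pivot != col)
                for (int k = 0; k <= p; k++)
                {
                    double tmp = a[col, k];
                    a[col, k] = a[pivot, k];
                    a[pivot, k] = tmp;
                }

            for (int row = col + 1; row < p; row++)
            {
                double f = a[row, col] / a[col, col];
                for (int k = col; k <= p; k++) a[row, k] -= f * a[col, k];
            }
        }

        // Back substitution yields the coefficient vector b.
        var b = new double[p];
        for (int row = p - 1; row >= 0; row--)
        {
            double s = a[row, p];
            for (int k = row + 1; k < p; k++) s -= a[row, k] * b[k];
            b[row] = s / a[row, row];
        }
        return b;
    }
}
```

With n samples and p columns the cost is dominated by the accumulation loop, which grows linearly in n. Forming x.Transpose() * x can amplify rounding error when the inputs are nearly collinear, so a QR-based solver from a numerical library is likely the safer choice for ill-conditioned data.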

Beneficiary answered 26/5, 2010 at 10:12 Comment(3)
Interesting point, I wonder why my performance degrades so much then? I had about 6000 samples in my set. I will have to investigate this further.Ceilometer
I'd guess your performance degrades because x.Transpose() * x takes more time with bigger matrices. I have a library somewhere that works for millions of data points... I'll try to dig it up if you're interested. I faced this problem about twenty years ago (yes I'm old) and found a clever mathematical solution :-)Beneficiary
You should use the gradient descent method if you want better scaling.Ostensorium

To do linear regressions I tend to use Math.NET Numerics.

Math.NET Numerics aims to provide methods and algorithms for numerical computations in science, engineering and every day use. Covered topics include special functions, linear algebra, probability models, random numbers, interpolation, integration, regression, optimization problems and more.

For example, if you want to fit your data to a line using a linear regression, it is as simple as this:

double[] xdata = new double[] { 10, 20, 30 };
double[] ydata = new double[] { 15, 20, 25 };
var p = Fit.Line(xdata, ydata); // pair of (intercept, slope)
double a = p.Item1; // == 10; intercept
double b = p.Item2; // == 0.5; slope
Tigon answered 21/3, 2018 at 11:21 Comment(0)

Try Meta.Numerics:

Meta.Numerics is a library for advanced scientific computation in the .NET Framework. It can be used from C#, Visual Basic, F#, or any other .NET programming language. The Meta.Numerics library is fully object-oriented and optimized for speed of implementation and execution.

To populate a matrix, see an example of the ColumnVector Constructor (IList<Double>). It can construct a ColumnVector from many ordered collections of reals, including double[] and List.

Husein answered 26/5, 2010 at 6:24 Comment(1)
Thanks, I hadn't seen that library before. Looks good, but still suffers the same issues of solving the equations with matrices. I think I need a different approach.Ceilometer

I suggest using FinMath. It is an extremely optimized .NET numerical computation library. It uses the Intel Math Kernel Library for complex calculations such as linear regression or matrix inversion, but most classes have simple, approachable interfaces. And of course, it scales to large sets of data. mrnye's example would look like this:

using FinMath.LeastSquares;
using FinMath.LinearAlgebra;

Vector y = new Vector(new double[]{745,
    895,
    442,
    440,
    1598});

Matrix X = new Matrix(new double[,]{
    {1, 36, 66},
    {1, 37, 68},
    {1, 47, 64},
    {1, 32, 53},
    {1, 1, 101}});

Vector b = OrdinaryLS.FitOLS(X, y);

Console.WriteLine(b);
Effortful answered 20/10, 2011 at 18:20 Comment(0)

I recently came across MathNet-Numerics, which is available under the MIT license.

It claims to provide faster alternatives for the common (X.Transpose() * X).Inverse() * (X.Transpose() * y) process.

Here are some optimizations from this article. The first one is:

X.TransposeThisAndMultiply(X).Inverse() * X.TransposeThisAndMultiply(y)

Or, you could use Cholesky decomposition:

X.TransposeThisAndMultiply(X).Cholesky().Solve(X.TransposeThisAndMultiply(y))
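
For context, a sketch of why the Cholesky route needs no explicit inverse. This is standard least-squares algebra rather than something quoted from the article, and it assumes X has full column rank so that the product of X transposed and X is positive definite:

```latex
\begin{aligned}
X^{\mathsf T} X \, b &= X^{\mathsf T} y && \text{(normal equations)} \\
X^{\mathsf T} X &= L L^{\mathsf T} && \text{(Cholesky factorization, } L \text{ lower triangular)} \\
L z &= X^{\mathsf T} y && \text{(forward substitution for } z\text{)} \\
L^{\mathsf T} b &= z && \text{(back substitution for } b\text{)}
\end{aligned}
```

Both triangular solves work on a matrix whose size is the number of columns of X, so their cost does not depend on the number of samples.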
Proteolysis answered 22/9, 2017 at 5:52 Comment(0)
