Cubic Regression (best fit line) in JavaScript
Asked Answered
C

2

2

I'm having the worst time trying to find a JavaScript code that could allow me to do cubic regressions. Would write it myself, but my understanding of polynomial math is, well, suboptimal.

So, here's what I'm looking for. Given an input of an array of arrays, where the internal array would be [x,y], the function would give me an output in the form of an array with four parameters - [a,b,c,d], where a, b, c, and d are parameters of the equation y = ax^3 + bx^2 + cx + d.

Example: Input is an array like this [[2,5],[5,10],[07,15],[12,20],[20,25],[32,30],[50,35]].

Which essentially is the representation of a table:

|    x   |   y    |
|-----------------|
|   02   |   05   |
|   05   |   10   |
|   07   |   15   |
|   12   |   20   |
|   20   |   25   |
|   32   |   30   |
|   50   |   35   |

Now, the output would be [0.000575085,-0.058861065,2.183957502,1.127605507]. These are the a, b, c, and d parameters of the cubic function.

(FYI, the output I got by using Excel's LINEST function and running it on the above set of numbers using an array function {1,2,3}).

How could this be done? Huge thanks in advance for any guidance.

Best, Tom

Centre answered 11/3, 2014 at 8:1 Comment(0)
L
4

Here's a real, working bit of code to solve that cubic using the numeric.js library's uncmin unconstrained minimiser as a least squares problem (jsbin here):

var data_x = [2,5,7,12,20,32,50];
var data_y = [5,10,15,20,25,30,35];

var cubic = function(params,x) {
  return params[0] * x*x*x +
    params[1] * x*x +
    params[2] * x +
    params[3];
};

var objective = function(params) {
  var total = 0.0;
  for(var i=0; i < data_x.length; ++i) {
    var resultThisDatum = cubic(params, data_x[i]);
    var delta = resultThisDatum - data_y[i];
    total += (delta*delta);
  }
  return total;
};

var initial = [1,1,1,1];
var minimiser = numeric.uncmin(objective,initial);

console.log("initial:");
for(var j=0; j<initial.length; ++j) {
  console.log(initial[j]);  
}

console.log("minimiser:");
for(var j=0; j<minimiser.solution.length; ++j) {
  console.log(minimiser.solution[j]);
}

I get the results:

 0.0005750849851827991
-0.05886106462847641
 2.1839575020602164
 1.1276055079334206

To explain: we have a function 'cubic', which evaluates the general cubic function for a set of parameters params and a value x. This function is wrapped to create the objective function, which takes a set of params and runs each x value from our data set through the target function and calculates the sum of the squares. This function is passed to uncmin from numeric.js with a set of initial values; uncmin does the hard work and returns an object whose solution property contains the optimised parameter set.

To do this without the global variables (naughty!), you can have an objective function factory thus:

var makeObjective = function(targetFunc,xlist,ylist) {
  var objective = function(params) {
    var total = 0.0;
    for(var i=0; i < xlist.length; ++i) {
      var resultThisDatum = targetFunc(params, xlist[i]);
      var delta = resultThisDatum - ylist[i];
      total += (delta*delta);
    }
    return total;
  };
  return objective;
};

Which you can use to manufacture objective functions:

var objective = makeObjective(cubic, data_x, data_y); // then carry on as before

Knowing how to do this practically would be of great help to a lot of people, so I'm glad this has come up.

Edit: Clarification on cubic

var cubic = function(params,x) {
  return params[0] * x*x*x +
    params[1] * x*x +
    params[2] * x +
    params[3];
};

Cubic is being defined as a function which takes an array of parameters params and a value x. Given params, we can define a function f(x). For a cubic, that is f(x) = a x^3 + b x^2 + c x + d so there are 4 parameters ([0] to [3]), and given those 4 param values we have a single function f(x) with 1 input x.

The code is structured to allow you to replace cubic with another function of the same structure; it could be linear with 2 parameters:

var linear = function(params, x) {
    return params[0]*x + params[1];
};

The rest of the code will look at the length of params in order to know how many parameters need modifying.

Note that this whole piece of code is trying to find the set of parameter values which produce a curve which best fits all the data; if you wanted to find a fit for the last 4 points of some data, you would pass only those values in data_x and data_y.

Lymphoblast answered 14/3, 2014 at 16:11 Comment(11)
@PhilH Hey but after I get those numbers, what should I do to calculate the predicted value?Putrescible
This produces a cubic function f(x), so you call it with a value of x: cubic(minimiser.solution, xvalue).Lymphoblast
@PhilH Hey sorry but just a quick check with you, let's say if my data is in the pattern which go up and down crazily, the cubic regression is not suitable in this case, am I right? Because from what I understand, the cubic regression is growing in a 'half arc' pattern, right?Putrescible
@hyperfkcb: Any polynomial fit from quadratic onwards can produce wild curves. It sounds like you have an additional constraint of limiting overshoots? In which case you will need to decide what that constraint is before proceeding.Lymphoblast
@PhilH Hey sorry but does the above solution applicable for the prediction of value which grows in wild curves?Putrescible
@hyperfkcb: Depends how wild. This is fine for any curve that is approximately cubic, even if the coefficients are large. But you can adapt this code for another function by replacing cubic there. If you want to put up some sample data and post a question, link it here.Lymphoblast
@PhilH Hey do you mind to take a look at this: #46881782Putrescible
@PhilH Hey do you mind to explain a little bit more on the cubic()? As in how can I change accordingly given the dataset of [347.3, 77, 549.7, 200, 273, 367.7, 382.2, 231.7, 320.6, 209.8, 388.3, 653.7].Putrescible
@hyperfkcb: You need to put the data in data_x and data_y. I'll add a bit of explanation of cubic to the answer.Lymphoblast
@PhilH Thanks so much for the explanation! However, I realized a problem with the code above, I tried with different data set and I realized that, the graph will either always go down all the way to zero, then shoot up to certain value, then go down again or the other way around. In another word, it is either down, up, down or up, down, up as the curve is limited to three turning points only. Is it supposed to behave like this? If so, is there any formula which will not limit the turning points of the prediction graph to 3 only?Putrescible
@hyperfkcb: Yes and yes. Cubic means a polynomial order 3, which can only have up to 2 turning points. This is not a general system to guess the order of a polynomial - you can always create a polynomial order n-1 through n points which is exact, so you'd never get a 'best fit'. To fit a curve to data points you will need to choose some assumption about the shape of the curve, which is essentially the model you are employing.Lymphoblast
T
3

I'd formulate this as a least squares problem. Let M be the n×4 matrix formed like this:

x_1^3  x_1^2  x_1  1
x_2^3  x_2^2  x_2  1
  ⋮       ⋮      ⋮
x_n^3  x_n^2  x_n  1

Then compute the 4×4 matrix A=MTM and the 4×1 column vector b=MTy and solve the linear system of equations =b. The resulting vector ξ will contain your coefficients a through d.

The above description makes it easy to understand what is going on, mathematically. For implementation, particularly for very large n, the above approach might however be infeasible. In those cases, you can build A and b directly, without explicitely constructing M. For example, A1,2=sum(x_i^3 * x_i^2 for all i). So you can iterate over all i and add the corresponding values to the corresponding matrix and vector entries.

Thrash answered 11/3, 2014 at 9:17 Comment(1)
Or more systematically, A_(i,j)=A_(j,i)=sum(x_k^(8-i-j) over all k), and b_i=sum(y_k*x_k^(4-i) over all k). Use Cholesky or LDLT to factorize the system matrix A and find the solution, it should work without pivoting, since A will be positive semi-definite, and even almost surely positive definite.Karykaryl

© 2022 - 2024 — McMap. All rights reserved.