What is the best way to fit a quadratic polynomial to p-dimensional data and compute its gradient and Hessian matrix?

I have been trying to use the scikit-learn library to solve this problem. Roughly:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Make or load an n x p data matrix X and n x 1 array y of the corresponding
# function values.

poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)

# Approximate the derivatives of the gradient and Hessian using the relevant
# finite-difference equations and model.predict.

As the above illustrates, sklearn makes the design choice to separate polynomial regression into PolynomialFeatures and LinearRegression rather than combine these into a single function. This separation has conceptual advantages but also a major drawback: it effectively prevents model from offering gradient and hessian methods, and model would be significantly more useful if it did offer them.

My current work-around uses finite-difference equations and model.predict to approximate the elements of the gradient and Hessian (as described here). But I don't love this approach — it is sensitive to floating-point error and the "exact" information needed to build the gradient and Hessian is already contained in model.coef_.
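
For concreteness, here is a minimal sketch of that work-around (the wrapper f and the step size h below are just illustrative choices, not my exact code):

import numpy as np

def fd_gradient(f, x, h=1e-5):
    # Central-difference approximation of the gradient of f at x.
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# Here f maps a single point to the fitted polynomial's value, e.g.
# f = lambda x: model.predict(poly.transform(x.reshape(1, -1)))[0]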

Is there any more elegant or accurate method to fit a p-dimensional polynomial and find its gradient and Hessian within Python? I would be fine with one that uses a different library.

Giraffe answered 10/10 at 17:32 Comment(0)

To compute the gradient or the Hessian of a polynomial, one needs to know the exponents of the variables in each monomial and the corresponding monomial coefficients. The first piece of this information is provided by poly.powers_, the second by model.coef_:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import numpy as np

np.set_printoptions(precision=2, suppress=True)

X = np.arange(6).reshape(3, 2)
y = np.arange(3)
poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)

print("Exponents:")
print(poly.powers_.T)
print("Coefficients:")
print(model.coef_)

This gives:

Exponents:
[[0 1 0 2 1 0]
 [0 0 1 0 1 2]]
Coefficients:
[ 0.    0.13  0.13 -0.12 -0.    0.13]

The following function can then be used to compute the gradient at a point given by an array x:

def gradient(x, powers, coeffs):
    x = np.array(x)
    # Exponents of each monomial after differentiating with respect to each
    # variable (clipped at zero); shape (p, p, n_monomials) after the transpose.
    gp = np.maximum(0, powers[:, np.newaxis] - np.eye(powers.shape[1], dtype=int))
    gp = gp.transpose(1, 2, 0)
    # Each coefficient multiplied by the original exponent of the variable
    # with respect to which we differentiate.
    gc = coeffs * powers.T
    # Evaluate the differentiated monomials at x and sum them per variable.
    return (((x[:, np.newaxis] ** gp).prod(axis=1)) * gc).sum(axis=1)

For example, we can use it to compute the gradient at the point [0, 1]:

print(gradient([0, 1],  poly.powers_, model.coef_))

This gives:

[0.13 0.38]

The Hessian at a given point can be computed in a similar way.
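
For instance, a straightforward (loop-based rather than vectorized) version could look like this:

def hessian(x, powers, coeffs):
    x = np.array(x)
    p = powers.shape[1]
    H = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            # Coefficient of each monomial after differentiating with respect
            # to x_i and then x_j: c * a_i * (a_j - delta_ij), where a_k is
            # the exponent of x_k in that monomial.
            hc = coeffs * powers[:, i] * (powers[:, j] - (i == j))
            # Corresponding exponents; the clip at zero only matters for
            # terms whose coefficient hc is already zero.
            hp = powers.copy()
            hp[:, i] -= 1
            hp[:, j] -= 1
            hp = np.maximum(0, hp)
            H[i, j] = (hc * (x ** hp).prod(axis=1)).sum()
    return H

print(hessian([0, 1], poly.powers_, model.coef_))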

Leap answered 11/10 at 5:54 Comment(0)

You can perform symbolic differentiation to compute exact gradients and Hessians. Since the polynomial model's coefficients are known after fitting, you can differentiate the fitted polynomial directly to obtain its gradient and Hessian symbolically.

import sympy as sp
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.random.rand(100, 2)
y = np.random.rand(100)

poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)

coef = model.coef_
intercept = model.intercept_

p = X.shape[1]
vars = sp.symbols(f'x0:{p}')

poly_eq = intercept + sum(coef[i] * (vars[0]**int(i > 0)) * (vars[1]**int(i > 1)) for i in range(len(coef)))

gradient = [sp.diff(poly_eq, v) for v in vars]

hessian = [[sp.diff(grad, v) for v in vars] for grad in gradient]

print("Gradient:", gradient)
print("Hessian:", hessian)

Comprise answered 11/10 at 8:11 Comment(1)
This gives incorrect results, since poly_eq does not reflect the polynomial features created using sklearn. In particular, it assumes that in these features each variable appears with exponent either 0 or 1, which is not the case. Once this is fixed, it should work. – Leap

After studying various solutions to this problem, I believe the easiest method (easiest on the programmer) is to use sympy, roughly as described in Comprise's answer. However, as Leap notes in a comment, that answer produces incorrect results in its unedited form. Here is one way to edit the approach so that it gives correct results.

from numpy.random import rand
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

from sympy import symbols, prod, diff
from numpy import sum, linspace

n = 200
p = 4
X = rand(n, p)
y = rand(n)

poly = PolynomialFeatures(degree=2)
Xp = poly.fit_transform(X)
model = LinearRegression()
model.fit(Xp, y)

powers = poly.powers_
coeffs = model.coef_
intercept = model.intercept_
syms = symbols(f'x0:{p}')

# Rebuild the fitted polynomial symbolically: each feature is the product of
# the variables raised to the exponents recorded in poly.powers_.
f = intercept + sum([coeffs[i]*prod(syms**powers[i]) for i in range(len(coeffs))])

Df = [diff(f, xj) for xj in syms]
H = [[diff(djf, xk) for xk in syms] for djf in Df]
print('Gradient:', Df)
print('Hessian:', H)

x0 = linspace(0.2, 0.7, p)  # test point
Df_x0 = [Df[k].evalf(subs={syms[j]:x0[j] for j in range(p)}) for k in range(p)]
print('x0:', x0)
print('Gradient at x0:', Df_x0)
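
The Hessian can be evaluated at x0 in the same way (for a degree-2 fit it is constant, so this is mostly a sanity check):

H_x0 = [[H[j][k].evalf(subs={syms[i]: x0[i] for i in range(p)}) for k in range(p)] for j in range(p)]
print('Hessian at x0:', H_x0)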

While sympy simplifies the code, it is not necessary, as Leap's answer shows. Here is a loop-based way to write that answer that I find easier to read. Note, though, that it may run more slowly, so if speed is a concern, the vectorized version may be superior.

from numpy import *

x0 = [0, 1]

# Define p, X, y, poly, model, powers, and coeffs.

grad = array(p * [float(0)])
for i in range(p):
    gc = powers.T[i] * coeffs  # gradient coefficients
    gp = powers.copy()  # gradient powers
    gp[:,i] = maximum(0, gp[:,i] - 1)
    grad[i] = sum([gc[j]*prod(x0**gp[j]) for j in range(len(coeffs))])
print('grad = ', grad)

hess = p * [None]
for i in range(p):
    gc = powers.T[i] * coeffs
    gp = powers.copy()
    gp[:,i] = maximum(0, gp[:,i] - 1)
    hess[i] = p * [float(0)]
    for j in range(p):
        hc = gp.T[j] * gc  # Hessian coefficients
        hp = gp.copy()  # Hessian powers
        hp[:,j] = maximum(0, hp[:,j] - 1)
        hess[i][j] = sum([hc[k]*prod(x0**hp[k]) for k in range(len(coeffs))])
hess = array(hess)
print('hess = ', hess)
Giraffe answered 12/10 at 22:17 Comment(0)
