When and how to use Polynomial.fit() as opposed to polyfit()?
Asked Answered
F

2

5

Using Python 3.10.0 and NumPy 1.21.4.

I'm trying to understand why Polynomial.fit() calculates wildly different coefficient values from polyfit().

In the following code:

import numpy as np

def main():
    x = np.array([3000, 3200, 3400, 3600, 3800, 4000, 4200, 4400, 4600, 4800, 5000, 5200, 5400, 5600, 5800, 6000, 6200, 6400, 6600, 6800, 7000])
    y = np.array([5183.17702344, 5280.24520952, 5758.94478531, 6070.62698406, 6584.21169885, 8121.20863245, 7000.57326186, 7380.01493624, 7687.97802847, 7899.71417408, 8506.90860692, 8421.73816463, 8705.58403352, 9275.46094996, 9552.44715196, 9850.70796049, 9703.53073907, 9833.39941224, 9900.21604921, 9901.06392084, 9974.51206378])

    c1 = np.polynomial.polynomial.polyfit(x, y, 2)
    c2 = np.polynomial.polynomial.Polynomial.fit(x, y, 2).coef

    print(c1)
    print(c2)

if __name__ == '__main__':

    main()

c1 contains:

[-3.33620814e+03  3.44704650e+00 -2.18221029e-04]

which produces the the line of best fit when plugged a + bx + cx^2 that I predicted while c2 contains:

[8443.4986422  2529.67242075 -872.88411679]

which results in a very different line when plugged into the same formula.

The documentation seems to imply that Polynomial.fit() is the new preferred way of calculating the line but it keeps outputting the wrong coefficients (unless my understanding of polynomial regression is completely wrong).

If I am not using the functions correctly, what is the correct way of using them?

If I am using both functions correctly, why would I use Polynomial.fit() over polyfit(), as the documentation seems to imply I should?

Footgear answered 9/12, 2021 at 21:39 Comment(0)
V
9

According to Polynomial.fit() documentation, it returns:

A series that represents the least squares fit to the data and has the domain and window specified in the call. If the coefficients for the unscaled and unshifted basis polynomials are of interest, do new_series.convert().coef.

You can find in https://numpy.org/doc/stable/reference/routines.polynomials.html#transitioning-from-numpy-poly1d-to-numpy-polynomial that

coefficients are given in the scaled domain defined by the linear mapping between the window and domain. convert can be used to get the coefficients in the unscaled data domain.

You can check

import numpy as np

def main():
    x = np.array([3000, 3200, 3400, 3600, 3800, 4000, 4200, 4400, 4600, 4800, 5000, 5200, 5400, 5600, 5800, 6000, 6200, 6400, 6600, 6800, 7000])
    y = np.array([5183.17702344, 5280.24520952, 5758.94478531, 6070.62698406, 6584.21169885, 8121.20863245, 7000.57326186, 7380.01493624, 7687.97802847, 7899.71417408, 8506.90860692, 8421.73816463, 8705.58403352, 9275.46094996, 9552.44715196, 9850.70796049, 9703.53073907, 9833.39941224, 9900.21604921, 9901.06392084, 9974.51206378])

    c1 = np.polynomial.polynomial.polyfit(x, y, 2)
    c2 = np.polynomial.polynomial.Polynomial.fit(x, y, 2).convert().coef
    c3 = np.polynomial.polynomial.Polynomial.fit(x, y, 2, window=(x.min(), x.max())).coef

    print(c1)
    print(c2)
    print(c3)

if __name__ == '__main__':

    main()

# [-3.33620814e+03  3.44704650e+00 -2.18221029e-04]
# [-3.33620814e+03  3.44704650e+00 -2.18221029e-04]
# [-3.33620814e+03  3.44704650e+00 -2.18221029e-04]

Another argument for using np.polynomial.Polynomial class is stated here in the docs https://numpy.org/doc/stable/reference/routines.polynomials.package.html

Vasilek answered 9/12, 2021 at 22:26 Comment(2)
Is numpy.polynomial.polynomial.polyfit really legacy? That's not mentioned in the docs. It's my understanding that numpy.polyfit is legacy but polynomial.polyfit is not.Monaghan
You're right, I don't remember reference for statement about np.polynomial.polynomial.polyfit's legacy. The only one reference I can find is suggestion to prefer convenience classes from np.polynomial namespace here: numpy.org/doc/stable/reference/…Vasilek
P
3
import numpy as np

def main():
    x = np.array([3000, 3200, 3400, 3600, 3800, 4000, 4200, 4400, 4600, 4800, 5000, 5200, 5400, 5600, 5800, 6000, 6200, 6400, 6600, 6800, 7000])
    y = np.array([5183.17702344, 5280.24520952, 5758.94478531, 6070.62698406, 6584.21169885, 8121.20863245, 7000.57326186, 7380.01493624, 7687.97802847, 7899.71417408, 8506.90860692, 8421.73816463, 8705.58403352, 9275.46094996, 9552.44715196, 9850.70796049, 9703.53073907, 9833.39941224, 9900.21604921, 9901.06392084, 9974.51206378])

    c1 = np.polynomial.polynomial.polyfit(x, y, 2)
    c2 = np.polynomial.polynomial.Polynomial.fit(x, y, 2, domain=[]).coef

    print(c1)
    print(c2)
main()

You can also get the coefficients by pass an empty list to domain keyword which forces the class to use its default domain of [-1,1] and gives these outputs

[-3.33620814e+03  3.44704650e+00 -2.18221029e-04]
[-3.33620814e+03  3.44704650e+00 -2.18221029e-04]
Prospector answered 9/12, 2021 at 22:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.