Python equivalent to R poly() function?

I'm trying to understand how to replicate the poly() function in R using scikit-learn (or another module).

For example, let's say I have a vector in R:

a <- c(1:10)

And I want to generate a 3rd degree polynomial:

polynomial <- poly(a, 3)

I get the following:

                1           2          3
 [1,] -0.49543369  0.52223297 -0.4534252
 [2,] -0.38533732  0.17407766  0.1511417
 [3,] -0.27524094 -0.08703883  0.3778543
 [4,] -0.16514456 -0.26111648  0.3346710
 [5,] -0.05504819 -0.34815531  0.1295501
 [6,]  0.05504819 -0.34815531 -0.1295501
 [7,]  0.16514456 -0.26111648 -0.3346710
 [8,]  0.27524094 -0.08703883 -0.3778543
 [9,]  0.38533732  0.17407766 -0.1511417
[10,]  0.49543369  0.52223297  0.4534252

I'm relatively new to Python and I'm trying to understand how to use the PolynomialFeatures class in sklearn to replicate this. I've spent time looking at the examples in the PolynomialFeatures documentation, but I'm still a bit confused.

Any insight would be greatly appreciated. Thanks!

Iowa answered 24/12, 2016 at 21:58 Comment(6)
There is a NumPy for R (and S-Plus) users cheat sheet. You might get lucky there.Sandeesandeep
Thanks! I took a look at it but it doesn't seem to have what I'm searching for (or I'm completely missing it).Iowa
Could you give a description (specification) of the R poly() function?Sandeesandeep
#19484553 explains what poly does in R.Kilpatrick
Can you explain what you are trying to do, without referencing the equivalent function in R?Bosco
I'm trying to apply the kfold cross validation method on a generalized linear model at different n-degree polynomials.Iowa

It turns out that you can replicate the result of R's poly(x,p) function by performing a QR decomposition of a matrix whose columns are the powers of the input vector x from the 0th power (all ones) up to the pth power. The Q matrix, minus the first constant column, gives you the result you want.

So, the following should work:

import numpy as np

def poly(x, p):
    """Orthogonal polynomial basis of degree p, like R's poly(x, p)."""
    x = np.array(x)
    # Build a matrix whose columns are x**0, x**1, ..., x**p
    # (use a list, not a generator -- newer NumPy rejects generators in vstack)
    X = np.transpose(np.vstack([x**k for k in range(p + 1)]))
    # QR decomposition; drop the first (constant) column of Q
    return np.linalg.qr(X)[0][:, 1:]

In particular:

In [29]: poly([1,2,3,4,5,6,7,8,9,10], 3)
Out[29]: 
array([[-0.49543369,  0.52223297,  0.45342519],
       [-0.38533732,  0.17407766, -0.15114173],
       [-0.27524094, -0.08703883, -0.37785433],
       [-0.16514456, -0.26111648, -0.33467098],
       [-0.05504819, -0.34815531, -0.12955006],
       [ 0.05504819, -0.34815531,  0.12955006],
       [ 0.16514456, -0.26111648,  0.33467098],
       [ 0.27524094, -0.08703883,  0.37785433],
       [ 0.38533732,  0.17407766,  0.15114173],
       [ 0.49543369,  0.52223297, -0.45342519]])

In [30]: 
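Note that the third column comes out with the opposite sign to R's output; the columns of a QR decomposition are only determined up to sign, so this has no effect when the basis is used for fitting.

For the cross-validation use case mentioned in the question comments, here is a minimal sketch of how the helper could be used; the example data and the use of scikit-learn's LinearRegression and cross_val_score are my own assumptions, not part of the original answer:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical example data -- replace with your own x and y
rng = np.random.default_rng(0)
x = np.arange(1, 31)
y = 2.0 + 1.5 * x - 0.3 * x**2 + rng.normal(0, 5, size=x.size)

# Score a linear model on orthogonal polynomial features of increasing degree
for degree in range(1, 5):
    X = poly(x, degree)  # the poly() helper defined above
    scores = cross_val_score(LinearRegression(), X, y, cv=5)
    print(degree, scores.mean())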
Plafker answered 24/12, 2016 at 22:49 Comment(4)
Very helpful. Thanks all for the help!Iowa
This is really useful. Do you know how to apply this transformation to new data not used in fitting, as is done in R in this answer? How do we get the coefficients from the transformation?Tripartition
I am also curious how we would be able to get the coefficients afterwardsCitric
I just wanted to add this as a comment to say thank you, and of course I upvoted!!Ridglea

The answer by K. A. Buhr is full and complete.

The R poly function also computes interactions between the different degrees of its inputs; that's why I was looking for an R poly equivalent.
sklearn.preprocessing.PolynomialFeatures seems to provide these interaction terms, and you can apply the np.linalg.qr(X)[0][:,1:] step afterwards to get the orthogonal matrix.

Something like this:

import numpy as np
import pprint
import sklearn.preprocessing

PP = pprint.PrettyPrinter(indent=4)

# Two input columns; PolynomialFeatures(2) produces the bias column, the raw
# columns, and all degree-2 terms (x1**2, x1*x2, x2**2).
MATRIX = np.array([[4, 2], [2, 3], [7, 4]])
poly = sklearn.preprocessing.PolynomialFeatures(2)
PP.pprint(MATRIX)
X = poly.fit_transform(MATRIX)
PP.pprint(X)

Results in:

array([[4, 2],
       [2, 3],
       [7, 4]])
array([[ 1.,  4.,  2., 16.,  8.,  4.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  7.,  4., 49., 28., 16.]])
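Continuing the snippet above, the orthogonalization step mentioned here would look roughly like this (a sketch only; with just 3 rows, NumPy's reduced QR returns at most 3 orthogonal columns, so only 2 remain after dropping the constant column):

# Orthogonalize the expanded features and drop the constant column,
# as in the poly() helper from the accepted answer
Q = np.linalg.qr(X)[0][:, 1:]
PP.pprint(Q)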
Baucis answered 9/2, 2018 at 23:22 Comment(0)
