Python equivalent to R poly() function?

I'm trying to understand how to replicate the poly() function in R using scikit-learn (or another module).

For example, let's say I have a vector in R:

a <- c(1:10)

And I want to generate a 3rd degree polynomial:

polynomial <- poly(a, 3)

I get the following:

                1           2          3
 [1,] -0.49543369  0.52223297 -0.4534252
 [2,] -0.38533732  0.17407766  0.1511417
 [3,] -0.27524094 -0.08703883  0.3778543
 [4,] -0.16514456 -0.26111648  0.3346710
 [5,] -0.05504819 -0.34815531  0.1295501
 [6,]  0.05504819 -0.34815531 -0.1295501
 [7,]  0.16514456 -0.26111648 -0.3346710
 [8,]  0.27524094 -0.08703883 -0.3778543
 [9,]  0.38533732  0.17407766 -0.1511417
[10,]  0.49543369  0.52223297  0.4534252

I'm relatively new to Python and I'm trying to understand how to use the PolynomialFeatures class in sklearn to replicate this. I've spent time looking at the examples in the PolynomialFeatures documentation, but I'm still a bit confused.

Any insight would be greatly appreciated. Thanks!

Iowa answered 24/12, 2016 at 21:58 Comment(6)
There is a NumPy for R (and S-Plus) users cheat sheet. You might get lucky there.Sandeesandeep
Thanks! I took a look at it but it doesn't seem to have what I'm searching for (or I'm completely missing it).Iowa
Could you give a description (specification) of the R poly() function?Sandeesandeep
#19484553 explains what poly does in R.Kilpatrick
Can you explain what you are trying to do, without referencing the equivalent function in R?Bosco
I'm trying to apply the kfold cross validation method on a generalized linear model at different n-degree polynomials.Iowa

It turns out that you can replicate the result of R's poly(x,p) function by performing a QR decomposition of a matrix whose columns are the powers of the input vector x from the 0th power (all ones) up to the pth power. The Q matrix, minus the first constant column, gives you the result you want.

So, the following should work:

import numpy as np

def poly(x, p):
    """Orthogonal polynomial basis of degree p, like R's poly(x, p)."""
    x = np.array(x)
    # Build a matrix whose columns are x**0, x**1, ..., x**p
    # (use a list, not a generator -- newer NumPy rejects generators in vstack)
    X = np.transpose(np.vstack([x**k for k in range(p + 1)]))
    # QR decomposition; drop the first (constant) column of Q
    return np.linalg.qr(X)[0][:, 1:]

In particular:

In [29]: poly([1,2,3,4,5,6,7,8,9,10], 3)
Out[29]: 
array([[-0.49543369,  0.52223297,  0.45342519],
       [-0.38533732,  0.17407766, -0.15114173],
       [-0.27524094, -0.08703883, -0.37785433],
       [-0.16514456, -0.26111648, -0.33467098],
       [-0.05504819, -0.34815531, -0.12955006],
       [ 0.05504819, -0.34815531,  0.12955006],
       [ 0.16514456, -0.26111648,  0.33467098],
       [ 0.27524094, -0.08703883,  0.37785433],
       [ 0.38533732,  0.17407766,  0.15114173],
       [ 0.49543369,  0.52223297, -0.45342519]])

In [30]: 
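Note that the third column comes out with the opposite sign to R's output; the columns of a QR decomposition are only determined up to sign, so this has no effect when the basis is used for fitting.

For the cross-validation use case mentioned in the question comments, here is a minimal sketch of how the helper could be used; the example data and the use of scikit-learn's LinearRegression and cross_val_score are my own assumptions, not part of the original answer:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical example data -- replace with your own x and y
rng = np.random.default_rng(0)
x = np.arange(1, 31)
y = 2.0 + 1.5 * x - 0.3 * x**2 + rng.normal(0, 5, size=x.size)

# Score a linear model on orthogonal polynomial features of increasing degree
for degree in range(1, 5):
    X = poly(x, degree)  # the poly() helper defined above
    scores = cross_val_score(LinearRegression(), X, y, cv=5)
    print(degree, scores.mean())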
Plafker answered 24/12, 2016 at 22:49 Comment(4)
Very helpful. Thanks all for the help!Iowa
This is really useful. Do you know how to apply this transformation to new data not used in fitting, as is done in R in this answer? How do we get the coefficients from the transformation?Tripartition
I am also curious how we would be able to get the coefficients afterwardsCitric
I just wanted to add this as a comment to say thank you, and of course I upvoted!!Ridglea

The answer by K. A. Buhr is full and complete.

The R poly function also computes interactions between the different degrees of its inputs; that's why I was looking for an R poly equivalent.
sklearn.preprocessing.PolynomialFeatures seems to provide these interaction terms, and you can apply the np.linalg.qr(X)[0][:,1:] step afterwards to get the orthogonal matrix.

Something like this:

import numpy as np
import pprint
import sklearn.preprocessing

PP = pprint.PrettyPrinter(indent=4)

# Two input columns; PolynomialFeatures(2) produces the bias column, the raw
# columns, and all degree-2 terms (x1**2, x1*x2, x2**2).
MATRIX = np.array([[4, 2], [2, 3], [7, 4]])
poly = sklearn.preprocessing.PolynomialFeatures(2)
PP.pprint(MATRIX)
X = poly.fit_transform(MATRIX)
PP.pprint(X)

Results in:

array([[4, 2],
       [2, 3],
       [7, 4]])
array([[ 1.,  4.,  2., 16.,  8.,  4.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  7.,  4., 49., 28., 16.]])
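Continuing the snippet above, the orthogonalization step mentioned here would look roughly like this (a sketch only; with just 3 rows, NumPy's reduced QR returns at most 3 orthogonal columns, so only 2 remain after dropping the constant column):

# Orthogonalize the expanded features and drop the constant column,
# as in the poly() helper from the accepted answer
Q = np.linalg.qr(X)[0][:, 1:]
PP.pprint(Q)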
Baucis answered 9/2, 2018 at 23:22 Comment(0)
