QuantReg from the statsmodels package in Python gives very different results from rq in R's quantreg package for the data shown in the code below.
I tried the stackloss data in both Python and R, and there the results were the same. I wonder whether this particular data set causes a problem in Python, or whether there is some fundamental difference between the two implementations of the algorithm, but I couldn't figure it out.
Code in Python:
from statsmodels.regression.quantile_regression import QuantReg
y = [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 662.59, 248.08, 331.25, 182.98, 1085.69, -44.32]
X = [
[1, 20322.18, 0.00, 0], [1, 19653.34, 0.00, 0],
[1, 0.00, 72712.41, 0], [1, 0.00, 72407.31, 0],
[1, 0.00, 72407.31, 0], [1, 0.00, 72201.89, 9111],
[1, 183.52, 0.00, 0], [1, 183.52, 0.00, 0],
[1, 0.00, 0.00, 2879], [1, 0.00, 0.00, 2698],
[1, 0.00, 0.00, 0], [1, 0.00, 0.00, 0],
[1, 0.00, 0.00, 19358], [1, 0.00, 0.00, 19001]
]
print(QuantReg(y, X).fit(q=.5).summary())
and in R:
library(quantreg)
y <- c(0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 662.59, 248.08, 331.25, 182.98, 1085.69, -44.32)
X <- matrix(
c(1, 20322.18, 0.00, 0, 1, 19653.34, 0.00, 0,
1, 0.00, 72712.41, 0, 1, 0.00, 72407.31, 0,
1, 0.00, 72407.31, 0, 1, 0.00, 72201.89, 9111,
1, 183.52, 0.00, 0, 1, 183.52, 0.00, 0,
1, 0.00, 0.00, 2879, 1, 0.00, 0.00, 2698,
1, 0.00, 0.00, 0, 1, 0.00, 0.00, 0,
1, 0.00, 0.00, 19358, 1, 0.00, 0.00, 19001),
nrow=14, ncol=4, byrow=TRUE
)
rq(y~.-1, data=data.frame(X), tau=.5, method='fn')
R gives the coefficients 1.829800e+02, -9.003955e-03, -2.527093e-03, -5.697678e-05,
while Python gives 3.339e-05, -1.671e-09, -4.635e-10, 7.957e-11.
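One way to judge the two answers independently of either solver is to evaluate the quantile-regression objective (the check loss) at both coefficient vectors. The lower value is the better fit at q = 0.5. The sketch below uses plain NumPy only. The helper name check_loss and the column rescaling at the end are my own assumptions for illustration, not part of statsmodels or quantreg. The rescaling step only checks whether the very different column magnitudes make the design matrix badly conditioned, which might be related to the "data itself" suspicion above.

```python
import numpy as np

y = np.array([0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,
              662.59, 248.08, 331.25, 182.98, 1085.69, -44.32])
X = np.array([
    [1, 20322.18, 0.00, 0], [1, 19653.34, 0.00, 0],
    [1, 0.00, 72712.41, 0], [1, 0.00, 72407.31, 0],
    [1, 0.00, 72407.31, 0], [1, 0.00, 72201.89, 9111],
    [1, 183.52, 0.00, 0], [1, 183.52, 0.00, 0],
    [1, 0.00, 0.00, 2879], [1, 0.00, 0.00, 2698],
    [1, 0.00, 0.00, 0], [1, 0.00, 0.00, 0],
    [1, 0.00, 0.00, 19358], [1, 0.00, 0.00, 19001],
])


def check_loss(y, X, beta, q=0.5):
    """Hypothetical helper: sum of rho_q(r) = r * (q - 1[r < 0]) over residuals r."""
    r = y - X @ beta
    return float(np.sum(r * (q - (r < 0))))


# Coefficients as reported by R (rq) and by Python (QuantReg) above.
beta_r = np.array([1.829800e+02, -9.003955e-03, -2.527093e-03, -5.697678e-05])
beta_py = np.array([3.339e-05, -1.671e-09, -4.635e-10, 7.957e-11])

loss_r = check_loss(y, X, beta_r)
loss_py = check_loss(y, X, beta_py)
print("check loss at R coefficients:     ", loss_r)
print("check loss at Python coefficients:", loss_py)

# Assumption: dividing each column by its largest absolute value is one simple
# way to put the regressors on a comparable scale.
X_scaled = X / np.abs(X).max(axis=0)
cond_raw = np.linalg.cond(X)
cond_scaled = np.linalg.cond(X_scaled)
print("condition number, raw columns:     ", cond_raw)
print("condition number, rescaled columns:", cond_scaled)
```

If the rescaled matrix is much better conditioned, refitting on X_scaled and dividing the resulting coefficients by the same column maxima would show whether the discrepancy is a scaling or convergence effect rather than a difference in the underlying model.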
Any input or hint is appreciated.