I am getting an error when I try to run a multivariable linear regression in Statsmodels. Everything works fine when I hardcode just one X column in the xData variable.
Can someone please give me some advice as to what I'm missing here? I would greatly appreciate it.
Error:
ValueError: shapes (747,2) and (747,2) not aligned: 2 (dim 1) != 747 (dim 0)
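The message itself is a NumPy shape error. Here is one plausible reading, assuming statsmodels computes the residual sum of squares as a dot product of the residual array with itself (an internal detail that may differ between versions). Because sm.OLS takes the outcome (endog) as its first argument, the question's code makes the two-column xData the outcome. The residuals then form a (747, 2) array, and a dot product of that array with itself cannot be aligned. The snippet reproduces the same kind of message with plain NumPy. It uses a zero array as a hypothetical stand-in for those residuals.

```python
import numpy as np

# Hypothetical stand-in for the residuals of a model whose outcome has
# two columns: 747 rows x 2 outcomes.
resid = np.zeros((747, 2))

try:
    # Assumption: a sum of squared residuals written as dot(resid, resid).
    # That works for a 1-D residual vector, but a (747, 2) array cannot be
    # multiplied by another (747, 2) array, so NumPy raises ValueError.
    ssr = np.dot(resid, resid)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```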
Code:
import pandas as pd
import statsmodels.api as sm
import itertools

data = pd.read_csv("deaconFoodData.csv")

for i in range(2,10,1):
    xCombinations = itertools.combinations(["Food Exp","HH Size","HH Inc","Highest Ed Head","Age Head","Shopping Time","Kid <6","Kid 6-18","Eating Healthy"], i)
    print(str(i) + " variables")
    for combination in xCombinations:
        comb = list(combination)
        print(comb)
        xData = data[["Food Exp", "HH Size"]] # data[comb]
        yData = data["Shopping LH"]
        yData = sm.add_constant(yData, prepend=False)
        print(yData)
        # Fit and summarize OLS model
        mod = sm.OLS(xData, yData)
        res = mod.fit()
        print(res.rsquared)
GitHub Link: https://github.com/deacons2016/DeaconFood
Comments:
... (your yData), but not multiple outcomes (your xData). Regression with multiple outcomes is usually referred to as "multivariate regression". – Tactile
... sm.OLS(yData, xData), and you should be applying add_constant() to your xData, not your yData. – Tactile
I am facing a similar issue: ValueError: shapes (260,0) and (260,0) not aligned: 0 (dim 1) != 260 (dim 0) – Kilowatthour
print(res.summary()) is what fails. – Kilowatthour