Statsmodels: Short way of writing Formula

Logistic regression model using statsmodels:

import statsmodels.formula.api as st  # alias assumed from the st.logit call
log_reg = st.logit(formula='label ~ pregnant + glucose + bp + insulin + bmi + pedigree + age', data=pima).fit()

Is there a shorter way to write the right-hand side of the formula (pregnant + glucose + bp + insulin + bmi + pedigree + age)? Here every column has to be listed explicitly. With more than 100 columns, the formula would be tedious to write and the statement would be very long.

Feodor asked 20/2, 2016 at 2:38

If df is a pd.DataFrame and y is the name of the target column, this function returns the formula string you are looking for.

def formula_from_cols(df, y):
    # Right-hand side: every column except the target y, joined with ' + '
    return y + ' ~ ' + ' + '.join(col for col in df.columns if col != y)
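
As a hedged extension of the same idea, the sketch below builds the formula from any iterable of column names (for example df.columns), lets you leave some columns out, and wraps names that are not valid Python identifiers in patsy's Q("...") quoting. The helper name build_formula, its exclude parameter, and the "skin" column are hypothetical and used only for illustration; column names are assumed to be strings without double quotes.

```python
def build_formula(columns, target, exclude=()):
    """Return 'target ~ a + b + ...' from an iterable of column names."""
    skip = {target, *exclude}

    def term(name):
        # Formula terms are parsed as Python expressions, so a name that is
        # not a valid identifier (spaces, dots, ...) is wrapped in Q("...").
        return name if name.isidentifier() else 'Q("{}")'.format(name)

    predictors = [term(c) for c in columns if c not in skip]
    return "{} ~ {}".format(term(target), " + ".join(predictors))


# Hypothetical column list; in practice pass df.columns.
cols = ["pregnant", "glucose", "bp", "skin", "insulin",
        "bmi", "pedigree", "age", "label"]
formula = build_formula(cols, "label", exclude=["skin"])
print(formula)
# label ~ pregnant + glucose + bp + insulin + bmi + pedigree + age
```

The resulting string can be passed as the formula argument in the same way as a hand-written one.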
Fullfaced answered 27/4, 2019 at 14:47

There are no specific shortcuts for the formulas.

You can use Python string manipulation to build the formula, e.g. from the pandas DataFrame column names.

Alternatively, skip formulas and pass arrays or DataFrames to the model directly. Even then you need a list of variable names if you want human-readable output, for example in summary(). If you only need predictions, arrays without variable names are sufficient.

Grissom answered 20/2, 2016 at 3:40

Comments (2):
Thanks for the input. Found a way using the pandas DataFrame column names: (Feodor)
str1 = pima.columns[-1] + " ~ " + " + ".join(pima.columns[1:-1])
log_reg = st.logit(formula=str1, data=pima).fit()
(Feodor)
