I understand that when I have a categorical variable in a model passed to a statsmodels `fit`, dummy variables will automatically be generated for the categories. For example, if I have a variable 'Location' with values 'IndianOcean', 'Thailand', 'China' and 'Mars', I will get variables in my model of the form `Location[T.Thailand]`, with one of the values not represented. By default the excluded variable seems to be the least common one. Is there a way to specify, ideally within the model specification, which value is treated as the "base value" and excluded?
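
To make the setup concrete, here is a minimal sketch of the default behaviour, assuming the formula interface (`statsmodels.formula.api`). The response `y` and all numbers are made up for illustration; only the 'Location' values come from the question.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data; only the 'Location' values come from the question.
df = pd.DataFrame({
    "Location": ["IndianOcean", "Thailand", "China", "Mars"] * 3,
    "y": [1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5, 0.5, 1.5, 2.5, 3.5],
})

# A plain string column in the formula is dummy-coded automatically.
default_fit = smf.ols("y ~ Location", data=df).fit()
default_names = list(default_fit.params.index)

# One level is missing from the names: that is the base level.
print(default_names)
```

With this toy data every level is equally common, so whichever level is dropped here is not being chosen by frequency.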
`C` in the formula (as in `... + C(Location, Treatment) + ...`) does the trick, but this results in some pretty ugly category names that I'd like to avoid. – Peerage
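
A minimal sketch of the `C(...)` approach from this comment, with the base level spelled out through patsy's `Treatment(reference=...)`. The data and the response `y` are hypothetical. The second half shows one possible way to keep the short names: making the column a pandas categorical with the desired base level listed first. That relies on the assumption that patsy respects the categorical's own level order, so check it against your installed versions.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data; only the 'Location' values come from the question.
df = pd.DataFrame({
    "Location": ["IndianOcean", "Thailand", "China", "Mars"] * 3,
    "y": [1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5, 0.5, 1.5, 2.5, 3.5],
})

# Option 1: name the base level inside the formula.
# 'Mars' is excluded, but each term name carries the whole C(...) expression.
formula = "y ~ C(Location, Treatment(reference='Mars'))"
long_names = list(smf.ols(formula, data=df).fit().params.index)

# Option 2 (assumption: patsy uses the categorical's own level order):
# list the desired base level first, then use the plain column name.
levels = ["Mars", "China", "IndianOcean", "Thailand"]
df2 = df.assign(Location=pd.Categorical(df["Location"], categories=levels))
short_names = list(smf.ols("y ~ Location", data=df2).fit().params.index)

print(long_names)
print(short_names)
```

Option 1 keeps everything in the model specification; Option 2 moves the choice into the data but leaves names of the form `Location[T.China]`.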