What is drop_first=True?
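drop_first=True drops the first column during dummy variable creation. Suppose you have a gender column that contains 4 categories: "Male", "Female", "Other", "Unknown". Every person falls into exactly one of them, so if a person is not "Male", "Female", or "Other", their gender must be "Unknown".
We do NOT need another column for "Unknown": a row with 0 in all three remaining columns already identifies it. (pandas drops whichever category comes first, which for plain string values is the first in sorted order, here "Female"; the reasoning is the same whichever category is dropped.)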
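To make this concrete, here is a minimal sketch using pandas. The DataFrame and its "gender" column are made-up illustration data, not taken from any real dataset. It shows which column pandas drops by default, and one possible way (an assumption about what you might want, not the only approach) to make "Unknown" the dropped category by putting it first in an explicit category order.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"gender": ["Male", "Female", "Other", "Unknown", "Female"]})

# Without drop_first: one dummy column per category, in sorted order.
full = pd.get_dummies(df["gender"])
print(list(full.columns))      # ['Female', 'Male', 'Other', 'Unknown']

# With drop_first=True: the first category ("Female") is dropped.
reduced = pd.get_dummies(df["gender"], drop_first=True)
print(list(reduced.columns))   # ['Male', 'Other', 'Unknown']

# To drop "Unknown" instead, list it first in an explicit category order.
ordered = df["gender"].astype(
    pd.CategoricalDtype(categories=["Unknown", "Male", "Female", "Other"])
)
unknown_dropped = pd.get_dummies(ordered, drop_first=True)
print(list(unknown_dropped.columns))  # ['Male', 'Female', 'Other']
```

In the reduced frame, the "Female" rows have no flag set in any remaining column, so the dropped category is still recoverable.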
Dropping a column this way is useful in some situations and not applicable in others. The goal is to reduce the number of columns by dropping one that carries no extra information. That only holds when the categories are mutually exclusive; in other situations, we need to keep the first column.
Example
Suppose we have 5 unique values in a column called "Fav_genre": "Rock", "Hip hop", "Pop", "Metal", "Country". This column contains value
During dummy variable creation, we usually generate 5 columns. In this case, drop_first=True is not applicable. A person may have more than one favorite genre, so no column can be inferred from the others, and dropping any of them would not be right. Hence, drop_first=False is the default value of the parameter.
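The genre case can be sketched as follows. This assumes, purely for illustration, that each person's genres are stored as one "|"-separated string; the sample rows are invented. Series.str.get_dummies is used here because it handles several values per row, which is a different tool from pd.get_dummies with drop_first.

```python
import pandas as pd

# Hypothetical sample data: several genres per person, separated by "|".
df = pd.DataFrame(
    {"Fav_genre": ["Rock|Metal", "Pop", "Hip hop|Pop|Country", "Country"]}
)

# One 0/1 column per genre; a row can have more than one flag set.
genres = df["Fav_genre"].str.get_dummies(sep="|")
print(list(genres.columns))          # ['Country', 'Hip hop', 'Metal', 'Pop', 'Rock']
print(genres.sum(axis=1).tolist())   # [2, 1, 3, 1]

# Dropping a column loses information: the "Country"-only person
# becomes an all-zero row that cannot be told apart from "no genre".
dropped = genres.drop(columns="Country")
print(dropped.iloc[3].tolist())      # [0, 0, 0, 0]
```

Because the row sums are not always 1, the dropped column cannot be reconstructed from the others, which is why all 5 columns are kept here.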