I am looking for a Pythonic way to handle the following problem.
The pandas.get_dummies() method is great to create dummies from a categorical column of a dataframe. For example, if the column has values in ['A', 'B'], get_dummies() creates 2 dummy variables and assigns 0 or 1 accordingly.
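To make the baseline behaviour concrete, here is a minimal sketch. The dataframe and its 'label' column are made up for illustration, and dtype=int is passed only so the output shows 0/1 rather than booleans on recent pandas versions.

```python
import pandas as pd

# Hypothetical example data: one categorical column with two distinct values.
df = pd.DataFrame({"label": ["A", "B", "A"]})

# One dummy column per distinct value; each row gets a single 1.
dummies = pd.get_dummies(df["label"], dtype=int)
print(dummies)
```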
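Now, I need to handle this situation. A single column, let's call it 'label', has values like ['A', 'B', 'C', 'D', 'A*C', 'C*D']. get_dummies() creates 6 dummies, one per distinct string, but I only want 4 of them (A, B, C and D), so that a row could have multiple 1s. For example, a row labelled 'A*C' should have a 1 in both the A and the C column.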
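A small sketch of the problem, using made-up data with the values listed above. It only demonstrates that the combined labels are treated as categories of their own.

```python
import pandas as pd

# Hypothetical example data matching the values described above.
df = pd.DataFrame({"label": ["A", "B", "C", "D", "A*C", "C*D"]})

# Plain get_dummies() treats 'A*C' and 'C*D' as separate categories,
# so it produces six columns instead of the four that are wanted.
six = pd.get_dummies(df["label"], dtype=int)
print(six.columns.tolist())
print(six.shape)
```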
Is there a Pythonic way to handle this? I could only think of a step-by-step algorithm to get it, but that would not use get_dummies(). Thanks
Edited, hope it is more clear!
get_dummies() on? like df[['A', 'B', 'C','D']].get_dummies()? – Charest
df[df.col.isin(['A','B','C'])].get_dummies() would this work? This would filter out the values you did not want to generate dummy values for – Charest
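As an alternative to filtering rows, one possible approach is the string accessor method Series.str.get_dummies(), which splits each value on a separator before building the dummies. The sketch below assumes that '*' is always the separator between combined labels and never appears inside a single label; the dataframe itself is made up for illustration.

```python
import pandas as pd

# Hypothetical example data matching the values in the question.
df = pd.DataFrame({"label": ["A", "B", "C", "D", "A*C", "C*D"]})

# Split each label on '*' and create one dummy column per distinct piece.
# Assumption: '*' only ever acts as the separator between labels.
dummies = df["label"].str.get_dummies(sep="*")

# Attach the four dummy columns to the original frame.
result = df.join(dummies)
print(result)
```

With this, the row labelled 'A*C' gets a 1 in both the A and the C column, and only the four columns A, B, C and D are created.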