Maybe because:
- It doesn't naturally work on multiple columns at once.
- It doesn't support ordering. I.e. if your categories are ordinal, such as:
Awful, Bad, Average, Good, Excellent
LabelEncoder would give them an arbitrary order (it sorts the labels, so strings end up in alphabetical order rather than in order of meaning), which will not help your classifier.
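A quick way to check the ordering point above, as a minimal sketch assuming scikit-learn is installed. LabelEncoder keeps the sorted unique labels in classes_, and each label's code is its position in that array:

```python
from sklearn.preprocessing import LabelEncoder

# Same five quality levels as in the example, in arbitrary row order.
quality = ['Bad', 'Awful', 'Good', 'Average', 'Excellent']

le = LabelEncoder()
codes = le.fit_transform(quality)

# classes_ is sorted alphabetically, so 'Average' gets 0 and 'Good' gets 4.
print(list(le.classes_))  # ['Average', 'Awful', 'Bad', 'Excellent', 'Good']
print(codes.tolist())     # [2, 1, 4, 0, 3]
```

Here 'Excellent' (3) ends up below 'Good' (4), which is the opposite of the intended scale.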
In this case you could use either an OrdinalEncoder or a manual replacement.
OrdinalEncoder encodes categorical features as an integer array:
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame(data=[['Bad', 200], ['Awful', 100], ['Good', 350], ['Average', 300], ['Excellent', 1000]], columns=['Quality', 'Label'])
enc = OrdinalEncoder(categories=[['Awful', 'Bad', 'Average', 'Good', 'Excellent']]) # Use the 'categories' parameter to specify the desired order. Otherwise the categories are inferred from the data and sorted.
enc.fit_transform(df[['Quality']]) # Can fit on a single feature or on multiple features at once.
Output:
array([[1.],
       [0.],
       [3.],
       [2.],
       [4.]])
Notice the logical order in the output.
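The same encoder can handle several columns in one call, which is the first limitation listed for LabelEncoder. A minimal sketch, assuming a hypothetical second column 'Size' that is not part of the original data; the categories parameter takes one ordered list per column, in column order:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    'Quality': ['Bad', 'Awful', 'Good'],
    'Size': ['M', 'L', 'S'],  # hypothetical column, added for illustration only
})

# One ordered category list per column, matching the order of the columns passed in.
enc = OrdinalEncoder(categories=[
    ['Awful', 'Bad', 'Average', 'Good', 'Excellent'],
    ['S', 'M', 'L'],
])
encoded = enc.fit_transform(df[['Quality', 'Size']])

print(encoded.tolist())  # [[1.0, 1.0], [0.0, 2.0], [3.0, 0.0]]
```

Categories that never appear in the data ('Average' and 'Excellent' here) are allowed and still reserve their position in the order.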
# Manual replacement: map each category to its rank explicitly.
scale_mapper = {'Awful': 0, 'Bad': 1, 'Average': 2, 'Good': 3, 'Excellent': 4}
df['Quality'].replace(scale_mapper)
Output:
0    1
1    0
2    3
3    2
4    4
Name: Quality, dtype: int64
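A possible variation on the manual replacement, sketched here with Series.map instead of replace. With map, any value missing from the dictionary becomes NaN rather than being left as a string, which makes unexpected categories easy to spot. The value 'Superb' is a made-up label used only to show that behaviour:

```python
import pandas as pd

scale_mapper = {'Awful': 0, 'Bad': 1, 'Average': 2, 'Good': 3, 'Excellent': 4}

# 'Superb' is a hypothetical label that is absent from scale_mapper.
quality = pd.Series(['Bad', 'Awful', 'Good', 'Superb'], name='Quality')

# map() returns NaN for values without a dictionary entry.
mapped = quality.map(scale_mapper)

# Collect the original labels that could not be mapped.
unmapped = quality[mapped.isna()].tolist()
print(unmapped)  # ['Superb']
```

Because of the NaN, the result is a float column; once every label is covered by the mapping, the codes come out as integers.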