I want the labels assigned by sklearn's LabelEncoder (0, 1, 2, 3, ...) to follow a specific order of the possible values of a categorical variable (say ['b', 'a', 'c', 'd']). LabelEncoder appears to assign the labels in lexicographic (sorted) order instead, as this example shows:
>>> from sklearn.preprocessing import LabelEncoder
>>> le = LabelEncoder()
>>> le.fit(['b', 'a', 'c', 'd'])
>>> le.classes_
array(['a', 'b', 'c', 'd'], dtype='<U1')
>>> le.transform(['a', 'b'])
array([0, 1])
How can I force the encoder to keep the order in which the values are first encountered in the .fit method (that is, to encode 'b' as 0, 'a' as 1, 'c' as 2, and 'd' as 3)?
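To make the desired behaviour concrete, here is a minimal sketch of a first-seen-order encoding in plain Python. It does not use LabelEncoder at all; the helper name first_seen_mapping is hypothetical and only illustrates the mapping being asked for.

```python
def first_seen_mapping(values):
    """Map each distinct value to an integer in order of first appearance.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    # dict.fromkeys keeps insertion order and drops later duplicates.
    return {value: code for code, value in enumerate(dict.fromkeys(values))}


# "Fit" on the data in the order it is encountered (the repeated 'a' is ignored).
mapping = first_seen_mapping(['b', 'a', 'c', 'd', 'a'])

# "Transform" new values with the learned mapping.
encoded = [mapping[value] for value in ['a', 'b']]
```

With this mapping, 'b' gets 0, 'a' gets 1, 'c' gets 2, and 'd' gets 3, so encoded is [1, 0]. An unseen value raises KeyError, which roughly mirrors how LabelEncoder rejects unknown labels.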
See OrdinalEncoder, described at github.com/scikit-learn-contrib/categorical-encoding and contrib.scikit-learn.org/categorical-encoding/ordinal.html – Raddle
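As a related sketch: scikit-learn itself also ships a class called OrdinalEncoder in sklearn.preprocessing. It is a different class from the contrib one linked above, and its categories parameter accepts an explicit order. The example below assumes that swapping in this class is acceptable; note that it expects 2-D input (one column per feature) and returns floats, so the reshape and integer cast are additions for illustration.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Desired order: position in this list becomes the integer code.
desired_order = ['b', 'a', 'c', 'd']

# One list of categories per feature; here there is a single feature.
enc = OrdinalEncoder(categories=[desired_order])

# OrdinalEncoder works on 2-D arrays, so reshape to a single column.
X = np.array(['b', 'a', 'c', 'd']).reshape(-1, 1)
enc.fit(X)

# transform returns a float array of shape (n_samples, 1); flatten and cast.
codes = enc.transform(np.array(['a', 'b']).reshape(-1, 1)).ravel().astype(int)
```

Here codes comes out as [1, 0], since 'a' is second and 'b' is first in desired_order. The order has to be supplied up front; this does not infer it from the order of the data passed to fit.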