I'm building an ML model and I'm a bit confused.
I'm at the step where I take the categorical features from a "large" dataframe (180 columns) and one-hot encode them, so that I can find the correlation between the features and select the "best" ones.
Here is my code:
# import labelencoder
from sklearn.preprocessing import LabelEncoder
# instantiate labelencoder object
le = LabelEncoder()
# apply le on categorical feature columns
df = df.apply(lambda col: le.fit_transform(col))
df.head(10)
When running this I get the following error:
TypeError: ('argument must be a string or number', 'occurred at index LockTenor')
So I head over to the LockTenor field and look at all the distinct values:
df.LockTenor.unique()
This results in the following:
array([60.0, 45.0, 'z', 90.0, 75.0, 30.0], dtype=object)
That looks like all strings and numbers to me. Is the error caused because the values are floats rather than ints?
What if you change df.apply(lambda col: le.fit_transform(col)) to df.apply(lambda col: LabelEncoder().fit_transform(col))? I wonder if your encoder is getting confused with the subsequent fit_transform calls because it's not being re-initialised. – Protolanguage
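A possible explanation, offered as a sketch rather than a confirmed diagnosis: the unique() output shows dtype=object with floats and the string 'z' in the same column. LabelEncoder has to sort the distinct values, and Python cannot compare a str with a float, so the mix of types is the likely trigger, not float versus int. The example below assumes that is the cause. It uses a small made-up dataframe (the Product column and the encode_column helper are hypothetical, not from the question), checks which Python types a column holds, then casts every column to str and uses a fresh encoder per column, as the comment suggests.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the real 180-column dataframe.
df = pd.DataFrame({
    # Same values as the unique() output in the question.
    "LockTenor": pd.Series([60.0, 45.0, "z", 90.0, 75.0, 30.0], dtype=object),
    # Made-up second categorical column, for illustration only.
    "Product": ["a", "b", "a", "c", "b", "a"],
})

# Count the Python types present in the suspect column.
# Seeing both "float" and "str" here would confirm the mixed-type theory.
type_counts = df["LockTenor"].map(lambda v: type(v).__name__).value_counts()
print(type_counts)

def encode_column(col):
    """Label-encode one column after casting all values to str.

    A new LabelEncoder is created per column so no fitted state is
    carried over from one column to the next.
    """
    codes = LabelEncoder().fit_transform(col.astype(str))
    return pd.Series(codes, index=col.index)

encoded = df.apply(encode_column)
print(encoded.head(10))
```

Casting to str means 60.0 becomes the category "60.0", so the codes follow string sort order rather than numeric order. If LockTenor is really numeric and 'z' is a placeholder for missing data, cleaning that value first may be the better route.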