LabelEncoder: TypeError: '>' not supported between instances of 'float' and 'str'
T

5

95

I'm facing this error for multiple variables, even after treating missing values. For example:

le = preprocessing.LabelEncoder()
categorical = list(df.select_dtypes(include=['object']).columns.values)
for cat in categorical:
    print(cat)
    df[cat].fillna('UNK', inplace=True)
    df[cat] = le.fit_transform(df[cat])
#     print(le.classes_)
#     print(le.transform(le.classes_))


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-424a0952f9d0> in <module>()
      4     print(cat)
      5     df[cat].fillna('UNK', inplace=True)
----> 6     df[cat] = le.fit_transform(df[cat].fillna('UNK'))
      7 #     print(le.classes_)
      8 #     print(le.transform(le.classes_))

C:\Users\paula.ceccon.ribeiro\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py in fit_transform(self, y)
    129         y = column_or_1d(y, warn=True)
    130         _check_numpy_unicode_bug(y)
--> 131         self.classes_, y = np.unique(y, return_inverse=True)
    132         return y
    133 

C:\Users\paula.ceccon.ribeiro\AppData\Local\Continuum\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py in unique(ar, return_index, return_inverse, return_counts)
    209 
    210     if optional_indices:
--> 211         perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
    212         aux = ar[perm]
    213     else:

TypeError: '>' not supported between instances of 'float' and 'str'

Checking the variable that led to the error results in:

df['CRM do Médico'].isnull().sum()
0

Besides nan values, what could be causing this error?

Tonatonal answered 25/9, 2017 at 13:42 Comment(0)
P
159

This happens because the series df[cat] contains elements of different data types (for example, both strings and floats). That can come from the way the data is read (numbers are parsed as float and text as str), or from a column that was float and changed type after the fillna operation.

In other words,

the pandas dtype object only means the column holds arbitrary Python objects, possibly of mixed types; it does not guarantee that every value is a str,

so using the following line:

df[cat] = le.fit_transform(df[cat].astype(str))


should help
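
To make the cause concrete, here is a minimal sketch, assuming a made-up column named code in which numbers were parsed as floats and text as str (the data and the column name are hypothetical, not taken from the question). It first lists the Python types present in the object column, then shows that encoding fails on the mixed values and succeeds once everything is cast to str.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data: an object column holding both floats and strings.
df = pd.DataFrame({"code": [101.0, "A17", 101.0, "B02", "A17"]})

# Diagnose: count the Python types that actually live in the column.
type_names = df["code"].map(lambda v: type(v).__name__)
print(type_names.value_counts().to_dict())

le = LabelEncoder()

# Encoding the mixed column fails, because floats and strings
# cannot be ordered against each other.
try:
    le.fit_transform(df["code"])
    mixed_error = None
except TypeError as exc:
    mixed_error = exc
print("mixed column:", type(mixed_error).__name__)

# Casting every element to str gives a uniform type that can be sorted.
encoded = le.fit_transform(df["code"].astype(str))
print(list(le.classes_))   # ['101.0', 'A17', 'B02']
print(encoded.tolist())    # [0, 1, 0, 2, 1]
```

Note that the float 101.0 becomes the label '101.0' after the cast, so numeric and textual spellings of the same value are still encoded as different classes.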

Phobia answered 25/9, 2017 at 13:57 Comment(5)
It really does. Do you know why? I'm already reading them as str using dtypes. - Tonatonal
It's most likely related to how pandas defines the object type: object does not necessarily mean str, and pandas forces the type to change when it inserts NaN values. - Phobia
What is le? Which package? - Washday
@hhh, most likely syDysregulation performed the following import: from sklearn.preprocessing import LabelEncoder as le. The .fit_transform was a giveaway. - Naamann
Hi, I have a similar problem. If you have time, can I request your help with this related post? #71194240 - Lowpressure
T
9

Because strings have variable length, pandas stores them as the object dtype by default. I also ran into this problem after treating missing values. Converting all of those columns to the 'category' dtype before label encoding worked in my case.

df[cat]=df[cat].astype('category')

And then check df.dtypes and perform label encoding.
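
As a rough sketch of this approach, assuming a hypothetical column named color whose missing values are filled with 'UNK' (the data is made up): once the column is categorical, pandas already exposes integer codes through .cat.codes, which may be enough if the goal is just a numeric encoding. This uses the pandas accessor rather than LabelEncoder, so treat it as an alternative and not as what this answer ran.

```python
import pandas as pd

# Hypothetical data: a text column with one missing value.
df = pd.DataFrame({"color": ["red", None, "blue", "red"]})

# Fill missing values by assignment, so every element is a str.
df["color"] = df["color"].fillna("UNK")

# Convert to the category dtype, as suggested above.
df["color"] = df["color"].astype("category")
print(df.dtypes)

# Each category gets an integer code; categories are sorted,
# and uppercase letters sort before lowercase ones.
codes = df["color"].cat.codes
print(list(df["color"].cat.categories))  # ['UNK', 'blue', 'red']
print(codes.tolist())                    # [2, 0, 1, 2]
```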

Toilsome answered 23/10, 2018 at 4:45 Comment(0)
A
2

Or cast the values to a uniform str type, for example with split:

unique, counts = numpy.unique(str(a).split(), return_counts=True)
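
One caveat: str(a).split() splits the printed form of a, so if a is a list the tokens keep its brackets, quotes and commas. A sketch that casts each element to str instead, assuming a is a list of mixed floats and strings (the sample values are made up):

```python
import numpy as np

# Hypothetical sample with floats, strings and a NaN.
a = [3.5, "x", "y", "x", 3.5, float("nan")]

# Splitting the printed list keeps punctuation in the tokens.
tokens = str(a).split()
print(tokens[0])  # [3.5,

# Casting element by element gives clean, uniformly typed strings.
as_str = np.array([str(v) for v in a])
unique, counts = np.unique(as_str, return_counts=True)
print(unique.tolist())  # ['3.5', 'nan', 'x', 'y']
print(counts.tolist())  # [2, 1, 2, 1]
```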
Adame answered 21/8, 2018 at 19:28 Comment(0)
K
2

df['cat'] = df['cat'].apply(str) worked.

Keratoplasty answered 20/6, 2022 at 22:16 Comment(0)
T
-2

In my case, I had NaN in a list, which limits the operations you can perform on it.
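
A small sketch of why that matters, using only the standard library and made-up values: NaN is a float, so a list that mixes it with strings cannot be sorted until the NaN is replaced. The 'UNK' placeholder is borrowed from the question; the rest is illustrative.

```python
import math

# Hypothetical list: strings plus a NaN, which is a float.
values = ["b", float("nan"), "a"]

# Sorting compares elements pairwise, and float vs str is not orderable.
try:
    sorted(values)
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError

# Replace NaN with a string placeholder so all elements share one type.
cleaned = ["UNK" if isinstance(v, float) and math.isnan(v) else v for v in values]
result = sorted(cleaned)
print(result)  # ['UNK', 'a', 'b']
```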

Tessy answered 25/10, 2022 at 8:27 Comment(0)
