Select only columns that have at most N unique values

Asked 24/6, 2019 at 16:27 Answered 24/8, 2022 at 16:23

Solved python pandas dataframe data-science

I want to count the number of unique values in each column and select only those columns that have fewer than 32 unique values.

I tried using df.filter(nunique<32) and

df[[ c for df.columns in df if c in c.nunique<32]]

but neither works, because nunique is a method rather than a function. I thought len(set()) would work, so I tried

df.apply(lambda x : len(set(x))

but that doesn't work either. Any ideas? Thanks in advance!

Costly answered 24/6, 2019 at 16:27 Comment(0)

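nunique is a method, so you have to call it: df.nunique(). Called on the entire DataFrame, it returns the number of unique values per column. Comparing that result to a threshold gives a boolean mask, which you can pass to loc to select the matching columns: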

df.loc[:, df.nunique() < 32]

Minimal Verifiable Example

import pandas as pd
df = pd.DataFrame({'A': list('abbcde'), 'B': list('ababab')})
df
   A  B
0  a  a
1  b  b
2  b  a
3  c  b
4  d  a
5  e  b

df.nunique()
A    5
B    2
dtype: int64

df.loc[:, df.nunique() < 3]
   B
0  a
1  b
2  a
3  b
4  a
5  b
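
If the threshold varies, the same mask can be wrapped in a small helper. This is only a sketch: the name select_low_cardinality and its max_unique parameter are made up for illustration, and it uses <= to match "at most N" in the title (for the question's "fewer than 32", pass 31). Keep in mind that nunique ignores NaN by default (dropna=True).

```python
import pandas as pd


def select_low_cardinality(df, max_unique):
    """Hypothetical helper: keep columns with at most `max_unique` distinct values."""
    # df.nunique() returns one count per column; the comparison
    # turns it into a boolean Series indexed by column name.
    mask = df.nunique() <= max_unique
    # A boolean Series in the column position of loc selects the True columns.
    return df.loc[:, mask]


df = pd.DataFrame({'A': list('abbcde'), 'B': list('ababab')})
result = select_low_cardinality(df, 2)
print(list(result.columns))
```

With the example frame, only B (2 unique values) passes a limit of 2, while a limit of 5 keeps both columns.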

Kilroy answered 24/6, 2019 at 16:29 Comment(0)

If anyone wants to do it in a method chaining fashion, you can:

df.loc[:, lambda x: x.nunique() < 3]
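
The callable form is mainly useful when the frame being filtered is an intermediate result that has no name. A rough sketch, where the assign step and the column C are invented purely to give the chain something to do:

```python
import pandas as pd

df = pd.DataFrame({'A': list('abbcde'), 'B': list('ababab')})

result = (
    df
    # Hypothetical earlier step in the chain: add an upper-cased copy of A.
    .assign(C=lambda x: x['A'].str.upper())
    # The lambda receives the frame as it exists at this point (A, B and C),
    # so the unique counts are computed on the intermediate result.
    .loc[:, lambda x: x.nunique() < 3]
)
print(list(result.columns))
```

Here A and C each have 5 unique values, so only B survives the filter.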

Pool answered 24/8, 2022 at 16:23 Comment(0)
