Select only columns that have at most N unique values

2

5

I want to count the number of unique values in each column and select only those columns that have fewer than 32 unique values.

I tried using df.filter(nunique<32) and

df[[ c for df.columns in df if c in c.nunique<32]] 

but because nunique is a method and not a function, neither works. I thought len(set()) would work and tried

df.apply(lambda x : len(set(x))

but that doesn't work either. Any ideas, please? Thanks in advance!

Costly answered 24/6, 2019 at 16:27 Comment(0)
11

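nunique can be called on the entire DataFrame, but it is a method, so you have to call it with parentheses: df.nunique() returns a Series with the number of unique values in each column. Comparing that Series to 32 gives a boolean mask, which you can pass to loc to keep only the matching columns: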

df.loc[:, df.nunique() < 32]

Minimal Verifiable Example

import pandas as pd
df = pd.DataFrame({'A': list('abbcde'), 'B': list('ababab')})
df
   A  B
0  a  a
1  b  b
2  b  a
3  c  b
4  d  a
5  e  b

df.nunique()
A    5
B    2
dtype: int64

df.loc[:, df.nunique() < 3]
   B
0  a
1  b
2  a
3  b
4  a
5  b
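
For reuse, the same mask can be wrapped in a small helper. The following is only a sketch and not part of the original answer: the function name select_low_cardinality and its max_unique parameter are hypothetical, and column C is made-up data added for illustration. One thing worth knowing is that nunique ignores NaN by default (dropna=True); passing dropna=False counts missing values as one more distinct value, which can change which columns pass the threshold.

```python
import numpy as np
import pandas as pd

def select_low_cardinality(df, max_unique, dropna=True):
    """Return only the columns of df with fewer than max_unique distinct values.

    Hypothetical helper for illustration; dropna is forwarded to DataFrame.nunique.
    """
    counts = df.nunique(dropna=dropna)   # Series: column name -> unique count
    return df.loc[:, counts < max_unique]

df = pd.DataFrame({
    'A': list('abbcde'),                          # 5 unique values
    'B': list('ababab'),                          # 2 unique values
    'C': [1.0, np.nan, 1.0, np.nan, 1.0, np.nan], # 1 unique value, or 2 if NaN counts
})

print(select_low_cardinality(df, 3).columns.tolist())                # ['B', 'C']
print(select_low_cardinality(df, 2).columns.tolist())                # ['C']
print(select_low_cardinality(df, 2, dropna=False).columns.tolist())  # []
```

The comparison is strict (fewer than max_unique); use <= instead if "at most N" is what you need.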
Kilroy answered 24/6, 2019 at 16:29 Comment(0)
0

If you want to do this in a method-chaining style, you can pass a callable to loc:

df.loc[:, lambda x: x.nunique() < 3]
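
The callable form matters mostly when earlier steps in the chain change the frame, because the lambda receives the DataFrame as it exists at that point rather than the original df. A minimal sketch, assuming the same example data as in the answer above (the row filter on B is just an illustrative earlier step):

```python
import pandas as pd

df = pd.DataFrame({'A': list('abbcde'), 'B': list('ababab')})

# On the original frame, A has 5 unique values, so only B passes "< 4".
direct = df.loc[:, df.nunique() < 4]

# In a chain, the lambda sees the already-filtered rows (B == 'a'),
# where A has only 3 unique values, so both columns pass.
chained = (
    df
    .loc[lambda d: d['B'] == 'a']
    .loc[:, lambda d: d.nunique() < 4]
)

print(direct.columns.tolist())   # ['B']
print(chained.columns.tolist())  # ['A', 'B']
```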
Pool answered 24/8, 2022 at 16:23 Comment(0)
