Count the distinct values in every categorical column of a dataframe

I need to print the number of distinct values in each categorical column of my dataframe. The dataframe looks like this:

Gender  Function  Segment
M       IT        LE
F       IT        LM
M       HR        LE
F       HR        LM

The output should give me the following:

Variable_Name    Distinct_Count
Gender           2
Function         2
Segment          2

How to achieve this?

Sempiternal answered 28/1, 2020 at 14:34

Use nunique(), convert the resulting Series into a new DataFrame, and then set the column names.

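import pandas as pd

# nunique() returns a Series mapping each column name to its number of distinct values
df_unique = df.nunique().to_frame().reset_index()
df_unique.columns = ['Variable', 'DistinctCount']  # name the two resulting columns

print(df_unique)

Output:
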
   Variable  DistinctCount
0    Gender              2
1  Function              2
2   Segment              2
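If you want the exact headers from the question in a single expression, the same nunique() result can be reshaped with rename_axis() and reset_index(name=...). This is a minimal sketch that rebuilds the question's sample data; the variable names are only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Gender": ["M", "F", "M", "F"],
    "Function": ["IT", "IT", "HR", "HR"],
    "Segment": ["LE", "LM", "LE", "LM"],
})

# nunique() gives a Series indexed by column name; naming the index and
# resetting it turns the Series into a two-column DataFrame.
df_unique = (
    df.nunique()
    .rename_axis("Variable_Name")
    .reset_index(name="Distinct_Count")
)
print(df_unique)
```

Compared with assigning df_unique.columns afterwards, this avoids relying on the positional order of the columns produced by reset_index().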
Gildus answered 28/1, 2020 at 14:39
Nice one! This is probably what I was missing, so I did it the hard way :P - Raver
No need to loop when someone has already made it easy for us with nice built-in methods :) @CeliusStingher - Gildus

This is not the most elegant approach, but it produces the expected output:

import pandas as pd

new_data = {'Variable_Name': [], 'Distinct_Count': []}
for col in df.columns:  # iterate over the column names
    new_data['Variable_Name'].append(col)
    new_data['Distinct_Count'].append(df[col].nunique())  # distinct non-null values
new_df = pd.DataFrame(new_data)
print(new_df)

Output:

  Variable_Name  Distinct_Count
0        Gender               2
1      Function               2
2       Segment               2
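The loop above can also record the distinct values themselves, which is what the title asks for, and skip numeric columns in a mixed frame. A rough sketch, assuming that every non-numeric column should be treated as categorical; the Age column is hypothetical and only there to show it being skipped:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    "Gender": ["M", "F", "M", "F"],
    "Function": ["IT", "IT", "HR", "HR"],
    "Segment": ["LE", "LM", "LE", "LM"],
    "Age": [34, 29, 41, 38],  # hypothetical numeric column, not in the question
})

# Assumption: any column without a numeric dtype counts as categorical.
cat_cols = [col for col in df.columns if not is_numeric_dtype(df[col])]

rows = []
for col in cat_cols:
    # dropna() mirrors nunique(), which ignores missing values by default
    values = list(df[col].dropna().unique())  # distinct values in order of appearance
    rows.append({
        "Variable_Name": col,
        "Distinct_Count": len(values),
        "Distinct_Values": values,
    })

summary = pd.DataFrame(rows)
print(summary)
```

Building a list of row dicts and creating the DataFrame once at the end keeps the loop readable and avoids growing a DataFrame row by row.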
Raver answered 28/1, 2020 at 14:40

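Given that your dataset contains only categorical columns, you can use the pandas describe() method. For string (object) columns it reports count, unique, top and freq, so the unique row holds the distinct count for each column.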

For example:

df_unique = df.describe().loc[['unique']].T  # keep only the 'unique' row, then one row per column

df_unique will be a DataFrame indexed by the categorical column names, with a single column, unique, that holds the number of distinct values in each of them.
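On a frame that also has numeric columns, plain describe() summarises only the numeric ones, so there is no unique row and .iloc[[1]] would silently pick the mean row instead. Passing exclude=['number'] restricts describe() to the non-numeric columns. A sketch under that assumption, with a hypothetical Age column added to the sample data and Segment stored as a pandas categorical:

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["M", "F", "M", "F"],
    "Function": ["IT", "IT", "HR", "HR"],
    "Segment": pd.Categorical(["LE", "LM", "LE", "LM"]),
    "Age": [34, 29, 41, 38],  # hypothetical numeric column
})

# Exclude numeric dtypes so describe() reports count/unique/top/freq
# for the remaining (string and categorical) columns only.
summary = df.describe(exclude=["number"])
df_unique = summary.loc[["unique"]].T
print(df_unique)
```

Selecting the row by its label ('unique') rather than its position makes the intent explicit and fails loudly if the row is missing.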

Restharrow answered 30/8, 2023 at 19:55
