How to change the format of .describe() output?
W

7

29

I called .describe() on a DataFrame column, and the output doesn't look nice. I want it to show the full numbers instead of scientific (exponential) notation.

Input:

df["A"].describe()

What the output looks like:

count    6.000000e+01
mean     7.123568e+04
std      2.144483e+05
min      1.000000e+02
25%      2.770080e+03
50%      1.557920e+04
75%      4.348470e+04
max      1.592640e+06
Name: A, dtype: float64

Expected Output:

count    60.0
mean     71235.68
std      214448.3
min      100.0000
25%      2770.080
50%      15579.20
75%      43484.70
max      1592640.0
Name: A, dtype: float64
Wolverhampton answered 28/3, 2019 at 10:5 Comment(1)
Possible duplicate of "How do I print entire number in Python from describe() function?" - Slippy
P
46

You can change pandas' display.float_format option with pd.set_option:

import pandas as pd
import numpy as np

pd.set_option('display.float_format', lambda x: '%.5f' % x)

data = pd.DataFrame()

data['X'] = np.random.rand(1000) + 10000000 * 0.587

data['X'].describe()

# Output 
count      1000.00000
mean    5870000.47894
std           0.28447
min     5870000.00037
25%     5870000.23637
50%     5870000.45799
75%     5870000.71652
max     5870000.99774
Name: X, dtype: float64

Alternatively, without set_option, apply a format string to the Series that describe() returns:

import pandas as pd
import numpy as np

data = pd.DataFrame()

data['X'] = np.random.rand(1000, ) + 10000000 * 0.587

data['X'].describe().apply("{0:.5f}".format)

#output

count       1000.00000
mean     5870000.48955
std            0.29247
min      5870000.00350
25%      5870000.22416
50%      5870000.50163
75%      5870000.73457
max      5870000.99995
Provencher answered 28/3, 2019 at 10:24 Comment(2)
Hi, thank you so much, both methods work! One thing to clarify: if I use set_option, will every later call to .describe() always show 5 decimal places? - Wolverhampton
If you use set_option to change float_format, pandas will display floats in the provided float_format throughout the code. - Provencher
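Since set_option changes the display format globally, here is a minimal sketch of two ways to limit its effect: pd.option_context applies the format only inside a with block, and pd.reset_option restores the default after a global change. The Series s and its values are made up for illustration.

```python
import pandas as pd

# Illustrative data only; any float Series behaves the same way.
s = pd.Series([60.0, 71235.68, 214448.3, 1592640.0], name="A")

# Temporarily use fixed-point formatting; the option is restored on exit.
with pd.option_context("display.float_format", "{:.2f}".format):
    inside = repr(s)
    print(inside)

# Outside the block the option is back to its previous value.
after = pd.get_option("display.float_format")

# A global change made with pd.set_option can be undone explicitly.
pd.set_option("display.float_format", "{:.5f}".format)
pd.reset_option("display.float_format")
```

This only affects how pandas prints floats; the underlying values stay unchanged.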
H
16

This is a small modification of the answers above, which tend to raise an error in my versions of Python (3.7.7) and pandas (1.3.3). If you only want summary stats to three decimal places, you can use applymap() and an anonymous function.

For example:

df[["A"]].describe().applymap(lambda x: f"{x:0.3f}")
Heyer answered 11/11, 2021 at 16:13 Comment(1)
I just got a warning that .applymap will be deprecated in a future version of pandas. Using .map(lambda x: f"{x:0.3f}") gave the same result and got rid of the warning. - Selfcommand
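To cover both cases mentioned here, a small sketch that picks DataFrame.map when it exists (it was added in pandas 2.1) and falls back to applymap on older versions. The DataFrame contents and the name elementwise are made up for illustration.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"A": [100.0, 2770.08, 15579.2, 1592640.0]})
summary = df[["A"]].describe()

# DataFrame.map was added in pandas 2.1; older versions only have applymap.
elementwise = summary.map if hasattr(summary, "map") else summary.applymap

# Format every cell with three decimal places.
formatted = elementwise(lambda x: f"{x:0.3f}")
print(formatted)
```

Note that the result holds strings, not floats, so it is suited to display rather than further arithmetic.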
C
11

Try setting the float format that pandas uses for display:

import pandas as pd

pd.set_option('display.float_format', lambda x: '%.3f' % x)
Coussoule answered 28/3, 2019 at 10:18 Comment(0)
V
10

Simple solution if you are using Python 3.8 and Pandas 1.3.5:

df.describe().applymap('{:,.2f}'.format)
Venireman answered 22/1, 2022 at 14:6 Comment(0)
R
0

Just a single line of pandas will do it for you.

df[["A"]].describe().format('{:.3f}')
Ramulose answered 22/7, 2021 at 17:15 Comment(3)
Fails for me: AttributeError: 'Series' object has no attribute 'format' - Dremadremann
Try this: df[["A"]].describe().format('{:.3f}') - Ramulose
Not working with pandas 1.5.3 (Dec 2023) - Dozier
E
0

# Function to render long numbers in an easily readable format such as Million or Billion

def human_format(num):
    # add more suffixes if you need them
    suffixes = ['', 'K', 'Million', 'Billion', 'Trillion']
    magnitude = 0
    # divide by 1000 until the number is small or the suffixes run out
    while abs(num) >= 1000 and magnitude < len(suffixes) - 1:
        magnitude += 1
        num /= 1000.0
    return '%.2f%s' % (num, suffixes[magnitude])

Original DataFrame

df.describe()
           sales        profile
count   3.504600e+04    35046.000000
mean    1.132153e+07    613.877191
std     2.622250e+08    3862.190022
min    -3.702949e+09    -16202.130000
25%     5.221783e+03    7.000000
50%     3.110371e+04    33.000000
75%     2.131200e+05    135.000000
max     2.621423e+10    92930.370000

In the output above, e+09 means billions, e+06 means millions, and so on, but it is still messy to read.

Below, you can read it in a more human-friendly way:

df2 = df.describe()  # work on a separate DataFrame
for x in df2:
    df2[x] = df2[x].apply(human_format)

df2
        sales            profile
count   35.05K           35.05K
mean    11.32Million     613.88
std     262.22Million    3.86K
min     -3.70Billion     -16.20K
25%     5.22K            7.00
50%     31.10K           33.00
75%     213.12K          135.00
max     26.21Billion     92.93K
Etem answered 27/12, 2021 at 14:47 Comment(0)
K
-1

you can use

df["A"].describe(include=['category'])
Kenti answered 28/3, 2019 at 10:9 Comment(0)
