Format certain floating dataframe columns into percentage in pandas

I am trying to write a paper in an IPython notebook, but I have run into some issues with the display format. Say I have the following dataframe df; is there any way to format var1 and var2 with 2 decimal places and var3 as percentages?

       var1        var2         var3    
id                                              
0    1.458315    1.500092   -0.005709   
1    1.576704    1.608445   -0.005122    
2    1.629253    1.652577   -0.004754    
3    1.669331    1.685456   -0.003525   
4    1.705139    1.712096   -0.003134   
5    1.740447    1.741961   -0.001223   
6    1.775980    1.770801   -0.001723    
7    1.812037    1.799327   -0.002013    
8    1.853130    1.822982   -0.001396    
9    1.943985    1.868401    0.005732

The numbers are not multiplied by 100, i.e. -0.0057 should display as -0.57%.

Quintessa asked 1/6, 2014 at 15:48 Comment(2)
In case anyone is looking at this question after 2014, see my answer for a concise solution. - Thielen
The answers work for immediate formatting, but I was hoping to "attach" the format to the column so that I could continue doing other stuff with the dataframe and it would always print that column in that format (unless I reset the format to something else). Is this possible? - Peyton

Replace the values using the round function, and format the string representation of the percentage numbers:

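df['var1'] = pd.Series([round(val, 2) for val in df['var1']], index=df.index)
df['var2'] = pd.Series([round(val, 2) for val in df['var2']], index=df.index)
# var3 becomes strings such as '-0.57%' (object dtype, no longer numeric)
df['var3'] = pd.Series(["{0:.2f}%".format(val * 100) for val in df['var3']], index=df.index)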

The round function rounds a floating point number to the number of decimal places given as the second argument.

String formatting allows you to represent the numbers as you wish. You can change the number of decimal places shown by changing the number before the f.

P.S. I was not sure whether your 'percentage' numbers had already been multiplied by 100. If they have, you will want to change the number of decimals displayed and remove the multiplication by 100.
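As the comments below point out, the code above overwrites the stored numbers. If you want the rounded and percentage strings for display but still need the raw values, a minimal sketch is to build a separate display copy with DataFrame.assign instead. The data is the first two rows of the question; the name display_df is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "var1": [1.458315, 1.576704],
    "var2": [1.500092, 1.608445],
    "var3": [-0.005709, 0.005732],
})

# assign() returns a new DataFrame, so df keeps its full-precision floats
display_df = df.assign(
    var1=df["var1"].round(2),
    var2=df["var2"].round(2),
    # multiply by 100 and append '%', as in the answer above
    var3=["{0:.2f}%".format(val * 100) for val in df["var3"]],
)
print(display_df)
```

The trade-off is that display_df holds strings in var3, so it is only meant for showing, not for further calculations.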

Leonor answered 1/6, 2014 at 16:22 Comment(8)
Thanks, will this change the actual values within each column? - Quintessa
Yes. If that is not desired, just create new columns holding the formatted values. As far as I know, there is no way to specify how output appears beyond what the data actually are. - Leonor
To round the values in a series you can also just use df['var2'].round(2) - Fushih
You could also set the default format for floats: pd.options.display.float_format = '{:.2f}%'.format - Pyrotechnics
@romain That's a great suggestion (for some use cases); it should be its own answer so I can upvote it. It does need a tweak to multiply by 100, though. - Leeke
As suggested by @linqu, you generally do not want to modify data for display. - Beghard
Is there a way to display a column as a percentage without converting it to a string? - Aguedaaguero
See the answer below, which is better. - Leonor

The accepted answer suggests modifying the raw data for presentation purposes, something you generally do not want. Imagine you need to do further analysis on these columns and need the precision you lost by rounding.

You can set the formatting of individual columns when rendering a DataFrame as text; in your case:

output = df.to_string(formatters={
    'var1': '{:,.2f}'.format,
    'var2': '{:,.2f}'.format,
    'var3': '{:,.2%}'.format
})
print(output)

For your information, '{:,.2%}'.format(0.214) yields 21.40%, so there is no need to multiply by 100.
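A small self-contained run of the to_string approach, using only the first two rows of the question's data to keep it short. It shows that the formatters affect the rendered text only, while the DataFrame keeps its floats.

```python
import pandas as pd

df = pd.DataFrame({
    "var1": [1.458315, 1.576704],
    "var2": [1.500092, 1.608445],
    "var3": [-0.005709, -0.005122],
})

# Each formatter is a callable applied to every value of its column;
# the DataFrame itself is not modified.
output = df.to_string(formatters={
    "var1": "{:,.2f}".format,
    "var2": "{:,.2f}".format,
    "var3": "{:,.2%}".format,  # the % presentation type multiplies by 100
})
print(output)
```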

This gives you a text representation rather than a nice HTML table. If you need to stay with HTML, use the to_html method instead:

from IPython.display import display, HTML
output = df.to_html(formatters={
    'var1': '{:,.2f}'.format,
    'var2': '{:,.2f}'.format,
    'var3': '{:,.2%}'.format
})
display(HTML(output))

Update

As of pandas 0.17.1, life got easier and we can get a beautiful HTML table right away:

df.style.format({
    'var1': '{:,.2f}'.format,
    'var2': '{:,.2f}'.format,
    'var3': '{:,.2%}'.format,
})
Idiosyncrasy answered 23/3, 2016 at 10:20 Comment(10)
If you have n or a variable number of columns in your dataframe and want to apply the same formatting across all of them, but you may not know all the column headers in advance, you don't have to put the formatters in a dictionary; you can pass a list instead, like this: output = df.to_html(formatters=n * ['{:,.2%}'.format]) - Matthei
A standard set of these in a dict with attribute access would be great. - Unsocial
The .format parts are not needed; you may omit them. - Lupelupee
df.style.format({'var3': '{:,.2%}'}) is not working. Values remain unchanged, i.e. without %. - Inosculate
@Inosculate % needs to be outside the brackets in '{:.2f}%' - Dryer
@Dryer Note there's no f in '{:,.2%}'. @Inosculate Apparently df.style.format doesn't change the DataFrame, but IPython renders the result as HTML. You can use df.style.format(...).render() (.to_html() in newer pandas) to get the HTML yourself. Or use df.apply directly, which is what happens internally, per the documentation. - Cognizable
Is there a way to add this as the default representation of a data frame? I have tried changing __str__ and __repr__ but neither seems to alter the output when putting df at the end of my Jupyter/IPython cell. - Reider
Still doesn't work: df.style.format({'perc count': '{:,.2}%'}). Why does this answer have so many upvotes? '%' in or out of the curly brackets... - Knp
Ahhh, figured it out as per one of the comments below: very important, you need to assign a df to the formatted df. It is not an option you can set on an existing df, e.g.: df = df.style.format({'perc count': '{:,.2}%'}) - Knp
@Knp the output of df.style.format is a Styler object, NOT a DataFrame. Assigning it to a variable does not change any of the underlying DataFrame's data values. The Styler object only controls how it is displayed in an output format, e.g. HTML notebook, LaTeX, string or Excel. - Cashman

You could also set the default display format for floats:

pd.options.display.float_format = '{:.2%}'.format

Use '{:.2%}' instead of '{:.2f}%': the former converts 0.41 to 41.00% (correctly), the latter to 0.41% (incorrectly).
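Both the difference between the two format specs and the fact that the option stays in effect for every DataFrame until you reset it can be checked directly. A minimal sketch; pd.reset_option restores the default afterwards.

```python
import pandas as pd

# The '%' presentation type multiplies by 100; a literal '%' after 'f' does not
print("{:.2%}".format(0.41))   # 41.00%
print("{:.2f}%".format(0.41))  # 0.41%

df = pd.DataFrame({"var3": [-0.005709, 0.005732]})

pd.options.display.float_format = "{:.2%}".format
with_pct = df.to_string()  # every float column now prints as a percentage

pd.reset_option("display.float_format")
default = df.to_string()   # back to the default float display
```

Because the option is global, it also turns non-percentage float columns into percentages while it is set.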

Pyrotechnics answered 28/7, 2015 at 9:10 Comment(4)
Good to know and relevant to OP's question about outputting in a Python notebook. - Yesima
And if the percentages are still given as decimals (e.g. when using df.pct_change()): pd.options.display.float_format = '{:.2%}'.format - Trier
Of course this would affect all of your dataframes, so you'd have to unset it again to display non-percentage floats. - Numerical
As per @Numerical's comment, this is not really useful. - Knp

Often we want to keep full precision for calculations, but for visual aesthetics we may want to see only a few decimal places when displaying the dataframe.

In Jupyter Notebook, pandas can use HTML formatting through the DataFrame.style property.

To show just two decimal places for some columns, we can use this code snippet:

Given the dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [1.458315, 1.576704, 1.629253, 1.669331, 1.705139, 1.740447, 1.77598, 1.812037, 1.85313, 1.943985],
                   'var2': [1.500092, 1.608445, 1.652577, 1.685456, 1.712096, 1.741961, 1.770801, 1.799327, 1.822982, 1.868401],
                   'var3': [-0.005709, -0.005122, -0.004754, -0.003525, -0.003134, -0.001223, -0.001723, -0.002013, -0.001396, 0.005732]})

print(df)
       var1      var2      var3
0  1.458315  1.500092 -0.005709
1  1.576704  1.608445 -0.005122
2  1.629253  1.652577 -0.004754
3  1.669331  1.685456 -0.003525
4  1.705139  1.712096 -0.003134
5  1.740447  1.741961 -0.001223
6  1.775980  1.770801 -0.001723
7  1.812037  1.799327 -0.002013
8  1.853130  1.822982 -0.001396
9  1.943985  1.868401  0.005732

Style to get the required format:

    df.style.format({'var1': "{:.2f}",'var2': "{:.2f}",'var3': "{:.2%}"})

Gives:

   var1  var2    var3
0  1.46  1.50  -0.57%
1  1.58  1.61  -0.51%
2  1.63  1.65  -0.48%
3  1.67  1.69  -0.35%
4  1.71  1.71  -0.31%
5  1.74  1.74  -0.12%
6  1.78  1.77  -0.17%
7  1.81  1.80  -0.20%
8  1.85  1.82  -0.14%
9  1.94  1.87   0.57%

Update

If the display command is not found, try the following:

from IPython.display import display

df_style = df.style.format({'var1': "{:.2f}",'var2': "{:.2f}",'var3': "{:.2%}"})

display(df_style)

Requirements

  • To use the display command, you need to have IPython installed on your machine.
  • The display command does not work in online Python interpreters that do not have IPython installed, such as https://repl.it/languages/python3
  • The display command works out of the box in Jupyter Notebook, JupyterLab, Google Colab, Kaggle kernels, IBM Watson, Mode Analytics and many other platforms; you do not even have to import display from IPython.display.
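Where display is not available (for example in a plain Python interpreter), a rough text-only equivalent is to reuse the same format strings with DataFrame.to_string. A minimal sketch; the formats dict name is just illustrative, and the output is plain text rather than a styled table.

```python
import pandas as pd

df = pd.DataFrame({
    "var1": [1.458315, 1.943985],
    "var2": [1.500092, 1.868401],
    "var3": [-0.005709, 0.005732],
})

# Same format strings as passed to style.format above
formats = {"var1": "{:.2f}", "var2": "{:.2f}", "var3": "{:.2%}"}

# to_string expects callables, so use each format string's bound .format method
text = df.to_string(formatters={col: fmt.format for col, fmt in formats.items()})
print(text)
```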
Thielen answered 11/4, 2019 at 3:49 Comment(8)
This is the most Pythonic answer. - Corrie
This is a way better answer than the accepted one. Changing the formatting is much preferable to actually changing the underlying values. - Marital
@Poudel This is not working. I have used exactly the same code as yours and var3 is not formatted as a percentage. - Inosculate
@Inosculate try display(df.style.format({'var1': "{:.2f}",'var2': "{:.2f}",'var3': "{:.2%}"})) - Thielen
@Poudel It worked now. There is one superfluous bracket at the end. It should be: df_style = df.style.format({'var1': "{:.2f}",'var2': "{:.2f}",'var3': "{:.2%}"}) Thanks! - Inosculate
Had an issue with the index being non-unique, so I just had to df.reset_index(inplace=True) and then apply the .style.format. It worked perfectly, thank you. - Sudanic
When I use this answer, I get the message: AttributeError: 'Styler' object has no attribute 'head' - Damaging
Ah, very important differentiator here: you need to assign a df to the formatted df. It is not simply an option you can set. - Knp

As suggested by @linqu, you should not change your data for presentation. Since pandas 0.17.1, (conditional) formatting has been made easier. Quoting the documentation:

You can apply conditional formatting, the visual styling of a DataFrame depending on the data within, by using the DataFrame.style property. This is a property that returns a pandas.Styler object, which has useful methods for formatting and displaying DataFrames.

For your example, that would be (the usual table will show up in Jupyter):

df.style.format({
    'var1': '{:,.2f}'.format,
    'var2': '{:,.2f}'.format,
    'var3': '{:,.2%}'.format,
})
Inocenciainoculable answered 28/3, 2017 at 0:45 Comment(2)
This is not working. I have used exactly the same code as yours. - Inosculate
This will give HTML output and is useful for notebooks. For terminal output, printing to files, etc., the Styler's to_string method is good; it needs pandas 1.5 or higher. - Bascinet

Just another way of doing it, should you need to do it over a larger range of columns, using applymap:

df[['var1','var2']] = df[['var1','var2']].applymap("{0:.2f}".format)
# a single column is a Series, which has map rather than applymap
df['var3'] = df['var3'].map(lambda x: "{0:.2f}%".format(x*100))

applymap is useful if you need to apply the function over multiple columns; for this specific example it's essentially an abbreviation of:

df[['var1','var2']].apply(lambda row: row.map('{:.2f}'.format), axis=1)

There is a great explanation of apply, map and applymap here:

Difference between map, applymap and apply methods in Pandas
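For a larger range of columns, a sketch that picks every numeric column except the percentage one via select_dtypes and formats each with Series.map, without relying on applymap (which was deprecated in pandas 2.1 in favour of DataFrame.map). Like the answer's code it produces strings, so it works on a copy; the names dec_cols and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "var1": [1.458315, 1.576704],
    "var2": [1.500092, 1.608445],
    "var3": [-0.005709, -0.005122],
})

# Every numeric column except the percentage one gets 2 decimals
dec_cols = df.select_dtypes("number").columns.drop("var3")

out = df.copy()  # keep the numeric original; 'out' holds strings for display
for col in dec_cols:
    out[col] = out[col].map("{:.2f}".format)
out["var3"] = out["var3"].map("{:.2%}".format)
print(out)
```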

Steinberg answered 14/2, 2018 at 17:14 Comment(0)

As a similar approach to the accepted answer that might be considered a bit more readable, elegant, and general (YMMV), you can leverage the map method:

# OP example
df['var3'].map(lambda n: '{:,.2%}'.format(n))

# also works on a series
series_example.map(lambda n: '{:,.2%}'.format(n))

Performance-wise, this is pretty close to (marginally slower than) the OP solution.

As an aside, if you do choose to go the pd.options.display.float_format route, consider using a context manager to handle state per this parallel numpy example.
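A minimal sketch of that idea using pandas' own context manager, pd.option_context, which restores the previous setting when the with block exits. The proportions below are made-up sample values.

```python
import pandas as pd

s = pd.Series([0.881634, 0.032856, 0.023589])

with pd.option_context("display.float_format", "{:,.2%}".format):
    inside = s.to_string()  # percentages only inside the block

outside = s.to_string()     # default float display again
print(inside)
print(outside)
```

This keeps the global option untouched for every other DataFrame, which addresses the concern raised about setting pd.options.display.float_format directly.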

Laskowski answered 19/1, 2018 at 17:22 Comment(0)

style.format also accepts a single format string that is applied to every cell, so we can simply apply it to the entire df (or just its numerical columns):

df[num_cols].style.format('{:,.3f}%')


Note that if df contains only 1 column and is in fact a Series, it will first require conversion to a pandas DataFrame, e.g. with pd.DataFrame(df[num_col]).style.format or, as pointed out below, df[num_col].to_frame().style.format.
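If plain text output is enough for a single column, a Series can also be formatted directly, without converting it to a DataFrame, through the float_format argument of Series.to_string. A minimal sketch (this produces text, not a styled HTML table):

```python
import pandas as pd

s = pd.Series([-0.005709, -0.005122, 0.005732], name="var3")

# float_format is applied to each float value; the Series itself is unchanged
text = s.to_string(float_format="{:,.2%}".format)
print(text)
```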

Svoboda answered 8/12, 2020 at 18:49 Comment(1)
The Series should be converted to a DataFrame first: df[num_cols].to_frame().style.format('{:,.3f}%') - Rental

A list comprehension gives a reliable result, and I'm using it successfully. You can use Python list comprehensions as follows:

df['var1'] = ["{:.2f}".format(i) for i in df['var1'] ]
df['var2'] = ["{:.2f}".format(i) for i in df['var2'] ]
df['var3'] = ["{:.2%}".format(i) for i in df['var3'] ]

Thanks

Bequeath answered 21/6, 2022 at 18:17 Comment(0)

Following from this answer, I used the apply function on the given series. In my case, I was interested in showing value_counts for my Series with percentage formatting.

I did:

df['my_col'].value_counts(normalize=True).apply(lambda x: "{0:.2f}%".format(x*100))
# Incident             88.16%
# StreetWorks          3.29% 
# Accident             2.36%
# ... 

Instead of just

df['my_col'].value_counts(normalize=True)
# Incident             0.881634
# StreetWorks          0.032856
# Accident             0.023589
# ...
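The same result can be had with the % presentation type, which does the multiplication by 100 itself. A minimal sketch on a small made-up Series (the category labels are illustrative only):

```python
import pandas as pd

s = pd.Series(["Incident", "Incident", "Incident", "Accident"])

# normalize=True gives proportions (0.75, 0.25); '{:.2%}' turns them into '75.00%', '25.00%'
pct = s.value_counts(normalize=True).map("{:.2%}".format)
print(pct)
```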
Fatal answered 9/2, 2023 at 8:6 Comment(0)

If all the columns of type float should be shown as percentages, you can use a with statement:

with pd.option_context("display.float_format", "{:.2%}".format):
    display(df)

If only specific columns should be formatted as percentages:

df.style.format(formatter={c: "{:.2%}" for c in ["column_1", "column_2"]})
Kleeman answered 8/12, 2023 at 11:6 Comment(0)

I've had this problem with a correlation matrix:

# this matrix has 2 decimals
correlation_matrix = round(numeric_columns.corr(),2)

# but when I apply a gradient color scheme to the correlation matrix, it prints 6 decimals.
styled_matrix = correlation_matrix.style.background_gradient(cmap='RdYlGn', vmin=-1, vmax=1)

# so instead, I add format(precision=2) to the chain to format all of the columns with that precision
styled_matrix = correlation_matrix.style.format(precision=2).background_gradient(cmap='RdYlGn', vmin=-1, vmax=1)
Parsec answered 21/1 at 3:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.