Descriptive statistics in Python with Pandas: std in parentheses

import numpy as np
import pandas as pd

# Generate a DataFrame to have an example
df = pd.DataFrame(
    {"Age": np.random.normal(20, 15, 5),
     "Income": np.random.pareto(1, 5) * 20_000}
)

# The describe method to get means and stds
df.describe().loc[["mean", "std"]].T

>>>
                mean            std
Age        15.322797      13.449727
Income  97755.733510  143683.686484
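As a side note, DataFrame.agg can compute just the two statistics directly, instead of computing every describe() statistic and then selecting rows. A minimal sketch on a small made-up frame (the values are illustrative only):

```python
import numpy as np
import pandas as pd

# Small made-up frame with the same columns as the example above
df = pd.DataFrame({"Age": [10.0, 20.0, 30.0, 40.0],
                   "Income": [1_000.0, 2_000.0, 3_000.0, 10_000.0]})

# describe() computes count, mean, std, min, quartiles and max; keep mean and std
via_describe = df.describe().loc[["mean", "std"]].T

# agg() computes only the requested statistics
via_agg = df.agg(["mean", "std"]).T

print(via_agg)
```

Both use the sample standard deviation (ddof=1), so the two results agree.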

# df_c: the example data, with an additional outer row-index level holding the groups A and B
idx = pd.IndexSlice
df_desc = df_c.groupby(level=0).describe()
df_desc = df_desc.loc[idx[:], idx[:, ["mean", "std"]]].T
df_desc.loc[idx[:, ["std"]], idx[:]] = df_desc.loc[idx[:, ["std"]], idx[:]].applymap(
    lambda x: "(" + "{:.2f}".format(x) + ")")
print(df_desc)

>>>
                      A           B
Age    mean     23.1565     21.3359
       std      (11.62)      (9.34)
Income mean     68415.5     46619.5
       std   (95612.40)  (64596.10)

df_desc.to_latex()

>>>
\begin{tabular}{llll}
\toprule
 & & A & B \\
\midrule
Age & mean & 5.5905 & 29.5894 \\
 & std & (16.41) & (13.03) \\
Income & mean & 531970 & 72653.7 \\
 & std & (875272.44) & (79690.18) \\
\bottomrule
\end{tabular}
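df_c is not defined in the snippets above. The runnable sketch below assumes it is data like df with an extra outer row-index level holding the groups A and B, built here with pd.concat from a hypothetical make_sample helper. It also deviates slightly from the code above: it casts to object dtype before writing the strings (recent pandas warns when strings are set into a float column) and formats via Series.map, since DataFrame.applymap is deprecated as of pandas 2.1:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_c: two groups, A and B, stacked under an
# outer row-index level, with the same columns as df above
rng = np.random.default_rng(0)

def make_sample(n):
    return pd.DataFrame({"Age": rng.normal(20, 15, n),
                         "Income": rng.pareto(1, n) * 20_000})

df_c = pd.concat({"A": make_sample(5), "B": make_sample(5)})

idx = pd.IndexSlice
# One row per group, one (variable, statistic) column per describe() output
df_desc = df_c.groupby(level=0).describe()
# Keep mean and std, transpose, and switch to object dtype so strings can be stored
df_desc = df_desc.loc[:, idx[:, ["mean", "std"]]].T.astype(object)

# Format the std rows as "(x.xx)", column by column
std_rows = idx[:, ["std"]]
df_desc.loc[std_rows, :] = df_desc.loc[std_rows, :].apply(
    lambda col: col.map(lambda x: f"({x:.2f})"))
print(df_desc)
```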

I just ran into a similar problem and found your post, so here's how I dealt with the issues you mentioned.

Problem 1: Hide second index column

I prefer solution b), but I'll leave a) here for illustrative purposes.

a) droplevel & set_index

df_desc.index.droplevel(level=1)

>>>
Index(['Age', 'Age', 'Income', 'Income'], dtype='object')
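Note that droplevel does not change df_desc: it returns a new Index, so the result still has to be assigned back. A quick check on a hypothetical frame with the same two-level index; it also shows DataFrame.droplevel, which returns a new frame with the level removed in one step:

```python
import pandas as pd

# Hypothetical frame with the same (variable, statistic) row index as df_desc
df_desc = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0]},
    index=pd.MultiIndex.from_product([["Age", "Income"], ["mean", "std"]]),
)

new_index = df_desc.index.droplevel(level=1)  # a new Index; df_desc is unchanged
print(df_desc.index.nlevels)  # still 2

# DataFrame.droplevel returns a new frame with the second level removed
flat = df_desc.droplevel(1)
print(list(flat.index))
```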

Use this piece of code along with a set_index expression:

df_desc.set_index(df_desc.index.droplevel(level=1), inplace=True)

This results in:

print(df_desc)

>>>
                  A           B
Age         17.1527     23.9678
Age         (19.73)     (12.01)
Income       293271     12178.8
Income  (400059.27)  (14483.35)

This doesn't look satisfying yet, since the index values Age and Income each appear twice.

That's why I came up with the following.

b) Create a new DataFrame using DataFrame.values and assign the index manually

First, reset the index:

df_desc = df_desc.reset_index(drop=True)

print(df_desc)

>>>
            A           B
0      17.306      11.425
1     (14.40)     (16.67)
2     88016.7     67280.4
3  (73054.44)  (54953.69)

Second, create a new DataFrame, specifying the index and column names manually. Note that I passed df_desc.values as the data argument (the first positional argument).

df_new = pd.DataFrame(df_desc.values, index=["Age", "", "Income", ""], columns=["A", "B"])

print(df_new)

>>>
                 A           B
Age        27.7039     20.8031
           (13.99)     (13.92)
Income     20690.7     7370.44
        (29470.03)  (13279.10)
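Hard-coding ["Age", "", "Income", ""] works for two variables. As a sketch, the same labels can instead be derived from the two-level index (that is, before reset_index) by keeping each outer label only where it first appears. The frame below is a hypothetical stand-in built from the values printed above:

```python
import pandas as pd

# Hypothetical df_desc with the (variable, statistic) index it has before reset_index
df_desc = pd.DataFrame(
    {"A": [27.7039, "(13.99)", 20690.7, "(29470.03)"],
     "B": [20.8031, "(13.92)", 7370.44, "(13279.10)"]},
    index=pd.MultiIndex.from_product([["Age", "Income"], ["mean", "std"]]),
)

# Outer labels: Age, Age, Income, Income
labels = df_desc.index.get_level_values(0)

# Keep a label where it first appears and replace the repeats with ""
df_new = df_desc.copy()
df_new.index = labels.where(~labels.duplicated(), "")
print(df_new)
```

This also avoids rebuilding the frame from .values, since only the index is replaced.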

Problem 2: Align LaTeX table

Note that running

df_new.to_latex()

indeed produces a somewhat messy str output:

>>> 
'\\begin{tabular}{lll}\n\\toprule\n{} &           A &           B \\\\\n\\midrule\nAge    &     27.7039 &     20.8031 \\\\\n       &     (13.99) &     (13.92) \\\\\nIncome &     20690.7 &     7370.44 \\\\\n       &  (29470.03) &  (13279.10) \\\\\n\\bottomrule\n\\end{tabular}\n'

However, wrapping it in a print() call produces the desired output:

print(df_new.to_latex())

>>>
\begin{tabular}{lll}
\toprule
{} &           A &           B \\
\midrule
Age    &     27.7039 &     20.8031 \\
       &     (13.99) &     (13.92) \\
Income &     20690.7 &     7370.44 \\
       &  (29470.03) &  (13279.10) \\
\bottomrule
\end{tabular}
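The difference is only in how the same string is displayed: the interactive prompt shows its repr, with newlines escaped as \n and every backslash doubled, while print writes the characters themselves. A small illustration with a hand-written, shortened stand-in for the to_latex() output:

```python
# Hand-written stand-in for a (shortened) to_latex() result
latex_str = (
    "\\begin{tabular}{ll}\n"
    "\\toprule\n"
    "Age & 27.7039 \\\\\n"
    "\\bottomrule\n"
    "\\end{tabular}\n"
)

print(repr(latex_str))  # one line: escaped \n and doubled backslashes
print(latex_str)        # real line breaks and single backslashes
```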

Moreover, exporting the table to a LaTeX document is fairly simple.

As you noted yourself, to_latex() already creates a tabular environment, so you just need to write it to a file and use \input in your LaTeX document. Following the example here, do the following:

i) Save the table as a text file

with open('mytable.tex','w') as tf:
    tf.write(df_new.to_latex())

ii) Use the exported table in a LaTeX document

\documentclass{article}
\usepackage{booktabs}
\begin{document}
\input{mytable}
\end{document}

This example assumes that mytable.tex and the LaTeX document are in the same folder. The booktabs package is required because to_latex() uses the booktabs commands \toprule, \midrule and \bottomrule for the table rules.

[Figure: the final PDF output of the table]

