import numpy as np
import pandas as pd
np.random.seed(2016)
N = 4393476
df = pd.DataFrame(np.random.uniform(1e-4, 0.1, size=(N,3)), columns=list('ABC'))
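# Cast to object first so string values can be stored in every cell
# (recent pandas versions deprecate writing strings into float64 columns)
desc = df.describe().astype(object)
# Show the count row as integers, e.g. 4393476 instead of 4393476.000000
desc.loc['count'] = desc.loc['count'].astype(int).astype(str)
# Format the remaining rows with six digits after the decimal point
# (in pandas 2.1+ DataFrame.applymap is deprecated in favor of DataFrame.map)
desc.iloc[1:] = desc.iloc[1:].applymap('{:.6f}'.format)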
print(desc)
yields
              A         B         C
count   4393476   4393476   4393476
mean   0.050039  0.050056  0.050057
std    0.028834  0.028836  0.028849
min    0.000100  0.000100  0.000100
25%    0.025076  0.025081  0.025065
50%    0.050047  0.050050  0.050037
75%    0.074987  0.075027  0.075055
max    0.100000  0.100000  0.100000
Under the hood, DataFrames are organized in columns. The values in a column can only have one data type (the column's dtype).
The DataFrame returned by df.describe() has columns of floating-point dtype:
In [116]: df.describe().info()
<class 'pandas.core.frame.DataFrame'>
Index: 8 entries, count to max
Data columns (total 3 columns):
A 8 non-null float64
B 8 non-null float64
C 8 non-null float64
dtypes: float64(3)
memory usage: 256.0+ bytes
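A small sketch of that point, assuming a much smaller N (1000 instead of 4393476) purely so it runs quickly: describe() returns float64 columns, and writing integers back into the count row does not change that, because each column can hold only one dtype.

```python
import numpy as np
import pandas as pd

np.random.seed(2016)
# Smaller N than in the answer, only to keep the sketch fast
df = pd.DataFrame(np.random.uniform(1e-4, 0.1, size=(1000, 3)), columns=list('ABC'))

desc = df.describe()
print(desc.dtypes)  # every column is float64

# Try to store the count row as integers
desc.loc['count'] = desc.loc['count'].astype(int)
print(desc.dtypes)  # still float64: the integers were stored back as floats
print(desc.loc['count'])
```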
DataFrames do not allow you to treat one row as integers and the other rows as floats.
However, if you change the contents of the DataFrame to strings, you have full control over how the values are displayed, since every value is then just a string.
Thus, to create a DataFrame in the desired format, you could use
desc.loc['count'] = desc.loc['count'].astype(int).astype(str)
to convert the count row to integers (by calling astype(int)), and then convert the integers to strings (by calling astype(str)). Then
desc.iloc[1:] = desc.iloc[1:].applymap('{:.6f}'.format)
converts the rest of the floats to strings, using the str.format method to format each float with six digits after the decimal point.
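To make the two conversions concrete, here is what they do to single values (taken from the output above), followed by a check that the formatted table holds only strings. N = 1000 is again an assumption for speed, and the sketch maps each column with Series.map rather than calling applymap, since applymap is deprecated in newer pandas; the effect is the same elementwise formatting.

```python
import numpy as np
import pandas as pd

# The count row: float -> int -> str drops the fractional part
count = 4393476.0
print(str(count))            # prints 4393476.0
print(str(int(count)))       # prints 4393476

# The other rows: str.format with six digits after the decimal point
print('{:.6f}'.format(0.0001))  # prints 0.000100

# Applied to a whole (smaller) describe() table, every cell ends up a string
np.random.seed(2016)
df = pd.DataFrame(np.random.uniform(1e-4, 0.1, size=(1000, 3)), columns=list('ABC'))
desc = df.describe().astype(object)  # object dtype so cells may hold str
desc.loc['count'] = desc.loc['count'].astype(int).astype(str)
desc.iloc[1:] = desc.iloc[1:].apply(lambda col: col.map('{:.6f}'.format))
print(desc)
```

The trade-off is that the result is for display only: arithmetic on these cells would operate on strings, not numbers.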
Alternatively, you could use
import numpy as np
import pandas as pd
np.random.seed(2016)
N = 4393476
df = pd.DataFrame(np.random.uniform(1e-4, 0.1, size=(N,3)), columns=list('ABC'))
desc = df.describe().T  # transpose so that each statistic becomes a column
desc['count'] = desc['count'].astype(int)  # only the count column becomes integer
print(desc)
which yields
     count      mean       std     min       25%       50%       75%  max
A  4393476  0.050039  0.028834  0.0001  0.025076  0.050047  0.074987  0.1
B  4393476  0.050056  0.028836  0.0001  0.025081  0.050050  0.075027  0.1
C  4393476  0.050057  0.028849  0.0001  0.025065  0.050037  0.075055  0.1
By transposing the desc DataFrame, the counts are now in their own column, so the problem can be solved by converting that column's dtype to int.
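A quick way to confirm the effect of the transpose, again with N = 1000 only for speed: after describe().T, count is an ordinary column, so astype(int) changes that column's dtype alone while the other statistics stay float64.

```python
import numpy as np
import pandas as pd

np.random.seed(2016)
df = pd.DataFrame(np.random.uniform(1e-4, 0.1, size=(1000, 3)), columns=list('ABC'))

desc = df.describe().T                     # rows A, B, C; one column per statistic
desc['count'] = desc['count'].astype(int)  # integer dtype for this column only
print(desc.dtypes)                         # count is integer, the rest are float64
```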
One advantage of doing it this way is that the values in desc remain numeric, so further calculations based on them can still be done.
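For example, because the transposed desc is still numeric, derived statistics can be computed straight from its columns. The two shown here, the interquartile range and the ratio of std to mean, are illustrative choices and not something the answer itself computes; N = 1000 is assumed for speed.

```python
import numpy as np
import pandas as pd

np.random.seed(2016)
df = pd.DataFrame(np.random.uniform(1e-4, 0.1, size=(1000, 3)), columns=list('ABC'))

desc = df.describe().T
desc['count'] = desc['count'].astype(int)

# Column arithmetic works because the statistics are still numbers, not strings
desc['iqr'] = desc['75%'] - desc['25%']
desc['std/mean'] = desc['std'] / desc['mean']
print(desc[['count', 'iqr', 'std/mean']])
```

With the string-formatted version from the first approach, the same subtraction would fail, since the cells are str.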
I think this solution is preferable, provided that the transposed format is acceptable.
print(data.describe().astype(int))
– Hautevienne