Pandas df.describe() - how do I extract values into Dataframe?

Asked 27/1, 2019 at 22:45 Answered 25/10, 2022 at 15:36

Solved python pandas dataframe data-science

I am trying to build a naive Bayes classifier, and after loading some data into a dataframe in Pandas, the describe function captures the data I want. I'd like to capture the mean and std from each column of the table but am unsure how to do that. I've tried things like:

df.describe([mean])
df.describe(['mean'])
df.describe().mean

None of these work. I was able to do something similar in R with summary, but I don't know how to do it in Python. Can someone offer some advice?

Permenter answered 27/1, 2019 at 22:45 Comment(0)

Please try something like this:

df.describe(include='all').loc['mean']
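
As a minimal sketch of why this works (the column names and values below are made up for illustration): describe() on a DataFrame returns another DataFrame whose row labels are the statistic names, so .loc selects by statistic and gives one value per column.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [10, 20, 30, 40, 50]})

summary = df.describe()               # rows: count, mean, std, min, 25%, 50%, 75%, max
means = summary.loc['mean']           # Series indexed by column name
stats = summary.loc[['mean', 'std']]  # DataFrame: two rows, one column per feature

print(means)
print(stats)
```

Both results can be assigned to variables and indexed further, for example means['A'].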

Larrabee answered 27/1, 2019 at 22:49 Comment(2)

Works like a charm. Looks like I can capture it as a variable as well. What if you want two items like mean and std? – Permenter 27/1, 2019 at 22:51

df.describe(include='all').loc[['mean','std']] – Larrabee 27/1, 2019 at 23:9

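You were close. You don't need the include argument. Index the result of describe() by the statistic's label. For a Series that is s.describe()['mean']; for a DataFrame the statistics are row labels, so use df.describe().loc['mean'] (df.describe()['mean'] would look for a column named mean).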

For example:

import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
s.describe()['mean']
# 3.0

If you want both mean and std, pass a list of labels: s.describe()[['mean', 'std']] for a Series, or df.describe().loc[['mean', 'std']] for a DataFrame. For example,

s.describe()[['mean', 'std']]
# mean    3.000000
# std     1.581139
# dtype: float64
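
The label lookup in this example works because s is a Series, so describe() returns a Series indexed by statistic name. On a DataFrame, square brackets select columns instead, which is a likely cause of a KeyError like "['mean' 'std'] not in index". A small sketch with made-up data (the frame and variable names are assumptions, not the asker's data):

```python
import pandas as pd

# Hypothetical two-column frame, for illustration only
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [2, 4, 6, 8, 10]})

# Series: describe() is indexed by statistic name, so [] works
series_stats = df['A'].describe()[['mean', 'std']]

# DataFrame: [] looks for columns named 'mean' and 'std' and fails
try:
    df.describe()[['mean', 'std']]
    raised = False
except KeyError:
    raised = True

# DataFrame: select the statistic rows with .loc instead
frame_stats = df.describe().loc[['mean', 'std']]

print(raised)
print(frame_stats)
```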

Projection answered 27/1, 2019 at 22:51 Comment(11)

I'm getting an error that says: KeyError: "['mean' 'std'] not in index". Any idea why that would occur? – Permenter 27/1, 2019 at 22:59

@Vaslo: You missed a comma between 'mean' and 'std'. Try again with a comma – Projection 27/1, 2019 at 23:1

If still the problem persists, please include some dataframe in your question – Projection 27/1, 2019 at 23:2

I think the issue is that I am trying to use it on a 2D frame. When I use your example it works fine, but when I cut and paste it exactly as you explain, it gives me an error. – Permenter 27/1, 2019 at 23:6

@Vaslo: Can you try df_1 = pd.Series(df.values.ravel()) and then try df_1.describe()[['mean', 'std']]? – Projection 27/1, 2019 at 23:8

When I use the example you posted it works and gives me a single mean and a single std. Mine should have a column of means and stds (one for each column). – Permenter 27/1, 2019 at 23:9

Try what I wrote in my comment before uploading the dataframe and see if it works – Projection 27/1, 2019 at 23:11

I tried the df_1 = pd.Series(df.values.ravel()) and it works, but returns just a single std and mean, so maybe that is my issue? I have a 767x9 dataframe that I am trying to extract 9 means from. – Permenter 27/1, 2019 at 23:13

Many thanks - the solution below by @Larrabee gives me a subset of the describe dataframe. For future reference, how do I upload a frame, or do I just type it into the space? – Permenter 27/1, 2019 at 23:16

You can copy and paste the frame after printing it, using df.head() for example. People can then copy your dataframe and use pd.read_clipboard() to create a dataframe from it. For more ideas, click on the pandas or dataframe tag below your question and see how other questions have done it. – Projection 27/1, 2019 at 23:18

I had to use python 2.7 to use some other libraries at work and I had to use include='all' to get it working. – Verge 22/10, 2019 at 14:5

If you further want to extract specific column data then try:

df.describe()['FeatureName']['mean']

Replace mean with any other statistic you want to extract.
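
A short sketch of this chained lookup, assuming a placeholder column called FeatureName with made-up values. Selecting the cell with .loc and a (row, column) pair is an equivalent route to the same number.

```python
import pandas as pd

# 'FeatureName' is a placeholder column; the values are illustrative
df = pd.DataFrame({'FeatureName': [2.0, 4.0, 6.0],
                   'Other': [1.0, 1.0, 4.0]})

summary = df.describe()

chained = summary['FeatureName']['mean']     # column first, then statistic
direct = summary.loc['mean', 'FeatureName']  # statistic row, then column

print(chained, direct)
```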

Clintonclintonia answered 28/1, 2021 at 14:53 Comment(0)

You can try:

import pandas as pd

# Load the data, then select the 'mean' row of the summary table
data = pd.read_csv('./FileName.csv')
data.describe().loc['mean']

Grani answered 11/5, 2022 at 14:54 Comment(0)

If you want the mean or the std of a column of your dataframe, you don't need to go through describe(). Instead, the proper way would be to just call the respective statistical function on the column (which really is a pandas.core.series.Series). Here is an example:

import pandas as pd

# create dataframe with some numerical data
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'B': [8, 7, 6, 5, 4, 3, 2, 1, 0, 0]})

print(df['A'].mean()) # 5.5
print(df['B'].std())  # 2.8751811537130436

See the pandas documentation for the descriptive statistics that are built into the pandas Series.

(Let me know if I am misunderstanding what you are trying to do here.)
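
If the goal is one mean and one std per column, as in the question, the same idea extends to the whole frame. A sketch reusing the example frame from this answer: DataFrame.mean(), DataFrame.std(), and DataFrame.agg() are standard pandas methods, while the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'B': [8, 7, 6, 5, 4, 3, 2, 1, 0, 0]})

col_means = df.mean()           # one mean per column
col_stds = df.std()             # sample std (ddof=1) per column
both = df.agg(['mean', 'std'])  # rows 'mean' and 'std', one column per feature

print(both)
```

The rows of both should match the mean and std rows of df.describe(), without computing the other statistics.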

Urbas answered 25/10, 2022 at 15:36 Comment(0)


I faced the same problem, and after trying these solutions one of them worked for me. Here I extract the 75% value from the describe function. This is my code:

d = bank.groupby(by=['region', 'Gender']).get_group(('south Moravia', 'Female'))
d.cashwdn.describe()['75%']
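
A runnable sketch of that approach with made-up data. The bank frame, its columns, and its values are hypothetical stand-ins for the data mentioned in this answer. Series.quantile(0.75) is shown as a direct way to get the same figure.

```python
import pandas as pd

# Hypothetical stand-in for the 'bank' data
bank = pd.DataFrame({
    'region': ['south Moravia', 'south Moravia', 'south Moravia', 'Prague'],
    'Gender': ['Female', 'Female', 'Female', 'Male'],
    'cashwdn': [100.0, 200.0, 300.0, 50.0],
})

# Pick one (region, Gender) group, then read the 75% row of describe()
d = bank.groupby(by=['region', 'Gender']).get_group(('south Moravia', 'Female'))
q75_from_describe = d['cashwdn'].describe()['75%']

# The same quantile computed directly
q75_direct = d['cashwdn'].quantile(0.75)

print(q75_from_describe, q75_direct)
```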

Calvinna answered 23/10, 2022 at 7:8 Comment(1)

I think your answer both lacks clarity and falls short in language and tone. – Urbas 25/10, 2022 at 15:40
