Pandas groupby to to_csv
Asked Answered
Y

7

34

Want to output a Pandas groupby dataframe to CSV. Tried various StackOverflow solutions but they have not worked.

Python 3.6.1, Pandas 0.20.1

groupby result looks like:

id  month   year    count
week                
0   9066    82  32142   895
1   7679    84  30112   749
2   8368    126 42187   872
3   11038   102 34165   976
4   8815    117 34122   767
5   10979   163 50225   1252
6   8726    142 38159   996
7   5568    63  26143   582

Want a csv that looks like

week  count
0   895
1   749
2   872
3   976
4   767
5   1252
6   996
7   582

Current code:

week_grouped = df.groupby('week')
week_grouped.sum() #At this point you have the groupby result
week_grouped.to_csv('week_grouped.csv') #Can't do this - .to_csv is not a df function. 

Read SO solutions:

output groupby to csv file pandas

week_grouped.drop_duplicates().to_csv('week_grouped.csv')

Result: AttributeError: Cannot access callable attribute 'drop_duplicates' of 'DataFrameGroupBy' objects, try using the 'apply' method

Python pandas - writing groupby output to file

week_grouped.reset_index().to_csv('week_grouped.csv')

Result: AttributeError: "Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method"

Yahweh answered 1/12, 2017 at 22:2 Comment(1)
If you landed up here wanting to know how to save each individual groupby to its own CSV file, see this answer.Torpid
J
30

Try doing this:

week_grouped = df.groupby('week')
week_grouped.sum().reset_index().to_csv('week_grouped.csv')

That'll write the entire dataframe to the file. If you only want those two columns then,

week_grouped = df.groupby('week')
week_grouped.sum().reset_index()[['week', 'count']].to_csv('week_grouped.csv')

Here's a line by line explanation of the original code:

# This creates a "groupby" object (not a dataframe object) 
# and you store it in the week_grouped variable.
week_grouped = df.groupby('week')

# This instructs pandas to sum up all the numeric type columns in each 
# group. This returns a dataframe where each row is the sum of the 
# group's numeric columns. You're not storing this dataframe in your 
# example.
week_grouped.sum() 

# Here you're calling the to_csv method on a groupby object... but
# that object type doesn't have that method. Dataframes have that method. 
# So we should store the previous line's result (a dataframe) into a variable 
# and then call its to_csv method.
week_grouped.to_csv('week_grouped.csv')

# Like this:
summed_weeks = week_grouped.sum()
summed_weeks.to_csv('...')

# Or with less typing simply
week_grouped.sum().to_csv('...')
Jayejaylene answered 1/12, 2017 at 22:15 Comment(3)
Thanks! - Why does it work when sum() is part of the the to_csv statement but not when sum() is done on its own line?Yahweh
@kalmdown, if this answered your question, can you please mark it as so? Click the check mark to make it green.Jayejaylene
@kalmdown, did my reply answer your question? My answer still hasn't been marked as accepted.Jayejaylene
T
9

Group By returns key, value pairs where key is the identifier of the group and the value is the group itself, i.e. a subset of an original df that matched the key.

In your example week_grouped = df.groupby('week') is set of groups (pandas.core.groupby.DataFrameGroupBy object) which you can explore in detail as follows:

for k, gr in week_grouped:
    # do your stuff instead of print
    print(k)
    print(type(gr)) # This will output <class 'pandas.core.frame.DataFrame'>
    print(gr)
    # You can save each 'gr' in a csv as follows
    gr.to_csv('{}.csv'.format(k))

Or alternatively you can compute aggregation function on your grouped object

result = week_grouped.sum()
# This will be already one row per key and its aggregation result
result.to_csv('result.csv') 

In your example you need to assign the function result to some variable as by default pandas objects are immutable.

some_variable = week_grouped.sum() 
some_variable.to_csv('week_grouped.csv') # This will work

basically result.csv and week_grouped.csv are meant to be same

Trifurcate answered 1/12, 2017 at 22:55 Comment(1)
Thank you for the indepth explanation. Helps to understand the system instead of just the problem.Yahweh
S
4

Try changing your second line to week_grouped = week_grouped.sum() and re-running all three lines.

If you run week_grouped.sum() in its own Jupyter notebook cell, you'll see how the statement returns the output to the cell's output, instead of assigning the result back to week_grouped. Some pandas methods have an inplace=True argument (e.g., df.sort_values(by=col_name, inplace=True)), but sum does not.

EDIT: does each week number only appear once in your CSV? If so, here's a simpler solution that doesn't use groupby:

df = pd.read_csv('input.csv')
df[['id', 'count']].to_csv('output.csv')
Stair answered 1/12, 2017 at 22:9 Comment(2)
In the original data the week appears on multiple rows. In this case the groupby is being used to gather the weeks together so a count can be done per week.Yahweh
BTW - Thanks so much for the explanation of why sum is an issue.Yahweh
T
3

To complete the nice @AlexLuisArias answer :
We can now include a as_index parameter directly in the groupby to avoid the reset_index before the to_csv like so :

week_grouped = df.groupby('week', as_index=False)
week_grouped.sum().to_csv('week_grouped.csv')

It feels even more elegant.

Talton answered 9/11, 2022 at 13:28 Comment(0)
B
1

I feel that there is no need to use a groupby, you can just drop the columns you do not want too.

df = df.drop(['month','year'], axis=1)
df.reset_index()
df.to_csv('Your path')
Brack answered 1/12, 2017 at 22:15 Comment(1)
Should be "axis=1"...But yes that will output the rows but not grouped by week or state.Yahweh
W
1

Pandas groupby generates a lot of information (count, mean, std, ...). If you want to save all of them in a csv file, first you need to convert it to a regular Dataframe:

import pandas as pd
...
...
MyGroupDataFrame = MyDataFrame.groupby('id')
pd.DataFrame(MyGroupDataFrame.describe()).to_csv("myTSVFile.tsv", sep='\t', encoding='utf-8')
Win answered 17/3, 2020 at 13:3 Comment(0)
M
1

##Hey, I just discovered this!! We can also try slicing the groupby result and read it in a csv. try this:##

week_grouped = df.groupby('week')
length=len(week_grouped)
week_grouped[0:length].to_csv("results.csv")
Mortie answered 24/9, 2021 at 12:51 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.