Pandas: Get top 10 values AFTER grouping

I have a pandas DataFrame with a column 'id' and a column 'value'. It is already sorted first by id (ascending) and then by value (descending). What I need is the top 10 values per id.

I assumed that something like the following would work, but it doesn't:

df.groupby("id", as_index=False).aggregate(lambda (index,rows) : rows.iloc[:10])

What I get is just a list of ids; the value column (and the other columns that I omitted from the question) is no longer there.

Any ideas how it might be done, without iterating through each of the single rows and appending the first ten to another data structure?

Hague answered 16/10, 2015 at 10:36 Comment(1)
You should give an example dataframe in the question. (Santa)

Is this what you're looking for?

df.groupby('id').head(10)
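
As a minimal sketch of why this works, assuming the frame is already sorted by id (ascending) and then value (descending) as the question describes: `head(n)` on a groupby keeps the first n rows of each group and returns them with all of the original columns. The sample data and the small `n` below are made up for illustration; use `n = 10` for the question's case.

```python
import pandas as pd

# Hypothetical data, already sorted by id (ascending) then value (descending).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "value": [9, 7, 4, 8, 2, 5],
})

n = 2  # illustrative; the question asks for 10

# Keeps the first n rows of every id group; all columns are retained.
top = df.groupby("id").head(n)
print(top)
```

Groups with fewer than n rows are returned whole, so no padding or special-casing should be needed.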
Repairer answered 16/10, 2015 at 10:41 Comment(0)

I would like to answer this by giving an example dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([['a','a','b','c','a','c','b'],[4,6,1,8,9,4,1],[12,11,7,1,5,5,7],[123,54,146,96,10,114,200]]).T,columns=['item','date','hour','value'])
df['value'] = pd.to_numeric(df['value'])

This gives you the following dataframe:

item  date  hour  value
a     4     12    123
a     6     11    54
b     1     7     146
c     8     1     96
a     9     5     10
c     4     5     114
b     1     7     200

Grouping by item and calling head(2) returns the first two values of each group, in their original row order:

df.groupby(['item'])['value'].head(2)
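
Note that `head(2)` picks rows by position, not by size, so on an unsorted frame like this one it returns the first two values per item rather than the two largest. A hedged sketch of one way to get the largest values is to sort before grouping, or to use `nlargest` on the grouped column. The frame below reuses only the item and value columns from the example above.

```python
import pandas as pd

# Same item/value data as the example above (unsorted).
df = pd.DataFrame({
    "item": ["a", "a", "b", "c", "a", "c", "b"],
    "value": [123, 54, 146, 96, 10, 114, 200],
})

# First two values per item in original row order (not necessarily the largest).
first_two = df.groupby("item")["value"].head(2)

# Sort by item, then by value descending, so head(2) yields the two largest per item.
top_two = (
    df.sort_values(["item", "value"], ascending=[True, False])
      .groupby("item")
      .head(2)
)

# Alternative: nlargest on the grouped column; the result is indexed by (item, original index).
top_two_alt = df.groupby("item")["value"].nlargest(2)

print(first_two)
print(top_two)
print(top_two_alt)
```

The sort-then-head variant keeps every column of the frame, while the `nlargest` variant returns only the value column.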
Fidler answered 16/3, 2022 at 21:2 Comment(0)
