Pandas: normalize values by group
Asked Answered
S

2

5

I find it hard to explain with words what I want to achieve, so please don't judge me for showing a simple example instead. I have a table that looks like this:

main_col some_metadata value
this True 10
this False 3
that True 50
that False 10
other True 20
other False 5

I want to normalize this data separately for each case of main_col. For example, if we're to choose min-max normalization and scale it to range [0; 100], I want the output to look like this:

main_col some_metadata value (normalized)
this True 100
this False 30
that True 100
that False 20
other True 100
other False 25

Where for each case of main_col, the highest value is scaled to 100 and another value is scaled in respective proportion.

Sphenoid answered 27/9, 2022 at 13:51 Comment(0)
G
5

You can use groupby.transform('max') to get the max per group, then normalize in place:

df['value'] /= df.groupby('main_col')['value'].transform('max').div(100)

or:

df['value'] *= df.groupby('main_col')['value'].transform('max').rdiv(100)

output:

  main_col  some_metadata  value
0     this           True  100.0
1     this          False   30.0
2     that           True  100.0
3     that          False   20.0
4    other           True  100.0
5    other          False   25.0
Grube answered 27/9, 2022 at 13:59 Comment(3)
I've just realized both of our solutions assume the numbers are positive. In case of any negative numbers, the result can be highly skewed. For instance if this False is -10 instead of 3 the result will be -100 instead of 0.Selda
@NuriTaş I don't think this should be the case as it's not a min/max scaling but a max scaling, 0 being the implicit min.Grube
@nuri-taş that's true. But if there were negative values it's not the implementation that would be wrong, this whole normalization just wouldn't make any sense.Sphenoid
H
1

The normalization formula you are looking for is 100 * (x / x.max()):

df.groupby(['main_col'])['value'].transform(lambda x: 100 * (x / x.max()))

Result:

0    100.0
1     30.0
2    100.0
3     20.0
4    100.0
5     25.0
Name: value, dtype: float64
Hunks answered 27/9, 2022 at 14:6 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.