A new column in pandas whose value depends on other columns

I have example data like this:

datetime             col1  col2  col3
2021-04-10 01:00:00    25    50    50
2021-04-10 02:00:00    25    50    50
2021-04-10 03:00:00    25   100    50
2021-04-10 04:00:00    50    50   100
2021-04-10 05:00:00   100   100   100

I want to create a new column called state that returns the col1 value if both col2 and col3 are less than or equal to 50, and otherwise returns the maximum of col1, col2, and col3.

The expected output is as shown below:

datetime             col1  col2  col3  state
2021-04-10 01:00:00    25    50    50     25
2021-04-10 02:00:00    25    50    50     25
2021-04-10 03:00:00    25   100    50    100
2021-04-10 04:00:00    50    50   100    100
2021-04-10 05:00:00   100   100   100    100
Infralapsarian asked 6/5, 2021 at 15:3 Comment(2)
What do you mean by "the mean of max value"? The mean of a single value would just be the max value, no? – Selena
I meant the max value. – Infralapsarian

To improve on the other answer, I would use pandas apply to iterate over the rows and calculate the new column.

def calc_new_col(row):
    # use `and` (not `&`) to combine scalar comparisons;
    # `&` binds tighter than `<=` and would change the condition
    if row['col2'] <= 50 and row['col3'] <= 50:
        return row['col1']
    else:
        return max(row['col1'], row['col2'], row['col3'])

df["state"] = df.apply(calc_new_col, axis=1)
# axis=1 makes sure that the function is applied to each row

print(df)
            datetime  col1  col2  col3  state
2021-04-10  01:00:00    25    50    50     25
2021-04-10  02:00:00    25    50    50     25
2021-04-10  03:00:00    25   100    50    100
2021-04-10  04:00:00    50    50   100    100
2021-04-10  05:00:00   100   100   100    100

Using apply keeps the code cleaner and makes the row logic reusable.
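A note on combining the two conditions inside a row function: there the values are plain scalars, so the comparisons should be joined with and. In plain Python, & binds more tightly than <=, so row['col2'] <= 50 & row['col3'] <= 50 is parsed as the chained comparison row['col2'] <= (50 & row['col3']) <= 50. With integer values the five sample rows happen to give the same result either way; the values below are hypothetical, chosen only to show where the two forms diverge.

```python
col2, col3 = 40, 10  # hypothetical values that both satisfy "<= 50"

# `&` binds tighter than `<=`, so this parses as the chained
# comparison col2 <= (50 & col3) <= 50, i.e. 40 <= 2 <= 50
buggy = col2 <= 50 & col3 <= 50

# `and` combines the two comparisons as intended
fixed = col2 <= 50 and col3 <= 50

print(buggy, fixed)  # False True

# With float values, 50 & 10.0 is not defined at all and raises TypeError
try:
    40.0 <= 50 & 10.0 <= 50
except TypeError as exc:
    print("TypeError:", exc)
```

Vectorized code is different: on whole Series you do need &, but each comparison must then be wrapped in parentheses, as in (df['col2'] <= 50) & (df['col3'] <= 50).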

Enlace answered 6/5, 2021 at 15:18 Comment(1)
This is by far the most Pythonic and logical way. The apply statement on rows avoids an explicit loop. – Medullated
# Create a mask:

# Create a mask for the basic condition
mask1 = ((df['col2'] <= 50) & (df['col3'] <= 50))

# Use loc to select the rows where the condition is met and put the df['col1'] value in state
df.loc[mask1, 'state'] = df['col1']

# Select the rows where the condition is not met (~ negates the mask) and put the mean in state
df.loc[~mask1, 'state'] = (df['col1'] + df['col2'] + df['col3'])/3
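The code above fills the remaining rows with the mean, which follows an earlier wording of the question. If you want the maximum, as in the expected output, the same mask-and-loc approach works with a row-wise max over the three value columns. A minimal sketch, assuming the sample data with the datetime column kept as plain strings:

```python
import pandas as pd

# Sample data from the question (datetime kept as plain strings)
df = pd.DataFrame({
    "datetime": ["2021-04-10 01:00:00", "2021-04-10 02:00:00", "2021-04-10 03:00:00",
                 "2021-04-10 04:00:00", "2021-04-10 05:00:00"],
    "col1": [25, 25, 25, 50, 100],
    "col2": [50, 50, 100, 50, 100],
    "col3": [50, 50, 50, 100, 100],
})

cols = ["col1", "col2", "col3"]

# True where both col2 and col3 are <= 50
mask1 = (df["col2"] <= 50) & (df["col3"] <= 50)

# Rows meeting the condition take col1
df.loc[mask1, "state"] = df.loc[mask1, "col1"]

# Remaining rows take the row-wise maximum of the three value columns
df.loc[~mask1, "state"] = df.loc[~mask1, cols].max(axis=1)

print(df)
```

Because the state column is created with missing values for the unmatched rows first, it ends up as a float column.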

Nikolenikoletta answered 6/5, 2021 at 15:15 Comment(3)
This answer highlights how vectorized calculations can be done on DataFrames. – Enlace
It does not appear that indexes 2-4 match the expected output. – Selena
@HenryEcker The question as asked contradicts the output given in it. I have done the best I could based on the information available. – Nikolenikoletta

You can iterate through the DataFrame's rows and check the condition:

values = []

for _, row in df.iterrows():
    # `and` combines the two scalar comparisons; `&` would bind before `<=`
    if row['col2'] <= 50 and row['col3'] <= 50:
        values.append(row['col1'])
    else:
        values.append(max(row['col1'], row['col2'], row['col3']))

df['state'] = values

print(df)
            datetime  col1  col2  col3  state
2021-04-10  01:00:00    25    50    50     25
2021-04-10  02:00:00    25    50    50     25
2021-04-10  03:00:00    25   100    50    100
2021-04-10  04:00:00    50    50   100    100
2021-04-10  05:00:00   100   100   100    100
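If you prefer an explicit loop, a lighter variant is to zip the three columns together, so each iteration sees plain values instead of a per-row Series built by iterrows. A minimal sketch, assuming only the three value columns are needed:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [25, 25, 25, 50, 100],
    "col2": [50, 50, 100, 50, 100],
    "col3": [50, 50, 50, 100, 100],
})

# Iterate over tuples of column values instead of row Series
df["state"] = [
    c1 if (c2 <= 50 and c3 <= 50) else max(c1, c2, c3)
    for c1, c2, c3 in zip(df["col1"], df["col2"], df["col3"])
]

print(df["state"].tolist())  # [25, 25, 100, 100, 100]
```

The condition and the fallback are the same as in the iterrows loop; only the way the values are fetched changes.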
Antimasque answered 6/5, 2021 at 15:10 Comment(0)

An option using np.where:

import numpy as np
import pandas as pd

df = pd.DataFrame({'datetime': {0: '2021-04-10 01:00:00', 1: '2021-04-10 02:00:00',
                                2: '2021-04-10 03:00:00', 3: '2021-04-10 04:00:00',
                                4: '2021-04-10 05:00:00'},
                   'col1': {0: 25.0, 1: 25.0, 2: 25.0, 3: 50.0, 4: 100.0},
                   'col2': {0: 50.0, 1: 50.0, 2: 100.0, 3: 50.0, 4: 100.0},
                   'col3': {0: 50, 1: 50, 2: 50, 3: 100, 4: 100}})

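# Restrict max() to the numeric columns so the datetime strings are not compared with numbers
df['state'] = np.where((df['col2'] <= 50) & (df['col3'] <= 50), df.col1, df[['col1', 'col2', 'col3']].max(axis=1))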

print(df)

Output:

           datetime  col1  col2  col3  state
2021-04-10 01:00:00  25.0  50.0    50   25.0
2021-04-10 02:00:00  25.0  50.0    50   25.0
2021-04-10 03:00:00  25.0 100.0    50  100.0
2021-04-10 04:00:00  50.0  50.0   100  100.0
2021-04-10 05:00:00 100.0 100.0   100  100.0
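A pandas-only variant of the same idea, in case you would rather not import NumPy: Series.where(cond, other) keeps the values where cond is True and takes them from other everywhere else, so it mirrors np.where here. This sketch drops the datetime column to stay short.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [25.0, 25.0, 25.0, 50.0, 100.0],
    "col2": [50.0, 50.0, 100.0, 50.0, 100.0],
    "col3": [50, 50, 50, 100, 100],
})

cond = (df["col2"] <= 50) & (df["col3"] <= 50)

# Keep col1 where cond is True; elsewhere take the row-wise
# maximum of the three value columns
df["state"] = df["col1"].where(cond, df[["col1", "col2", "col3"]].max(axis=1))

print(df["state"].tolist())  # [25.0, 25.0, 100.0, 100.0, 100.0]
```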
Selena answered 6/5, 2021 at 15:27 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.