Pandas assign group numbers for each time bin

Key  Name  Val1  Val2  Timestamp
101  A     10    1     01-10-2019 00:20:21
102  A     12    2     01-10-2019 00:20:21
103  B     10    1     01-10-2019 00:20:26
104  C     20    2     01-10-2019 14:40:45
105  B     21    3     02-10-2019 09:04:06
106  D     24    3     02-10-2019 09:04:12
107  A     24    3     02-10-2019 09:04:14
108  E     32    2     02-10-2019 09:04:20
109  A     10    1     02-10-2019 09:04:22
110  B     10    1     02-10-2019 10:40:49

Key  Name  Val1  Val2  Timestamp            Group
101  A     10    1     01-10-2019 00:20:21  1
102  A     12    2     01-10-2019 00:20:21  1
103  B     10    1     01-10-2019 00:20:26  1
104  C     20    2     01-10-2019 14:40:45  2
105  B     21    3     02-10-2019 09:04:06  3
106  D     24    3     02-10-2019 09:04:12  4
107  A     24    3     02-10-2019 09:04:14  4
108  E     32    2     02-10-2019 09:04:20  4
109  A     10    1     02-10-2019 09:04:22  5
110  B     10    1     02-10-2019 10:40:49  6

Here is an example without a loop. The main idea is to round the seconds down to the start of a fixed range, then number the resulting ranges with ngroup(). For example:

02-10-2019 09:04:12 -> 02-10-2019 09:04:11
02-10-2019 09:04:14 -> 02-10-2019 09:04:11
02-10-2019 09:04:20 -> 02-10-2019 09:04:11
02-10-2019 09:04:21 -> 02-10-2019 09:04:21
02-10-2019 09:04:25 -> 02-10-2019 09:04:21
...

I use a temporary column to hold the rounded timestamps, group by it, and then drop it.

import pandas as pd

df = pd.DataFrame.from_dict({
    'Name': ('A', 'A', 'B', 'C', 'B', 'D', 'A', 'E', 'A', 'B'),
    'Val1': (1, 2, 1, 2, 3, 3, 3, 2, 1, 1),
    'Timestamp': (
        '2019-01-10 00:20:21',
        '2019-01-10 00:20:21',
        '2019-01-10 00:20:26',
        '2019-01-10 14:40:45',
        '2019-02-10 09:04:06',
        '2019-02-10 09:04:12',
        '2019-02-10 09:04:14',
        '2019-02-10 09:04:20',
        '2019-02-10 09:04:22',
        '2019-02-10 10:40:49',
    )
})
# convert str to Timestamp
df['Timestamp'] = pd.to_datetime(df['Timestamp'])

# map each timestamp to the start of its seconds range; customize the ranges as needed
def sec_to_group(x):
    if 0 <= x.second <= 10:
        x = x.replace(second=0)
    elif 11 <= x.second <= 20:
        x = x.replace(second=11)
    elif 21 <= x.second <= 30:
        x = x.replace(second=21)
    elif 31 <= x.second <= 40:
        x = x.replace(second=31)
    elif 41 <= x.second <= 50:
        x = x.replace(second=41)
    elif 51 <= x.second <= 59:
        x = x.replace(second=51)
    return x


# temporary column formatted_dt holds the timestamps with rounded-down seconds
df['formatted_dt'] = df['Timestamp'].apply(sec_to_group)
# number the groups with ngroup(), then drop the temporary column
df['Group'] = df.groupby('formatted_dt').ngroup()
df.drop(columns=['formatted_dt'], inplace=True)
print(df)

Output:

#   Name  Val1           Timestamp  Group
# 0    A     1 2019-01-10 00:20:21      0  <- ngroup() numbers groups from 0
# 1    A     2 2019-01-10 00:20:21      0
# 2    B     1 2019-01-10 00:20:26      0
# 3    C     2 2019-01-10 14:40:45      1
# 4    B     3 2019-02-10 09:04:06      2
# ....
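
The same ranges can also be computed without apply(). The following is only a sketch, assuming the ranges stay exactly as in sec_to_group above (0-10, 11-20, ..., 51-59); the names sec, bin_start and bin_ts are made up for this illustration. It adds 1 to ngroup() so the numbering starts at 1, as in the expected output of the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Timestamp': pd.to_datetime([
        '2019-01-10 00:20:21',
        '2019-01-10 00:20:21',
        '2019-01-10 00:20:26',
        '2019-01-10 14:40:45',
        '2019-02-10 09:04:06',
        '2019-02-10 09:04:12',
        '2019-02-10 09:04:14',
        '2019-02-10 09:04:20',
        '2019-02-10 09:04:22',
        '2019-02-10 10:40:49',
    ])
})

sec = df['Timestamp'].dt.second
# Start of each range: 0 for seconds 0-10, otherwise 11, 21, 31, 41 or 51.
bin_start = ((sec - 1) // 10 * 10 + 1).where(sec > 10, 0)
# Truncate to the minute, then add the range start back as seconds.
bin_ts = df['Timestamp'].dt.floor('min') + pd.to_timedelta(bin_start, unit='s')
# ngroup() is 0-based; add 1 to get 1-based group numbers.
df['Group'] = df.groupby(bin_ts).ngroup() + 1

print(df['Group'].tolist())  # [1, 1, 1, 2, 3, 4, 4, 4, 5, 6]
```

No temporary column is needed here because groupby() accepts a Series as the grouping key.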

You can also try pd.Grouper(freq=...) or resample. pd.Grouper replaces TimeGrouper, which was deprecated and then removed in pandas 1.0.
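
A rough sketch of the fixed-width variant, assuming plain 10-second bins aligned to the clock are acceptable. Note that these bins are 0-9, 10-19, ..., 50-59, so they do not match the custom ranges above exactly (here 09:04:20 falls in the same bin as 09:04:22, not with 09:04:12). The names counts and non_empty are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Timestamp': pd.to_datetime([
        '2019-01-10 00:20:21',
        '2019-01-10 00:20:21',
        '2019-01-10 00:20:26',
        '2019-01-10 14:40:45',
        '2019-02-10 09:04:06',
        '2019-02-10 09:04:12',
        '2019-02-10 09:04:14',
        '2019-02-10 09:04:20',
        '2019-02-10 09:04:22',
        '2019-02-10 10:40:49',
    ])
})

# Round every timestamp down to a 10-second boundary and number the bins.
df['Group'] = df.groupby(df['Timestamp'].dt.floor('10s')).ngroup()
print(df['Group'].tolist())  # [0, 0, 0, 1, 2, 3, 3, 4, 4, 5]

# pd.Grouper bins the same way; size() counts the rows in each bin.
counts = df.groupby(pd.Grouper(key='Timestamp', freq='10s')).size()
non_empty = counts[counts > 0]
print(len(non_empty))  # 6
```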

Hope this helps.
