Pandas finding local max and min

I have a pandas DataFrame with two columns: one is temperature, the other is time.

I would like to make third and fourth columns called min and max. Each of these columns would be filled with NaNs except where there is a local min or max, in which case it would hold the value of that extremum.

Here is a sample of what the data looks like; essentially, I am trying to identify all the peaks and low points in the figure.

Are there any built-in tools in pandas that can accomplish this?

Twylatwyman asked 29/12, 2017 at 14:19

Should the result be robust against noise? Otherwise, you could just compare the values of the Series to its shifts. – Defector 29/12, 2017 at 14:24

I'm not worried about noise in this case. If it were a noisy signal, I would just filter it and then look for the max/min of the filtered result. – Twylatwyman 29/12, 2017 at 14:27
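A minimal sketch of the filter-then-search idea from the comment above. It assumes a centred rolling mean as the filter (the comment does not name one) and uses a synthetic noisy sine wave in place of the real temperature data; the window length of 41 is an arbitrary value to tune. The extrema are then found by comparing the smoothed series with its shifts, as suggested in the first comment.

```python
import numpy as np
import pandas as pd

# Synthetic noisy signal (assumption): two periods of a sine wave plus Gaussian noise
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 400)
df = pd.DataFrame({'time': t, 'data': np.sin(t) + rng.normal(0, 0.2, t.size)})

# Step 1: low-pass filter with a centred rolling mean (window length is a tuning choice)
smooth = df['data'].rolling(window=41, center=True).mean()

# Step 2: strict local extrema of the smoothed series (NaN edges never qualify)
is_min = (smooth.shift(1) > smooth) & (smooth.shift(-1) > smooth)
is_max = (smooth.shift(1) < smooth) & (smooth.shift(-1) < smooth)

# Keep the smoothed value at each extremum and NaN everywhere else
df['min'] = smooth.where(is_min)
df['max'] = smooth.where(is_max)

print(df.dropna(subset=['max'])[['time', 'max']])
```

A wider window suppresses more noise-induced extrema but also flattens and shifts genuine peaks, so the window should be chosen relative to the spacing of the features you care about.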

You could alternatively fit a very simple model (e.g. linear with one or two covariates) to the data and then, from the residuals, keep those whose deviations fall in the q% smallest or largest, using Series.quantile. – Boron 29/12, 2017 at 14:50
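The residual-based suggestion above could be sketched roughly as follows. This assumes a single covariate (time), an ordinary least-squares line fitted with np.polyfit, synthetic data and q = 5%; all of these are illustrative choices, not part of the comment. Note that it flags the points that deviate most from the fitted trend, which is not the same thing as strict local minima and maxima.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption): a linear trend in time plus Gaussian noise
rng = np.random.default_rng(1)
df = pd.DataFrame({'time': np.arange(200.0)})
df['data'] = 0.05 * df['time'] + rng.normal(0, 1, len(df))

# Fit data ~ intercept + slope * time by least squares; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(df['time'], df['data'], deg=1)
resid = df['data'] - (intercept + slope * df['time'])

# Flag the points whose residuals fall in the lowest / highest q fraction
q = 0.05
low_cut, high_cut = resid.quantile(q), resid.quantile(1 - q)
df['min'] = df['data'].where(resid <= low_cut)
df['max'] = df['data'].where(resid >= high_cut)
```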

Assuming that the column of interest is labelled data, one solution would be

df['min'] = df.data[(df.data.shift(1) > df.data) & (df.data.shift(-1) > df.data)]
df['max'] = df.data[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)]

For example:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Generate a noisy AR(1) sample
np.random.seed(0)
rs = np.random.randn(200)
xs = [0]
for r in rs:
    xs.append(xs[-1]*0.9 + r)
df = pd.DataFrame(xs, columns=['data'])

# Find local peaks
df['min'] = df.data[(df.data.shift(1) > df.data) & (df.data.shift(-1) > df.data)]
df['max'] = df.data[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)]

# Plot results
plt.scatter(df.index, df['min'], c='r')
plt.scatter(df.index, df['max'], c='g')
df.data.plot()
plt.show()
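To see exactly what the two shift-based lines produce, here is a tiny hand-made series (values chosen for illustration). Rows that are not strict extrema stay NaN, and the first and last rows can never be flagged, because shift introduces NaN there and any comparison with NaN is False.

```python
import pandas as pd

df = pd.DataFrame({'data': [3.0, 1.0, 2.0, 5.0, 4.0, 4.5, 2.0]})

# Neighbouring values; shift leaves NaN at the first/last row
prev, nxt = df['data'].shift(1), df['data'].shift(-1)

# Strict local extrema: both neighbours larger (min) or both smaller (max)
df['min'] = df['data'][(prev > df['data']) & (nxt > df['data'])]
df['max'] = df['data'][(prev < df['data']) & (nxt < df['data'])]

print(df)
```

Here rows 1 and 4 are reported as minima (1.0 and 4.0) and rows 3 and 5 as maxima (5.0 and 4.5); assigning the filtered Series back to the DataFrame aligns on the index, which is what fills every other row with NaN.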

Defector answered 29/12, 2017 at 14:33

I found that when values of data are repeated (for example, multiple rows with the value 7), using just < or > misses the point as a 'min' or a 'max'. Modifying this solution to use ".shift(1) <=" and ".shift(1) >=" did allow 'min' and 'max' values to be identified for repeated values: the final row containing the repeated value is treated as the 'min' or 'max'. – Amontillado 7/5, 2020 at 3:40

Great findings, Udesh – Ordination 11/5, 2021 at 3:26

Great solution! – Pangaro 6/4, 2022 at 23:34
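The relaxed comparison from the first comment can be checked on a small hypothetical series with a flat top. With strict < and > the plateau of 7s is missed entirely; allowing equality with the previous row reports its last row (index 4). One caveat: the same rule also reports the last row of a flat step on a rising slope (index 7 below), so it is not a pure plateau detector.

```python
import pandas as pd

s = pd.Series([1, 3, 7, 7, 7, 2, 5, 5, 6], dtype=float)
prev, nxt = s.shift(1), s.shift(-1)

# Strict tests, as in the answer above
strict_max = s[(prev < s) & (nxt < s)]
strict_min = s[(prev > s) & (nxt > s)]

# Relaxed tests from the comment: equality allowed with the previous row only,
# so the last row of a run of equal values can be reported
relaxed_max = s[(prev <= s) & (nxt < s)]
relaxed_min = s[(prev >= s) & (nxt > s)]

print(strict_max.index.tolist(), relaxed_max.index.tolist())
print(strict_min.index.tolist(), relaxed_min.index.tolist())
```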

The solution offered by fuglede is great, but if your data is very noisy (like the data in the picture), you will end up with lots of misleading local extremes. I suggest using the scipy.signal.argrelextrema() function. It has its own limitations, but it has a useful feature: you can specify how many points on each side to compare against (the order argument), which acts a bit like a noise filter. For example:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.signal import argrelextrema

# Generate a noisy AR(1) sample

np.random.seed(0)
rs = np.random.randn(200)
xs = [0]
for r in rs:
    xs.append(xs[-1] * 0.9 + r)
df = pd.DataFrame(xs, columns=['data'])

n = 5  # number of points to be checked before and after

# Find local peaks

df['min'] = df.iloc[argrelextrema(df.data.values, np.less_equal,
                    order=n)[0]]['data']
df['max'] = df.iloc[argrelextrema(df.data.values, np.greater_equal,
                    order=n)[0]]['data']

# Plot results

plt.scatter(df.index, df['min'], c='r')
plt.scatter(df.index, df['max'], c='g')
plt.plot(df.index, df['data'])
plt.show()

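Some points:

You might need to check the points afterward to ensure there are no twin points very close to each other.
You can play with n to filter out the noisy points: a larger n means a point must be the extreme of a wider neighbourhood.
argrelextrema returns a tuple of index arrays (one per axis), and the [0] at the end extracts the NumPy array for the first (here, only) axis.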
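The effect of n (the order argument) can be checked on a small hand-made array. With order=1 every strict peak is reported; with order=2 a point must exceed its two nearest neighbours on each side, so the smaller bumps at indices 1 and 10 drop out. The comparator here is the strict np.greater. With the default mode='clip', out-of-range neighbours are clipped to the first or last element, so each endpoint is partly compared with itself and a strict comparator never reports it.

```python
import numpy as np
from scipy.signal import argrelextrema

x = np.array([0, 2, 1, 3, 1, 0, 1, 5, 2, 0, 1, 0], dtype=float)

# Indices of strict local maxima for two neighbourhood sizes
peaks = {n: argrelextrema(x, np.greater, order=n)[0].tolist() for n in (1, 2)}
print(peaks)
```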

Elayne answered 13/6, 2018 at 11:44

This is a good solution. I wrote a small blog post about it: eddwardo.github.io/pandas/timeseries/2019/06/05/… – Winters 5/6, 2019 at 18:14

Excellent blog post @Winters , that really helped me understand it! – Protein 11/11, 2019 at 14:8

@Winters the page is down 😔 – Elayne 15/12, 2020 at 14:47

@Foad eddwardo.github.io/posts/… – Winters 17/12, 2020 at 23:53

The best solution, as well as the fastest. I did not know about argrelextrema. – Bawdy 25/9, 2021 at 21:37

Great idea, but note that this solution seems to have issues when minima/maxima are numerically identical. For instance, np.less_equal could result in detecting them all, whereas np.less could result in not detecting them at all. See this question. – Fenton 30/12, 2022 at 9:53
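The tie issue raised in the last comment is easy to reproduce on a hypothetical array with a two-point flat top (indices 2 and 3). np.greater finds neither plateau point, while np.greater_equal reports both; the isolated peak at index 6 is found either way.

```python
import numpy as np
from scipy.signal import argrelextrema

# Flat top: x[2] and x[3] are equal maxima
x = np.array([0, 1, 3, 3, 1, 0, 2, 0], dtype=float)

strict = argrelextrema(x, np.greater, order=1)[0]         # misses the plateau
relaxed = argrelextrema(x, np.greater_equal, order=1)[0]  # reports every plateau point

print(strict, relaxed)
```

If only one point per plateau is wanted, the relaxed result has to be post-processed (for example by keeping one index from each run of consecutive indices), which matches the advice above about checking for twin points.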

You can do something similar to Foad's .argrelextrema() solution, but with the Pandas .rolling() function:

# Find local peaks
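n = 5  # rolling window length (centred, so n // 2 points on each side)
local_min_vals = df.loc[df['data'] == df['data'].rolling(n, center=True).min()]
local_max_vals = df.loc[df['data'] == df['data'].rolling(n, center=True).max()]

# Pass the 'data' column as y; a multi-column DataFrame would not match the size of x
plt.scatter(local_min_vals.index, local_min_vals['data'], c='r')
plt.scatter(local_max_vals.index, local_max_vals['data'], c='g')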
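For comparison with argrelextrema, here is the rolling approach on a small hand-made series. With center=True a window of n = 5 covers two points on each side, so it corresponds roughly to order=2 with np.greater_equal rather than to order=5. Because min_periods defaults to the window length, the first and last n // 2 rows get NaN from rolling and can never be flagged, and since the test is equality with the window maximum, every point of a flat top would be reported.

```python
import pandas as pd

s = pd.Series([0, 2, 1, 3, 1, 0, 1, 5, 2, 0, 1, 0], dtype=float)
n = 5  # window length; centred, so it spans n // 2 = 2 points on each side

roll_max = s.rolling(n, center=True).max()
local_max = s[s == roll_max]  # rows equal to the max of their own window
print(local_max)
```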

Sumach answered 30/12, 2022 at 18:25

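Using NumPy

import numpy as np

ser = np.random.randint(-40, 40, 100)  # 100 points
d = np.diff(ser)
# a peak at i rises into i (d[i-1] > 0) and falls after it (d[i] < 0); diff shifts indices by one
peak = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1

or

# the sign of the slope jumps from +1 to -1 at a strict peak, giving -2
double_difference = np.diff(np.sign(np.diff(ser)))
peak = np.where(double_difference == -2)[0] + 1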

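Using pandas

import numpy as np
import pandas as pd

ser = pd.Series(np.random.randint(2, 5, 100))
# strict peaks only: a run of equal values at the top is not reported
peak_df = ser[(ser.shift(1) < ser) & (ser.shift(-1) < ser)]
peak = peak_df.index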
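As a sanity check, the two NumPy forms and the pandas form can be compared on random continuous data, where ties (which all three strict tests ignore) are practically impossible. The + 1 in both NumPy forms is needed because position i of np.diff output describes the step from i to i + 1.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
arr = rng.normal(size=100)

# NumPy, pairwise: rising into i and falling out of i
d = np.diff(arr)
peaks_pairwise = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1

# NumPy, second difference of the slope signs: +1 -> -1 gives -2
peaks_sign = np.where(np.diff(np.sign(d)) == -2)[0] + 1

# pandas, comparing each value with its shifted neighbours
ser = pd.Series(arr)
peaks_pandas = ser[(ser.shift(1) < ser) & (ser.shift(-1) < ser)].index.to_numpy()

print(peaks_pairwise)
```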

Sienna answered 27/12, 2019 at 22:39
