Simpson's Rule Integration Negative Area

I am having a problem with Simpson's rule from the scipy.integrate library. The computed area is sometimes negative even though all the y values are positive and the x values increase from left to right. For example:

from scipy.integrate import simps

x = [0.0, 99.0, 100.0, 299.0, 400.0, 600.0, 1700.0, 3299.0, 3300.0, 3399.0, 3400.0, 3599.0, 3699.0, 3900.0,
    4000.0, 4300.0, 4400.0, 4900.0, 5000.0, 5100.0, 5300.0, 5500.0, 5700.0, 5900.0, 6100.0, 6300.0, 6600.0,
    6900.0, 7200.0, 7600.0, 7799.0, 8000.0, 8400.0, 8900.0, 9400.0, 10000.0, 10600.0, 11300.0, 11699.0,
    11700.0, 11799.0]

y = [3399.68, 3399.68, 3309.76, 3309.76, 3274.95, 3234.34, 3203.88, 3203.88, 3843.5,
     3843.5,  4893.57, 4893.57, 4893.57, 4847.16, 4764.49, 4867.46, 4921.13, 4886.32,
     4761.59, 4731.13, 4689.07, 4649.91, 4610.75, 4578.84, 4545.48, 4515.02, 4475.86,
     4438.15, 4403.34, 4364.18, 4364.18, 4327.92, 4291.66, 4258.31, 4226.4,  4188.69,
     4152.43, 4120.52, 4120.52, 3747.77, 3747.77]

area = simps(y,x)

The result returned by simps(y,x) is -226271544.06562585. Why is it negative? This happens only in some cases; in others it works fine. For example:

x = [0.0, 100.0, 101.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 1300.0, 3300.0, 3400.0, 3600.0, 3700.0,
    5100.0, 5200.0, 5400.0, 5600.0, 5800.0, 6000.0, 6200.0, 6400.0, 6600.0, 6900.0, 7200.0, 7500.0, 7900.0,
    8299.0, 8400.0, 8900.0, 9400.0, 10000.0, 10600.0, 11200.0, 11900.0, 12600.0, 13500.0, 14300.0, 15300.0,
    16400.0, 16499.0, 17500.0, 18900.0, 20100.0, 20999.0, 21000.0, 21099.0]

y = [2813.73, 2813.73, 3200.98, 3309.76, 3356.17, 3296.71, 3243.04, 3243.04, 3198.08, 3161.82, 3488.16,
     4929.83, 4897.92, 4897.92, 4763.04, 4726.78, 4680.37, 4638.31, 4597.69, 4561.44, 4525.18, 4494.72,
     4464.26, 4426.55, 4388.84, 4354.03, 4316.32, 4316.32, 4275.71, 4239.45, 4203.19, 4171.28, 4136.47,
     4104.57, 4074.11, 4042.2, 4011.74, 3979.83, 3949.38, 3918.92, 3918.92, 3887.01, 3855.1, 3824.64,
     3824.64,3605.64, 3605.64]

area = simps(y,x)

The area in this case is positive: 83849670.99112588.

What is the reason for this?

Johnathon answered 18/9, 2019 at 16:7 Comment(1)
This was also a GitHub issue, which I spent some time on without realising that the same answer was already here. See "Is it bad form to cross post to Stack Overflow and GitHub?" (answer: maybe not, but maybe disclose that you have posted on both sites). - Prototherian

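The problem is how Simpson's rule works: it fits a quadratic through each group of consecutive points and integrates that quadratic. With data like yours, where the spacing in x is very uneven and y jumps almost vertically in places, the fitted quadratics overshoot far above or below the samples, so the estimate is wrong and can even be negative.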

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a + b * x + c * x ** 2

x = np.array([0.0, 99.0, 100.0, 299.0, 400.0, 600.0, 1700.0, 3299.0, 3300.0, 3399.0, 3400.0, 3599.0, 3699.0, 3900.0,
    4000.0, 4300.0, 4400.0, 4900.0, 5000.0, 5100.0, 5300.0, 5500.0, 5700.0, 5900.0, 6100.0, 6300.0, 6600.0,
    6900.0, 7200.0, 7600.0, 7799.0, 8000.0, 8400.0, 8900.0, 9400.0, 10000.0, 10600.0, 11300.0, 11699.0,
    11700.0, 11799.0])

y = np.array([3399.68, 3399.68, 3309.76, 3309.76, 3274.95, 3234.34, 3203.88, 3203.88, 3843.5,
     3843.5,  4893.57, 4893.57, 4893.57, 4847.16, 4764.49, 4867.46, 4921.13, 4886.32,
     4761.59, 4731.13, 4689.07, 4649.91, 4610.75, 4578.84, 4545.48, 4515.02, 4475.86,
     4438.15, 4403.34, 4364.18, 4364.18, 4327.92, 4291.66, 4258.31, 4226.4,  4188.69,
     4152.43, 4120.52, 4120.52, 3747.77, 3747.77])

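# Fit a quadratic to every window of three consecutive points and draw it in black
for i in range(3, len(x) + 1):
    popt, _ = curve_fit(func, x[i-3:i], y[i-3:i])
    xnew = np.linspace(x[i-3], x[i-1], 100)
    plt.plot(xnew, func(xnew, *popt), 'k-')

# Overlay the raw samples for comparison
plt.plot(x, y)
plt.show()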

Figure: the black lines show the curves estimated by simps.

Figure: detail of the points.
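To make the overshoot concrete, here is a minimal sketch that integrates the parabola through just three of the samples from the question (x = 1700, 3299, 3300) and compares it with the trapezoid estimate over the same stretch. The helper names `parabola_panel_integral` and `trapezoid_panel_integral` are made up for this illustration and are not part of SciPy. It also assumes that this triple forms one Simpson panel, which holds if the composite rule pairs intervals starting from the first sample.

```python
import numpy as np

def parabola_panel_integral(xs, ys):
    """Integrate the parabola through three points over [xs[0], xs[2]]."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    t = xs - xs[0]                     # shift x to start at 0 for better conditioning
    coeffs = np.polyfit(t, ys, 2)      # degree 2 through 3 points: exact interpolation
    antiderivative = np.polyint(coeffs)
    return float(np.polyval(antiderivative, t[2]) - np.polyval(antiderivative, t[0]))

def trapezoid_panel_integral(xs, ys):
    """Integrate the piecewise-linear curve through the same points."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    return float(np.sum(np.diff(xs) * (ys[:-1] + ys[1:]) / 2.0))

# Three consecutive samples from the first data set in the question:
# flat for 1599 units of x, then a jump of about 640 in y over a single unit.
xs = [1700.0, 3299.0, 3300.0]
ys = [3203.88, 3203.88, 3843.5]

simpson_panel = parabola_panel_integral(xs, ys)
trapezoid_panel = trapezoid_panel_integral(xs, ys)

print("parabola through the 3 points:", simpson_panel)
print("trapezoids through the 3 points:", trapezoid_panel)
```

The parabola has to pass through two equal values 1599 apart and then climb steeply within one unit, so it sags far below zero in between. Its integral is strongly negative even though all three y values are positive, while the trapezoid estimate stays positive.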

Cacia answered 18/9, 2019 at 17:15 Comment(4)
So what is your suggestion for calculating the area for these points? Do you know any other libraries that would be better suited to this kind of data? - Johnathon
Use this: docs.scipy.org/doc/scipy/reference/generated/… - Cacia
This is a very nice set of points to show that Simpson's rule is not always good. Is it "real life" data? It seems made especially for this demonstration :-). By the way, the second set of points has a positive area but also suffers from the oscillating interpolation, so that integral is wrong too. The "correct/best" integration method mostly depends on where the series comes from and what the values between your samples should look like. If your data has a good reason to be piecewise linear, then trapz will be good. - Gaytan
Yes, they are sensor data, so they are roughly linear with occasional strong variations. So I think trapz will work fine for it. - Johnathon

Your samples vary very strongly and the x values are not equally spaced. Could it be something like Runge's phenomenon? Would trapz be more accurate?
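As a rough check of the trapezoid idea, the sketch below applies the trapezoid rule by hand to the first data set from the question, using plain NumPy so it does not depend on which name a given SciPy or NumPy version uses for its trapezoid function. The helper `trapezoid_area` is a hypothetical name for this illustration. It assumes that straight lines between samples are a reasonable model of the signal.

```python
import numpy as np

x = np.array([0.0, 99.0, 100.0, 299.0, 400.0, 600.0, 1700.0, 3299.0, 3300.0, 3399.0, 3400.0, 3599.0, 3699.0, 3900.0,
    4000.0, 4300.0, 4400.0, 4900.0, 5000.0, 5100.0, 5300.0, 5500.0, 5700.0, 5900.0, 6100.0, 6300.0, 6600.0,
    6900.0, 7200.0, 7600.0, 7799.0, 8000.0, 8400.0, 8900.0, 9400.0, 10000.0, 10600.0, 11300.0, 11699.0,
    11700.0, 11799.0])

y = np.array([3399.68, 3399.68, 3309.76, 3309.76, 3274.95, 3234.34, 3203.88, 3203.88, 3843.5,
     3843.5,  4893.57, 4893.57, 4893.57, 4847.16, 4764.49, 4867.46, 4921.13, 4886.32,
     4761.59, 4731.13, 4689.07, 4649.91, 4610.75, 4578.84, 4545.48, 4515.02, 4475.86,
     4438.15, 4403.34, 4364.18, 4364.18, 4327.92, 4291.66, 4258.31, 4226.4,  4188.69,
     4152.43, 4120.52, 4120.52, 3747.77, 3747.77])

def trapezoid_area(x, y):
    """Sum the areas of the trapezoids between consecutive samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.diff(x) * (y[:-1] + y[1:]) / 2.0))

area = trapezoid_area(x, y)
span = x[-1] - x[0]

print("trapezoid area:", area)
# Sanity check: the area must lie between min(y)*span and max(y)*span.
print("within bounds:", y.min() * span <= area <= y.max() * span)
```

Because each trapezoid only averages two neighbouring positive samples, the result cannot go negative or leave the range spanned by the smallest and largest y values, whatever the spacing in x.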

Gaytan answered 18/9, 2019 at 17:18 Comment(0)
