How to find a best fit distribution function for a list of data?

I am aware of the many probability distributions built into Python's random module.

I'd like to know whether, given a list of floats, it is possible to find the distribution equation that best fits the list.

I don't know whether NumPy can do this, but the function I'm looking for would be comparable (similar, not identical) to Excel's TREND function.

How would I do that?

Recessional answered 23/4, 2011 at 5:43 Comment(2)
A distribution isn't a best-fit curve. Its density is non-negative and its integral from -inf to inf equals 1 (its cumulative distribution function rises from 0 to 1). What are you trying to do with this curve? - Coltson
@Coltson For example, I have 10 types of operations (work with a vessel) and a 10-year history of those operations. I'm trying to simulate next year's operations based on that history, applying a random generator to each type of operation. The generator will determine the duration of each request, the date on which each request occurs, and how many requests occur. I don't want users to have to enter this data by hand, so I'm looking for a way to derive these random distributions automatically. - Recessional
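A minimal sketch of that workflow for a single operation type, using scipy.stats. It assumes, purely for illustration, that durations are exponential, that yearly request counts are Poisson, and that request dates are spread uniformly over the year; the history arrays are made-up placeholders, not real data. In practice you would load your own records and check the assumed families against them before simulating.

```python
import numpy as np
from scipy import stats

# Hypothetical history for ONE operation type (made-up numbers for illustration):
# durations of past requests in hours, and number of requests in each past year.
rng = np.random.default_rng(0)
past_durations = rng.exponential(scale=5.0, size=500)
past_counts_per_year = np.array([48, 52, 45, 50, 55, 47, 51, 49, 53, 50])

# Assumption: durations are exponential; fix loc at 0 and estimate only the scale.
loc, scale = stats.expon.fit(past_durations, floc=0)

# Assumption: yearly counts are Poisson; the maximum-likelihood rate is the mean count.
rate = past_counts_per_year.mean()

# Simulate next year: draw a request count, then one duration per request,
# and place the request dates uniformly over 365 days (assumption).
n_requests = rng.poisson(rate)
durations = stats.expon.rvs(loc=loc, scale=scale, size=n_requests, random_state=rng)
start_days = np.sort(rng.uniform(0, 365, size=n_requests))

print(n_requests, durations.mean())
```

Repeating this per operation type, each with its own fitted parameters, gives one simulated year; running it many times gives a distribution of outcomes.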

Look at numpy.polyfit

numpy.polyfit(x, y, deg, rcond=None, full=False)

Least squares polynomial fit.

Fit a polynomial p(x) = p[0] * x**deg + ... + p[deg] of degree deg to points (x, y). Returns a vector of coefficients p that minimises the squared error.
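A short illustration with made-up numbers: with deg=1, polyfit returns the slope and intercept of a least-squares straight line, which is close in spirit to Excel's TREND. Note that this fits a trend curve to (x, y) pairs, not a probability distribution to a sample.

```python
import numpy as np

# Hypothetical data: x is a period index (0, 1, 2, ...), y the observed values.
x = np.arange(5, dtype=float)
y = np.array([10.0, 12.0, 14.0, 16.0, 18.0])

# deg=1 fits a straight line by least squares.
# Coefficients come back highest power first: [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)

# Evaluate the fitted polynomial at the next period (extrapolation).
next_value = np.polyval([slope, intercept], 5.0)
print(round(slope, 6), round(intercept, 6), round(next_value, 6))  # 2.0 10.0 20.0
```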

Prehistoric answered 23/4, 2011 at 6:20 Comment(1)
Could you look at my comment to see if you have another tip? - Recessional

There's also curve_fit in SciPy:

from scipy.optimize import curve_fit
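A minimal example, assuming you want a normal (Gaussian) shape: the sample is turned into (x, y) points via a normalised histogram, and curve_fit adjusts the model function's parameters by least squares. The `gaussian` model function and the synthetic data are illustrative choices, not part of SciPy.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amplitude, mu, sigma):
    """Hypothetical model function: curve_fit passes x first, then the free parameters."""
    return amplitude * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Synthetic sample (illustrative): 2000 draws from a normal with mean 3 and std 1.5.
rng = np.random.default_rng(42)
data = rng.normal(loc=3.0, scale=1.5, size=2000)

# Turn the sample into (x, y) points: bin centres and density-normalised heights.
heights, edges = np.histogram(data, bins=40, density=True)
centres = (edges[:-1] + edges[1:]) / 2

# p0 is an initial guess for (amplitude, mu, sigma); curve_fit refines it by least squares.
params, cov = curve_fit(gaussian, centres, heights, p0=[heights.max(), data.mean(), data.std()])
amplitude, mu, sigma = params
print(mu, abs(sigma))
```

If the goal is a probability distribution rather than an arbitrary curve, fitting the raw sample directly with `scipy.stats.norm.fit(data)` (maximum likelihood) avoids depending on the choice of bins.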
Gally answered 9/1, 2022 at 4:38 Comment(2)
This would be great if it had an example. - Karnak
Here is a good example: #53966537 - Karnak

You may want to try the time-series tools in statsmodels.tsa, for example a seasonal decomposition into trend, seasonal, and residual parts:

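from statsmodels.tsa.seasonal import seasonal_decompose

# df_train: your series, indexed by a DatetimeIndex with a frequency set;
# otherwise pass the cycle length explicitly, e.g. period=12 for monthly data
decomp = seasonal_decompose(df_train)

trend = decomp.trend        # long-term movement (centred moving average)
seasonal = decomp.seasonal  # repeating pattern with a fixed period
residual = decomp.resid     # whatever is left over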

One caveat: I found that the seasonal part does not handle heteroscedasticity well, i.e. when the amplitude of your periodic component grows over time. The (default, additive) decomposition keeps the seasonal amplitude constant, so the growing part ends up in the residual, which then shows a periodic pattern.

Gally answered 8/1, 2022 at 18:28 Comment(0)

Take a look at the distfit documentation: https://erdogant.github.io/distfit/pages/html/Plots.html#plot-all-fitted-distributions

It does exactly what you need: it fits multiple distribution functions to your data and ranks the best-fitting models.
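The same idea can be sketched with scipy.stats alone (this is not distfit's API): fit each candidate family by maximum likelihood and rank the fits by the Kolmogorov-Smirnov statistic. The candidate list and the `rank_distributions` helper are illustrative choices. Because the parameters are estimated from the same data, the KS p-values are not valid tests here; the statistic is only used to rank.

```python
import numpy as np
from scipy import stats

# Hypothetical candidate families; scipy.stats offers many more.
CANDIDATES = ["norm", "expon", "gamma", "lognorm", "weibull_min"]

def rank_distributions(data, candidates=CANDIDATES):
    """Fit each candidate by maximum likelihood; rank by KS statistic (smaller is better)."""
    results = []
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(data)                     # MLE of shape(s), loc, scale
        ks = stats.kstest(data, name, args=params)  # distance to the fitted CDF
        results.append((name, ks.statistic, params))
    return sorted(results, key=lambda r: r[1])

# Illustrative data: a gamma sample, so a skewed family should beat "norm" and "expon".
rng = np.random.default_rng(1)
sample = rng.gamma(shape=2.0, scale=3.0, size=1000)

for name, ks_stat, params in rank_distributions(sample):
    print(f"{name:12s} KS={ks_stat:.4f}")
```

The top-ranked entry's `params` can then be passed to that distribution's `rvs` method to generate simulated values.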

Gayn answered 8/4 at 21:2 Comment(1)
While the information in your link may solve the OP's question, answers that rely heavily on links are discouraged for these reasons. Please consider updating your answer so that it is self-contained, with the minimal amount of code required to demonstrate how it works. Thanks. - Colan

© 2022 - 2024 — McMap. All rights reserved.