Fit a power law to empirical data in Python
I'm experimenting with fitting a power law to empirical data using the powerlaw module. I have created the following data that follows a power law distribution of exponent 2:

x = range(1,1000)
y = []

for i in x:
    y.append(i**(-2))

I'm expecting the fitted power law to have an exponent of 2. However, the resulting exponent deviates substantially from the theoretical value:

    fitted_pl = powerlaw.Fit(y)

    fitted_pl.alpha
    Out[115]: 1.4017584065981563

Could you please advise why this happens, or point out what I've done wrong here?

Thank you for your kind answer!

Forestall answered 11/8, 2013 at 0:39 Comment(3)
When you write y.append(x**(-2)), I think you mean y.append(i**(-2)) – Mammalian
Are you maybe confusing regression of the line y(x) = k x^(-a) with fitting an exponent to values drawn from the probability distribution p(x) ~ (a-1) x^(-a)? The powerlaw module addresses the second question. – Durango
@DSM, you are absolutely right! I was confusing the two very different tasks. Thank you so much for pointing this out! – Forestall
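The distinction raised in the comments can be made concrete with a quick log–log regression on the question's data (a sketch using numpy.polyfit, not part of the original discussion): for the noiseless curve y = x^(-2), the slope of log y against log x is exactly -2.

```python
import numpy as np

# The question's data: a deterministic curve y = x**(-2), not random samples.
x = np.arange(1, 1000)
y = x ** (-2.0)

# Ordinary least-squares fit of log(y) against log(x); the slope is the
# (negative) exponent of the power law curve.
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
print(slope)  # -2.0 (up to floating point error)
```

This is the regression task; fitting an exponent to random samples, as powerlaw.Fit does, is a different problem.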

As @DSM pointed out, the powerlaw module fits an exponent to values drawn (or generated) from a power law distribution, rather than fitting a regression. To help people who might run into a similar confusion, below is how to verify the exponent fitting:

# use a proper power law random number generator (or code your own)
import powerlaw
from networkx.utils import powerlaw_sequence

pl_sequence = powerlaw_sequence(1000, exponent=2.5)

fitted_pl = powerlaw.Fit(pl_sequence)

fitted_pl.alpha
Out[73]: 2.4709012785346314  # close enough
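If networkx is not available, a continuous power law sample can also be drawn by inverse transform sampling, and the exponent recovered with the closed-form maximum-likelihood (Hill) estimator (a sketch under those standard formulas; neither is part of the original answer):

```python
import numpy as np

rng = np.random.default_rng(0)
a = 2.5       # target exponent
xmin = 1.0    # lower cutoff of the distribution

# Inverse transform sampling for p(x) ~ x**(-a) on [xmin, inf):
# the CDF is F(x) = 1 - (x/xmin)**(1 - a), so x = xmin * (1 - u)**(-1/(a - 1)).
u = rng.random(100_000)
samples = xmin * (1 - u) ** (-1 / (a - 1))

# Closed-form maximum-likelihood estimate of the exponent (Hill estimator).
alpha_hat = 1 + len(samples) / np.log(samples / xmin).sum()
print(alpha_hat)  # close to 2.5
```

The same samples can be passed to powerlaw.Fit for comparison.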
Forestall answered 12/8, 2013 at 6:43 Comment(0)

scipy.stats.powerlaw is defined for 0 ≤ x ≤ 1 with shape a > 0. With your code you probably meant something like expon:

import matplotlib.pyplot as plt

x = range(1, 1000)
y = [i**(-2) for i in x]
plt.plot(x, y)
plt.show()

Power law distribution of a random variable:

# https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.powerlaw.html
import numpy as np
from scipy.stats import powerlaw
from scipy.optimize import minimize
import matplotlib.pyplot as plt

a = 0.8
bins = 100

fig, ax = plt.subplots(1, 1)
xx = np.linspace(powerlaw.ppf(0.01, a),
                 powerlaw.ppf(0.99, a), 1000)
ax.plot(xx, powerlaw.pdf(xx, a), 'k-', lw=2, label='frozen pdf')

# alternative: use a frozen distribution object
# rv = powerlaw(a)
# ax.plot(xx, rv.pdf(xx), 'b-', lw=2, label='frozen pdf')

# random numbers
rvs_ = powerlaw.rvs(a, size=1000)
ax.hist(rvs_, bins, density=True, histtype='stepfilled', label='RV', alpha=0.2)

plt.legend()
plt.show()
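Before hand-rolling an MLE, note that SciPy can fit the shape parameter directly through the distribution's fit method (a sketch; floc=0 and fscale=1 pin the support to [0, 1], an assumption matching how the data were drawn):

```python
import numpy as np
from scipy.stats import powerlaw

# Draw a sample with a known shape parameter, then recover it by MLE.
data = powerlaw.rvs(0.8, size=10_000, random_state=42)

# loc and scale are fixed to the known support, so only the shape is estimated.
a_hat, loc, scale = powerlaw.fit(data, floc=0, fscale=1)
print(a_hat)  # close to 0.8
```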
# MLE fit by minimizing the negative log-likelihood -- not very good

def dpower(params, x):
    return params[0] * x**(-params[0])

from scipy.stats import norm

def negloglik(params, xx, y):
    beta = params[:-1]   # shape parameter(s) passed to dpower
    sigma = params[-1]   # std dev of the Gaussian noise model
    mu = dpower(beta, xx)
    ll = norm.logpdf(y, mu, sigma).sum()
    return -ll

res = minimize(negloglik, (1., 1.), args=(xx, rvs_), method='Nelder-Mead',
               options={'disp': False}, tol=1.6e-6)
params = res.x
print(res)
##       success: True
##             x: array([0.26460641, 0.3317988 ])

fig, ax = plt.subplots(1, 1)
ax.plot(xx, dpower(params, xx),
        'r-', lw=5, alpha=0.6, label='pareto pdf')
ax.hist(rvs_, bins, density=True, histtype='stepfilled', alpha=0.2)
plt.show()


P.S. I'm not sure, but it seems to me that the sum of -1*logpdf(y, *params) can also be treated as the entropy of a discrete RV, while the entropy method gives the (differential) entropy of a continuous RV.
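That intuition can be checked numerically for a continuous RV: under the true parameters, the sample average of -logpdf approaches the differential entropy (a sketch with scipy.stats.norm, not part of the original answer):

```python
import numpy as np
from scipy.stats import norm

data = norm.rvs(size=200_000, random_state=1)

avg_nll = -norm.logpdf(data).mean()  # Monte Carlo average negative log-density
h = norm.entropy()                   # differential entropy of N(0, 1)
print(avg_nll, h)  # both approximately 1.4189
```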

P.P.S. Alternatively, try scipy.optimize.curve_fit to find the parameters of your distribution.
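The curve_fit route can be sketched on the question's noiseless data (assuming the model y = k * x**(-a); the fit should recover k ≈ 1 and a ≈ 2):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical model: y = k * x**(-a).
def model(x, k, a):
    return k * x ** (-a)

x = np.arange(1, 1000, dtype=float)
y = x ** (-2.0)

# Nonlinear least squares; p0 gives a reasonable starting point.
popt, pcov = curve_fit(model, x, y, p0=(1.0, 1.5))
print(popt)  # approximately [1.0, 2.0]
```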

Mislike answered 26/8 at 17:23 Comment(0)
