Pareto distribution: R vs Python - different results
Asked Answered
R

1

3

I'm trying to replicate R's fitdist() results (reference, cannot modify R code) in Python using scipy.stats. The results are totally different. Does anyone know why? How can I replicate R's results in Python?

data = [2457.145, 1399.034, 20000.0, 476743.9, 24059.6, 28862.8]

R code:

library(fitdistrplus)
library(actuar)

fitdist(data, 'pareto', "mle")$estimate

R results:

       shape        scale 
    0.760164 10066.274196

Python code

st.pareto.fit(data, floc=0, scale=1)

Python results

(0.4019785013487883, 0, 1399.0339889072732)
Rena answered 14/1, 2021 at 13:20 Comment(3)
Can you show which R libraries you are using for the implementation of the Pareto distribution and fitdist?Revolution
I use actuar and fitdistrplus librariesRena
@WarrenWeckesser PDFs seems to be a little bit different... But Is it possible to get the same PDF in Python?Rena
C
4

The discrepancy is mainly due to the differing pdfs.

Python

In python st.pareto.fit() uses a Pareto distribution defined via this pdf:

enter image description here

import scipy.stats as st
data = [2457.145, 1399.034, 20000.0, 476743.9, 24059.6, 28862.8]
print(st.pareto.fit(data, floc = 0, scale = 1))

# (0.4019785013487883, 0, 1399.0339889072732)

R

Whereas your R code is using a Pareto with this pdf:

enter image description here

library(fitdistrplus)
library(actuar)
data <- c(2457.145, 1399.034, 20000.0, 476743.9, 24059.6, 28862.8)
fitdist(data, 'pareto', "mle")$estimate

#    shape        scale 
#    0.760164 10066.274196 

Make R Mirror Python

To get R to use the same distribution as st.pareto.fit() use actuar::dpareto1():

library(fitdistrplus)
library(actuar)
data <- c(2457.145, 1399.034, 20000.0, 476743.9, 24059.6, 28862.8)
fitdist(data, 'pareto1', "mle")$estimate

#     shape          min 
#   0.4028921 1399.0284977

Make Python Mirror R

And here is one approach to approximate your R code in Python:

import numpy as np
from scipy.optimize import minimize

def dpareto(x, shape, scale):
    return shape * scale**shape / (x + scale)**(shape + 1)

def negloglik(x):
    data = [2457.145, 1399.034, 20000.0, 476743.9, 24059.6, 28862.8]
    return -np.sum([np.log(dpareto(i, x[0], x[1])) for i in data])

res = minimize(negloglik, (1, 1), method='Nelder-Mead', tol=2.220446e-16)
print(res.x)

# [7.60082820e-01 1.00691719e+04]
Cimabue answered 18/1, 2021 at 23:25 Comment(2)
Thank you! Is it a universal approach? Will python approach work for other distributions in case of different pdfs (own pdf, negloglik() and minimize(negloglik,...))?Rena
It is a basic implementation of the MLE approach and would work for many distributions but not all. For example, there may be constraints on the support of the distribution that the code does not enforce.Cimabue

© 2022 - 2024 — McMap. All rights reserved.