I would like to use the sklearn package to find the Gini coefficient of each feature for a class of paths, such as in the iris data. For example, for Iris-virginica: petal length Gini: 0.4, petal width Gini: 0.4.
How can I get Gini Coefficient in sklearn
You can calculate the gini coefficient with Python+numpy like this:
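from typing import List
from itertools import combinations

import numpy as np


def gini(x: List[float]) -> float:
    x = np.array(x, dtype=np.float32)
    n = len(x)
    # combinations() yields each unordered pair once (n*(n-1)/2 terms),
    # not the n**2 ordered pairs of the usual mean-absolute-difference formula.
    diffs = sum(abs(i - j) for i, j in combinations(x, r=2))
    return diffs / (2 * n**2 * x.mean())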
This is one of the best Gini implementations in Python that I've seen :-D. I love it because there are a lot of alternative formulas out there, but if you look around this is the most agreed upon and consistent Gini formula you'll see in literature. The issue is that it's hard to implement this formula, and yet here it is in just 4 lines of code. Well done!! A+ –
Kansas
I might have spoken too soon. I was comparing this to some other work (#39512760) and I wonder if you're overestimating n here. We want the mean absolute difference, and your n is greater than the number of absolute differences that you calculate (from what I can see). –
Kansas
However, the original equation sums over permutations, not the combinations used in the code! –
Uncounted
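The permutations-versus-combinations point can be checked numerically. Here is a minimal sketch (the function names `gini_ordered_pairs` and `gini_unordered_pairs` are mine, not from the answer): summing over ordered pairs counts every difference twice, so a sum over unordered pairs should only be divided by `n**2 * mean`, without the extra factor of 2.

```python
from itertools import combinations, permutations

import numpy as np


def gini_ordered_pairs(x):
    """Sum |x_i - x_j| over all ordered pairs, divide by 2 * n^2 * mean."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    diffs = sum(abs(i - j) for i, j in permutations(x, 2))
    return diffs / (2 * n**2 * x.mean())


def gini_unordered_pairs(x):
    """Sum over unordered pairs only, so the factor of 2 is dropped."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    diffs = sum(abs(i - j) for i, j in combinations(x, 2))
    return diffs / (n**2 * x.mean())


sample = [1, 2, 3, 4]
print(gini_ordered_pairs(sample))    # 0.25
print(gini_unordered_pairs(sample))  # 0.25
```

Both variants agree, whereas dividing the combinations sum by `2 * n**2 * mean` would give half that value (0.125 for this sample).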
As of the date of this comment, the calculation of this answer is wrong. I compared it with goodcalculators.com/gini-coefficient-calculator. (As mentioned by yeamusic21, I think changing the last line to
return diffs / (n**2 * x.mean())
might correct the calculation, but this is just a brief, superficial assessment.) –
Pru
Not sklearn, but this is based on the Lorenz curve definition and should do the trick:
import numpy as np

def gini(x):
    x = np.asarray(x, dtype=float)  # accept plain lists too; x.mean() needs an ndarray
    return np.sum(np.abs(np.subtract.outer(x, x))) / (2 * len(x)**2 * x.mean())
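One caveat with the outer-difference approach: `np.subtract.outer` builds an n × n matrix, so memory grows quadratically with the number of samples. If that becomes a problem, the same quantity can be computed from the sorted values. This is only a sketch; `gini_sorted` and `gini_outer` are names I made up, and it assumes non-negative values with a non-zero sum.

```python
import numpy as np


def gini_sorted(x):
    """Gini coefficient from sorted values, O(n log n) time and O(n) memory."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    ranks = np.arange(1, n + 1)  # 1-based rank of each sorted value
    return 2 * np.sum(ranks * x) / (n * x.sum()) - (n + 1) / n


def gini_outer(x):
    """Reference version using all pairwise absolute differences."""
    x = np.asarray(x, dtype=float)
    return np.abs(np.subtract.outer(x, x)).sum() / (2 * len(x) ** 2 * x.mean())


sample = [1, 2, 3, 4]
print(gini_sorted(sample))  # 0.25
print(gini_outer(sample))   # 0.25
```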
from sklearn import datasets
iris = datasets.load_iris()
You can use this code to load the data. – Levitical
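To tie this back to the question, here is a rough sketch that computes a Gini coefficient per feature within each iris class. It assumes the question is after the inequality-style Gini coefficient of the feature values (as in the answers above), not the Gini impurity that sklearn's decision trees use as a split criterion. As far as I know sklearn has no built-in function for this, so `gini` is defined by hand, and `per_class_gini` is just a name chosen for illustration.

```python
import numpy as np
from sklearn import datasets


def gini(x):
    """Gini coefficient from the mean absolute difference of all pairs."""
    x = np.asarray(x, dtype=float)
    return np.abs(np.subtract.outer(x, x)).sum() / (2 * len(x) ** 2 * x.mean())


iris = datasets.load_iris()

# Nested dict: class name -> feature name -> Gini coefficient
per_class_gini = {}
for label, class_name in enumerate(iris.target_names):
    rows = iris.data[iris.target == label]  # samples of this class only
    per_class_gini[str(class_name)] = {
        feature: float(gini(rows[:, col]))
        for col, feature in enumerate(iris.feature_names)
    }

for class_name, scores in per_class_gini.items():
    for feature, value in scores.items():
        print(f"{class_name:<12}{feature:<20}gini: {value:.3f}")
```

The result is one value per (class, feature) pair, e.g. `per_class_gini["virginica"]["petal length (cm)"]`.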