How can I get the Gini coefficient in sklearn?

2

6

Using the sklearn package, I would like to find the Gini coefficient of each feature within a class, for example in the iris data. For Iris-virginica, the output would look like: petal length Gini: 0.4, petal width Gini: 0.4.

Levitical answered 13/7, 2017 at 11:42 Comment(3)
Can you post the data on which you want to find the Gini?Ornithopter
from sklearn import datasets; iris = datasets.load_iris() is the code you can use to load the data.Levitical
Don't confuse Gini coefficient and Gini impurity. This article shows a very comprehensive python implementation of the latter.Antonia
5

You can calculate the gini coefficient with Python+numpy like this:

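from itertools import combinations
from typing import List

import numpy as np

def gini(x: List[float]) -> float:
    x = np.array(x, dtype=np.float64)
    n = len(x)
    # combinations() visits each unordered pair once, so this is half of
    # the sum of |x_i - x_j| over all ordered pairs (i, j).
    diffs = sum(abs(i - j) for i, j in combinations(x, r=2))
    # G = sum_{i,j} |x_i - x_j| / (2 * n**2 * mean); the factor 2 cancels here.
    return diffs / (n**2 * x.mean())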
Incognizant answered 8/1, 2020 at 13:10 Comment(4)
This is one of the best Gini implementations in Python that I've seen :-D. I love it because there are a lot of alternative formulas out there, but if you look around this is the most agreed upon and consistent Gini formula you'll see in literature. The issue is that it's hard to implement this formula, and yet here it is in just 4 lines of code. Well done!! A+Kansas
I might have spoken too soon. I was comparing this to some other work (#39512760) and I wonder if you're overestimating n here. We want the mean absolute difference, and your n is greater than the number of absolute differences that you calculate (from what I can see).Kansas
However, the original equation contains permutations, not the combinations used in the code!Uncounted
As of the date of this comment, the calculation of this answer is wrong. I compared it with goodcalculators.com/gini-coefficient-calculator. (As mentioned by yeamusic21, I think changing the last line to return diffs / (n**2 * x.mean()) might correct the calculation, but this is just a brief, superficial assessment.)Pru
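The factor-of-two question raised in these comments can be checked numerically. The sketch below is only an illustration, and the names `gini_pairs`, `gini_outer` and `sample` are made up for it: one version counts each unordered pair once (as `combinations` does) and divides by `n**2 * mean`, the other sums over all ordered pairs and divides by `2 * n**2 * mean`. Assuming the usual mean absolute difference definition of the Gini coefficient, the two should agree.

```python
from itertools import combinations

import numpy as np


def gini_pairs(x):
    """Each unordered pair is counted once, so no factor 2 in the denominator."""
    x = np.asarray(x, dtype=float)
    half_sum = sum(abs(a - b) for a, b in combinations(x, 2))
    return half_sum / (len(x) ** 2 * x.mean())


def gini_outer(x):
    """All ordered pairs (i, j) are summed, so the denominator carries a factor 2."""
    x = np.asarray(x, dtype=float)
    full_sum = np.abs(np.subtract.outer(x, x)).sum()
    return full_sum / (2 * len(x) ** 2 * x.mean())


# Hypothetical sample values, chosen so the result is easy to check by hand.
sample = [1, 2, 3, 4]
g_pairs = gini_pairs(sample)
g_outer = gini_outer(sample)
print(g_pairs, g_outer)
```

For `[1, 2, 3, 4]` the pairwise differences sum to 10, so both versions give 10 / (16 * 2.5) = 0.25. Keeping the 2 in the denominator together with `combinations` would halve that value.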
0

Not sklearn, but this uses the mean absolute difference form of the Gini coefficient, which is equivalent to the Lorenz curve definition, and should do the trick:

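import numpy as np

def gini(x):
    # Accept lists as well as arrays; x.mean() below needs an ndarray.
    x = np.asarray(x, dtype=float)
    # Sum |x_i - x_j| over all ordered pairs (i, j), hence the factor 2.
    return np.sum(np.abs(np.subtract.outer(x, x))) / (2 * len(x)**2 * x.mean())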
Hunkydory answered 21/7 at 18:33 Comment(0)
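To get back to the original question (one Gini value per feature within each class), a function like the one above can be applied to each column of each class subset. This is a minimal sketch, not an sklearn feature: `gini_per_class`, `X_demo` and `y_demo` are hypothetical names, and the demo data is made up rather than taken from iris.

```python
import numpy as np


def gini(x):
    """Gini coefficient via the mean absolute difference formula."""
    x = np.asarray(x, dtype=float)
    return np.abs(np.subtract.outer(x, x)).sum() / (2 * len(x) ** 2 * x.mean())


def gini_per_class(X, y):
    """Return {class_label: [Gini of each feature column within that class]}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    return {
        label.item(): [gini(X[y == label, j]) for j in range(X.shape[1])]
        for label in np.unique(y)
    }


# Tiny made-up data set with 2 features and 2 classes (not the iris data).
X_demo = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0],
                   [2.0, 1.0], [2.0, 3.0]])
y_demo = np.array([0, 0, 0, 0, 1, 1])
result = gini_per_class(X_demo, y_demo)
print(result)
```

With the iris data, the same call would be `gini_per_class(iris.data, iris.target)` after `iris = datasets.load_iris()`; `iris.feature_names` and `iris.target_names` can then be used to label the output. The formula assumes non-negative values with a positive mean, which holds for the iris measurements.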
