How can I get Gini Coefficient in sklearn

Asked 13/7, 2017 at 11:42 Answered 21/7 at 18:33

I would like to use the sklearn package to find the Gini coefficient of each feature within a given class, for example on the iris data. Something like: Iris-virginica, petal length Gini: 0.4, petal width Gini: 0.4.

Levitical answered 13/7, 2017 at 11:42 Comment(3)

Can you post the data on which you want to find the Gini? – Ornithopter 14/7, 2017 at 6:46

You can use this code to download the data: from sklearn import datasets; iris = datasets.load_iris() – Levitical 14/7, 2017 at 10:41

Don't confuse Gini coefficient and Gini impurity. This article shows a very comprehensive python implementation of the latter. – Antonia 16/1, 2023 at 8:31
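To make the distinction in the last comment concrete, here is a minimal sketch of both quantities side by side. The function names `gini_impurity` and `gini_coefficient` are illustrative, not sklearn APIs. Gini impurity is computed from class label proportions (it is what sklearn's decision trees use with `criterion="gini"`), while the Gini coefficient measures inequality among numeric values and assumes they are non-negative with a non-zero mean.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a set of class labels: 1 - sum(p_k ** 2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_coefficient(values):
    """Gini coefficient of numeric values (mean absolute difference form).

    Assumes non-negative values with a non-zero mean.
    """
    x = np.asarray(values, dtype=float)
    # Sum |x_i - x_j| over all ordered pairs, then normalize.
    return np.abs(np.subtract.outer(x, x)).sum() / (2 * len(x) ** 2 * x.mean())

print(gini_impurity(["a", "a", "b", "b"]))      # two balanced classes
print(gini_coefficient([1.0, 1.0, 1.0, 1.0]))   # perfectly equal values
```

The first takes labels and ignores magnitudes; the second takes magnitudes and has no notion of classes, so the two are not interchangeable.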

You can calculate the gini coefficient with Python+numpy like this:

from typing import List
from itertools import combinations

import numpy as np

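def gini(x: List[float]) -> float:
    x = np.array(x, dtype=np.float32)
    n = len(x)
    # combinations() visits each unordered pair once, i.e. half of the full double sum over (i, j)
    diffs = sum(abs(i - j) for i, j in combinations(x, r=2))
    # so the usual 2 * n**2 * mean denominator reduces to n**2 * mean
    return diffs / (n**2 * x.mean())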

Incognizant answered 8/1, 2020 at 13:10 Comment(4)

This is one of the best Gini implementations in Python that I've seen :-D. I love it because there are a lot of alternative formulas out there, but if you look around this is the most agreed upon and consistent Gini formula you'll see in literature. The issue is that it's hard to implement this formula, and yet here it is in just 4 lines of code. Well done!! A+ – Kansas 30/9, 2020 at 20:54

I might have spoken too soon. I was comparing this to some other work (#39512760) and I wonder if you're overestimating n here. We want the mean absolute difference, and your n is greater than the number of absolute differences that you calculate (from what I can see). – Kansas 30/9, 2020 at 21:36

However, the original equation uses permutations, not the combinations used in the code! – Uncounted 18/1 at 8:47

As of the date of this comment, the calculation of this answer is wrong. I compared it with goodcalculators.com/gini-coefficient-calculator. (As mentioned by yeamusic21, I think changing the last line to return diffs / (n**2 * x.mean()) might correct the calculation, but this is just a brief, superficial assessment.) – Pru 29/9 at 9:37
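The combinations-versus-permutations point raised in these comments can be checked with a small sketch. It assumes the mean absolute difference form of the Gini coefficient, G = sum over all ordered pairs (i, j) of |x_i - x_j|, divided by 2 * n**2 * mean. The variable names are illustrative only.

```python
from itertools import combinations, permutations

values = [1.0, 2.0, 3.0, 4.0]
n = len(values)
mean = sum(values) / n

# Each unordered pair {i, j} once.
unordered = sum(abs(a - b) for a, b in combinations(values, 2))
# Each ordered pair (i, j) with i != j; the i == j terms are zero anyway.
ordered = sum(abs(a - b) for a, b in permutations(values, 2))

# The ordered sum is exactly twice the unordered one, so these agree:
gini_from_ordered = ordered / (2 * n**2 * mean)
gini_from_unordered = unordered / (n**2 * mean)

print(unordered, ordered)
print(gini_from_ordered, gini_from_unordered)
```

So a sum built with `combinations` needs the `n**2 * mean` denominator, while a sum over all ordered pairs needs `2 * n**2 * mean`; mixing the two halves the result.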

Not sklearn, but this is based on the Lorenz curve definition and should do the trick:

import numpy as np

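def gini(x):
    x = np.asarray(x, dtype=float)  # accept plain lists as well as arrays
    # sum of |x_i - x_j| over all ordered pairs, normalized by 2 * n**2 * mean
    return np.sum(np.abs(np.subtract.outer(x, x))) / (2 * len(x)**2 * x.mean())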

Hunkydory answered 21/7 at 18:33 Comment(0)
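For the per-class, per-feature breakdown the question asks about, one possible sketch is below. It assumes the mean absolute difference form of the Gini coefficient and non-negative feature values. `gini_by_class` is a hypothetical helper, not part of sklearn, and the iris usage is shown only in comments because it needs scikit-learn installed.

```python
import numpy as np

def gini(x):
    """Gini coefficient via the mean absolute difference (non-negative values assumed)."""
    x = np.asarray(x, dtype=float)
    return np.abs(np.subtract.outer(x, x)).sum() / (2 * len(x) ** 2 * x.mean())

def gini_by_class(X, y):
    """Hypothetical helper: map each class label to a list of per-feature Gini values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    return {
        label: [gini(X[y == label, j]) for j in range(X.shape[1])]
        for label in np.unique(y)
    }

# Usage with the iris data from the question (requires scikit-learn):
#   from sklearn import datasets
#   iris = datasets.load_iris()
#   table = gini_by_class(iris.data, iris.target)
#   for label, values in table.items():
#       print(iris.target_names[label], dict(zip(iris.feature_names, values)))
```

This only slices the rows by class and applies the same formula to each column; sklearn itself is used just to load the data.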
