Multi-label annotator agreement with Cohen's kappa

Say I want to collect annotations for documents, where every document can be annotated with multiple labels. In this example, I have two annotators (a and b), and each labels the same two documents.

from sklearn.metrics import cohen_kappa_score
annotator_a = [ 
    ["a","b","c"],
    ["d","e"]
]
annotator_b = [
    ["b","c"],
    ["f"]
]

Annotator_a labels document 1 with labels a, b and c. Annotator_b labels document 1 with labels b and c.

I tried to calculate annotator agreement using:

cohen_kappa_score(annotator_a, annotator_b)

But this results in an error:

ValueError: You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead.

Any ideas on how I can calculate annotator agreement on this set?

Louls answered 11/9, 2018 at 9:32

Cohen's kappa does not support multi-label input. Instead, one could use Krippendorff's alpha, which handles agreement between any number of raters, missing values, and non-exclusive categories. An implementation is available on PyPI.
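
For instance, one way to apply it here is to binarize each label separately and compute a nominal alpha per label with the krippendorff package from PyPI; the per-label binarization and the averaging at the end are just one possible adaptation, since the package expects a single value per rater and unit:

import numpy as np
import krippendorff  # pip install krippendorff

annotator_a = [["a", "b", "c"], ["d", "e"]]
annotator_b = [["b", "c"], ["f"]]
labels = list("abcdef")

def binarize(annotations, label):
    # 1 if the annotator assigned `label` to the document, else 0
    return [1 if label in doc else 0 for doc in annotations]

alphas = {}
for label in labels:
    # rows = raters, columns = documents
    reliability_data = [binarize(annotator_a, label),
                        binarize(annotator_b, label)]
    alphas[label] = krippendorff.alpha(reliability_data=reliability_data,
                                       level_of_measurement="nominal")

print(alphas)
print(np.mean(list(alphas.values())))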

Louls answered 19/9, 2018 at 11:15

The specific error is due to an unsupported representation of the multi-label output (see the documentation of sklearn's type_of_target function). Even with the correct multi-label representation you would still get an error, since cohen_kappa_score does not support multi-label input (see below). In fact, Cohen's kappa can only be applied to multi-class problems with mutually exclusive classes, and multi-label output is by definition non-exclusive.

What you could do is treat each label as a separate binary decision and compute Cohen's kappa for each label. If you need a single number representing agreement, you could compute the average kappa over the labels.

Example: Cohen's kappa for multi-label annotations

import numpy as np
from sklearn.metrics import cohen_kappa_score

labels = list('abcdef')  # the full set of possible labels

# For each label, build a binary presence/absence vector over the documents
to_dict = lambda x: {k: [1 if k in y else 0 for y in x] for k in labels}
a_dict = to_dict(annotator_a)
b_dict = to_dict(annotator_b)
# Per-label kappa, then the mean as a single agreement number
cohen_dict = {k: cohen_kappa_score(a_dict[k], b_dict[k]) for k in labels}
cohen_avg = np.mean(list(cohen_dict.values()))

print(f'a_dict: {a_dict}')
print(f'b_dict: {b_dict}')
print(f'cohen_dict: {cohen_dict}')
print(f'cohen_avg: {cohen_avg}')

output:

a_dict: {'a': [1, 0], 'b': [1, 0], 'c': [1, 0], 'd': [0, 1], 'e': [0, 1], 'f': [0, 0]}
b_dict: {'a': [0, 0], 'b': [1, 0], 'c': [1, 0], 'd': [0, 0], 'e': [0, 0], 'f': [0, 1]}
cohen_dict: {'a': 0.0, 'b': 1.0, 'c': 1.0, 'd': 0.0, 'e': 0.0, 'f': 0.0}
cohen_avg: 0.3333333333333333

How to transform the sequence-of-sequences input into the correct multi-label representation:

from sklearn.preprocessing import MultiLabelBinarizer

# Convert the sequences of labels into binary indicator matrices
m = MultiLabelBinarizer(classes=list('abcdef'))
a_multi = m.fit_transform(annotator_a)
b_multi = m.transform(annotator_b)
print(f'a_multi:\n{a_multi}')
print(f'b_multi:\n{b_multi}')
# Still fails: cohen_kappa_score does not accept multilabel-indicator input
cohen_kappa_score(a_multi, b_multi)

output:

a_multi:
[[1 1 1 0 0 0]
 [0 0 0 1 1 0]]
b_multi:
[[0 1 1 0 0 0]
 [0 0 0 0 0 1]]
...
ValueError: multilabel-indicator is not supported
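
If you prefer to start from the binarized matrices, one possible follow-up is to compute kappa column by column, which gives the same per-label values as the dictionary approach above:

# Column i of the indicator matrices corresponds to m.classes_[i]
per_label = {cls: cohen_kappa_score(a_multi[:, i], b_multi[:, i])
             for i, cls in enumerate(m.classes_)}
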
Sexuality answered 12/9, 2018 at 12:38

Although the original Cohen's kappa statistic does not support multiple labels, there are proposed extensions that address this case. By assigning weights to each label, the augmented kappa allows one to analyze the contribution of primary and secondary (and potentially more) categories to the agreement scores. For details, refer to the paper "Augmenting the kappa statistic to determine interannotator reliability for multiply labeled data points".
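
As a rough sketch (a plain weighted average of the per-label kappas, not the exact construction from that paper), label weights could be folded into the averaging step of the previous answer; the weights below are invented purely for illustration:

# Hypothetical weights, e.g. primary labels counting twice as much as secondary ones
label_weights = {'a': 2, 'b': 2, 'c': 2, 'd': 1, 'e': 1, 'f': 1}
# cohen_dict holds the per-label kappa values computed in the previous answer
weighted_kappa = (sum(label_weights[k] * cohen_dict[k] for k in cohen_dict)
                  / sum(label_weights.values()))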

Of course, one could also use Krippendorff's alpha reliability coefficient, which applies to any number of annotators and categories. The weighted Kappa mentioned above is still limited to pairs of labelers.

Lucknow answered 15/7, 2021 at 19:34
