The specific error is due to an unsupported representation of multi-label output, a sequence of sequences (see the documentation of
sklearn's type_of_target function).
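To see how sklearn classifies a target, type_of_target can be called directly on a few supported representations. This is a minimal sketch; the sample values are made up for illustration and are not taken from the question.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Illustrative targets (hypothetical values, chosen only to show each type)
binary_y = [0, 1, 1, 0]
multiclass_y = [0, 1, 2, 1]
indicator_y = np.array([[1, 1, 0], [0, 0, 1]])  # one row per item, one column per label

binary_type = type_of_target(binary_y)
multiclass_type = type_of_target(multiclass_y)
indicator_type = type_of_target(indicator_y)

print(binary_type)      # binary
print(multiclass_type)  # multiclass
print(indicator_type)   # multilabel-indicator
```

The binary indicator matrix is the multi-label form sklearn recognises; a list of label tuples is not.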
Even with the correct multi-label representation you would still get an error, since cohen_kappa_score
does not support multi-label input (see below). In fact, Cohen's kappa applies to multi-class problems only when the classes are mutually exclusive, and multi-label output is by definition non-exclusive.
What you could do is treat each label as its own binary classification and compute Cohen's kappa for each label. If you need a single number representing agreement, you could average the kappa values over the labels.
Example: Cohen's kappa for multi-label annotations
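import numpy as np
from sklearn.metrics import cohen_kappa_score

# Example annotations, reconstructed from the output below (one group of labels per item)
annotator_a = [('a', 'b', 'c'), ('d', 'e')]
annotator_b = [('b', 'c'), ('f',)]
labels = 'abcdef'

# One binary vector per label: 1 if the annotator assigned that label to the item
to_dict = lambda x: {k: [1 if k in y else 0 for y in x] for k in labels}
a_dict = to_dict(annotator_a)
b_dict = to_dict(annotator_b)
# Kappa per label, then the unweighted mean over labels
cohen_dict = {k: cohen_kappa_score(a_dict[k], b_dict[k]) for k in labels}
cohen_avg = np.mean(list(cohen_dict.values()))
print(f'a_dict: {a_dict}')
print(f'b_dict: {b_dict}')
print(f'cohen_dict: {cohen_dict}')
print(f'cohen_avg: {cohen_avg}')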
output:
a_dict: {'a': [1, 0], 'b': [1, 0], 'c': [1, 0], 'd': [0, 1], 'e': [0, 1], 'f': [0, 0]}
b_dict: {'a': [0, 0], 'b': [1, 0], 'c': [1, 0], 'd': [0, 0], 'e': [0, 0], 'f': [0, 1]}
cohen_dict: {'a': 0.0, 'b': 1.0, 'c': 1.0, 'd': 0.0, 'e': 0.0, 'f': 0.0}
cohen_avg: 0.3333333333333333
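As a cross-check on the numbers above, kappa for a single label can be computed by hand as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The sketch below is plain Python; binary_kappa is a hypothetical helper, the annotations are inferred from the printed a_dict and b_dict, and returning NaN when p_e equals 1 is an assumption of this sketch, not documented sklearn behaviour.

```python
import math


def binary_kappa(x, y):
    """Cohen's kappa for two equal-length sequences of 0/1 ratings."""
    n = len(x)
    # Observed agreement: fraction of items where both raters gave the same rating
    p_o = sum(1 for xi, yi in zip(x, y) if xi == yi) / n
    # Chance agreement from each rater's marginal rate of assigning the label
    px, py = sum(x) / n, sum(y) / n
    p_e = px * py + (1 - px) * (1 - py)
    if math.isclose(p_e, 1.0):
        # Assumption: kappa is 0/0 here, so report it as NaN
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Annotations inferred from the printed output above
annotator_a = [("a", "b", "c"), ("d", "e")]
annotator_b = [("b", "c"), ("f",)]
labels = "abcdef"

per_label = {
    k: binary_kappa(
        [int(k in item) for item in annotator_a],
        [int(k in item) for item in annotator_b],
    )
    for k in labels
}
average = sum(per_label.values()) / len(per_label)
print(per_label)
print(average)
```

A label that neither annotator ever uses gives p_e = 1, so its kappa is undefined; you may want to drop such labels before averaging.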
How to transform a sequence of sequences into the correct multi-label representation
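from sklearn.metrics import cohen_kappa_score
from sklearn.preprocessing import MultiLabelBinarizer

# classes is fixed up front, so both annotators share the same column order
m = MultiLabelBinarizer(classes=list('abcdef'))
a_multi = m.fit_transform(annotator_a)
b_multi = m.transform(annotator_b)
print(f'a_multi:\n{a_multi}')
print(f'b_multi:\n{b_multi}')
# Still raises ValueError: kappa does not accept a multilabel-indicator matrix
cohen_kappa_score(a_multi, b_multi)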
output:
a_multi:
[[1 1 1 0 0 0]
 [0 0 0 1 1 0]]
b_multi:
[[0 1 1 0 0 0]
 [0 0 0 0 0 1]]
...
ValueError: multilabel-indicator is not supported
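Once the annotations are in indicator form, the per-label approach from the first example can be applied column by column, since each column is an ordinary binary rating vector. This is a minimal sketch, assuming the two annotators implied by the printed matrices; the data is re-declared so the block runs on its own, and per_label and avg are illustrative names.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score
from sklearn.preprocessing import MultiLabelBinarizer

# Assumed example data, inferred from the printed matrices above
annotator_a = [('a', 'b', 'c'), ('d', 'e')]
annotator_b = [('b', 'c'), ('f',)]

m = MultiLabelBinarizer(classes=list('abcdef'))
a_multi = m.fit_transform(annotator_a)
b_multi = m.transform(annotator_b)

# Column j holds the 0/1 ratings for label j, which cohen_kappa_score accepts
per_label = {
    str(label): cohen_kappa_score(a_multi[:, j], b_multi[:, j])
    for j, label in enumerate(m.classes_)
}
avg = float(np.mean(list(per_label.values())))
print(per_label)
print(avg)
```

This should reproduce the per-label values and the average from the dictionary-based example.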