Cast features to ClassLabel

I have a dataset stored as a Python dictionary, which I converted to a Dataset:

ds = datasets.Dataset.from_dict(bio_dict)

The resulting dataset looks like this:

Dataset({
    features: ['id', 'text', 'ner_tags', 'input_ids', 'attention_mask', 'label'],
    num_rows: 8805
})

When I call the train_test_split method of Datasets with stratification, I get the following error:

train_testvalid = ds.train_test_split(test_size=0.5, shuffle=True, stratify_by_column="label")

ValueError: Stratifying by column is only supported for ClassLabel column, and column label is Sequence.

How can I cast the label column to ClassLabel so that stratification works?
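For context, a stratified split groups rows by their label and samples each group in proportion, which only works when every row carries one discrete, hashable class id. Below is a minimal pure-Python sketch of that idea. `stratified_split` is a hypothetical helper written for illustration, not the Datasets implementation, and the toy labels are made up. A list-valued label (like the Sequence in the error) could not even serve as a grouping key here, since Python lists are unhashable.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.5, seed=0):
    """Hypothetical helper: return (train_idx, test_idx) so that each class
    keeps roughly the same proportion in both parts. Illustrative only."""
    rng = random.Random(seed)
    # Group row indices by class id; each label must be a single hashable value.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of every class for the test part.
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0, 0, 0, 0, 1, 1, 1, 1]  # made-up class ids
train_idx, test_idx = stratified_split(labels, test_size=0.5)
print(len(train_idx), len(test_idx))  # 4 4
```

With a 50% test size, each class contributes two of its four rows to each part.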

Valeda asked 22/12, 2022 at 7:19

Use the class_encode_column method, which casts the given column to ClassLabel:

ds = ds.class_encode_column("label")
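As we understand it, class_encode_column collects the column's distinct values, stores them (sorted by their string form) as the names of a ClassLabel feature, and replaces each value with its integer index. The sketch below imitates that mapping in plain Python. `class_encode` and the sample labels are hypothetical. Note that, to our knowledge, class_encode_column only accepts scalar Value columns, so if label really holds lists (as the Sequence in the error suggests), it would first need to be reduced to one value per row.

```python
def class_encode(values):
    """Hypothetical helper: map each distinct value to an integer id,
    similar to how a ClassLabel feature stores labels."""
    # Class names are the distinct values, sorted by their string form.
    names = sorted({str(v) for v in values})
    name_to_id = {name: i for i, name in enumerate(names)}
    # Replace every value with the index of its name.
    return names, [name_to_id[str(v)] for v in values]


names, ids = class_encode(["positive", "negative", "positive", "neutral"])
print(names)  # ['negative', 'neutral', 'positive']
print(ids)    # [2, 0, 2, 1]
```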
Blackmail answered 13/1, 2023 at 8:16
