I have a dataset stored as a dictionary, which I converted to a Dataset:
ds = datasets.Dataset.from_dict(bio_dict)
The dataset now looks like this:
Dataset({
features: ['id', 'text', 'ner_tags', 'input_ids', 'attention_mask', 'label'],
num_rows: 8805
})
When I call the train_test_split function of Datasets, I get the following error:
train_testvalid = ds.train_test_split(test_size=0.5, shuffle=True, stratify_by_column="label")
ValueError: Stratifying by column is only supported for ClassLabel column, and column label is Sequence.
How can I change the type to ClassLabel so that stratify works?