BERT text classification raises ValueError: too many dimensions 'str'
T

6

16

I am trying to build a sentiment classifier for texts with a BERT model, but I am getting ValueError: too many dimensions 'str'.

This is the DataFrame holding the labels of the training data (train_labels):

0   notr
1   notr
2   notr
3   negative
4   notr
... ...
854 positive
855 notr
856 notr
857 notr
858 positive

and this is the code that produces the error:

train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

The line train_y = torch.tensor(train_labels.tolist()) raises ValueError: too many dimensions 'str'.

Can you help me, please?


Tailback answered 20/1, 2021 at 7:12 Comment(1)
LabelEncoder from scikit-learn works, too. Look at this article: scikit-learn.org/stable/modules/generated/… - Crud
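As a minimal sketch of the LabelEncoder route mentioned in this comment, assuming scikit-learn is installed and using a small hypothetical sample of the labels from the question:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of the string labels from the question
train_labels = ["notr", "notr", "negative", "positive", "notr"]

encoder = LabelEncoder()
# fit_transform learns the distinct classes (stored sorted in classes_)
# and returns one integer code per input label
int_labels = encoder.fit_transform(train_labels)

print(list(encoder.classes_))  # the class at index i is encoded as i
print(int_labels.tolist())

# inverse_transform maps integer codes back to the original strings
print(encoder.inverse_transform([0, 2]).tolist())
```

The resulting integer array should then be acceptable to torch.tensor, since it no longer contains strings.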
H
13

REASON

The issue is that you are passing a list of strings (str) to torch.tensor(), which only accepts numerical values (integers, floats, etc.).

SOLUTION

So I suggest converting your string labels into integer values before passing them to torch.tensor().

IMPLEMENTATION

The following code might help you:

# a temporary list to store the string labels
temp_list = train_labels.tolist()

# dictionary that maps integer to its string value 
label_dict = {}

# list to store integer labels 
int_labels = []

for i in range(len(temp_list)):
    label_dict[i] = temp_list[i]
    int_labels.append(i)

Now pass int_labels to torch.tensor and use the result as the labels.

train_y = torch.tensor(int_labels)

Whenever you want to see the string label for an integer, just look it up in the label_dict dictionary.

Homoio answered 20/1, 2021 at 9:35 Comment(0)
B
11

I had the same problem. This works for me. I guess you need to do it at the beginning of your code, right after reading the CSV: df['labels'] = df['labels'].replace(['negative','notr','positive'],[0,1,2])

Then split the data into training and test sets using these labels.

Bola answered 30/3, 2021 at 1:18 Comment(1)
You can also write: df['labels'] = df['labels'].replace({'negative':0, 'notr':1, 'positive':2}) - Fingernail
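A minimal sketch of the same idea using Series.map, assuming a small hypothetical DataFrame df with a 'labels' column. Unlike replace, map turns any label missing from the dictionary into NaN, which makes an unexpected or misspelled label easy to catch:

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame read from the CSV
df = pd.DataFrame({"labels": ["notr", "negative", "positive", "notr"]})

# Explicit class-to-integer mapping (same values as in the answer above)
label2id = {"negative": 0, "notr": 1, "positive": 2}

mapped = df["labels"].map(label2id)

# Any label not present in label2id becomes NaN; fail early if that happens
if mapped.isna().any():
    unknown = df.loc[mapped.isna(), "labels"].unique().tolist()
    raise ValueError(f"Unmapped labels: {unknown}")

df["labels"] = mapped.astype("int64")
print(df["labels"].tolist())
```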
A
3

Assuming you are using Hugging Face, you can use ClassLabel from the 🤗 datasets library:

from datasets import ClassLabel

c2l = ClassLabel(num_classes=2, names=['spam', 'ham'])

labels = ["spam", "ham", "ham"]

[c2l.str2int(label) for label in labels]
# [0, 1, 1]

For more reference: https://discuss.huggingface.co/t/converting-string-label-to-int/2816

Admire answered 14/7, 2021 at 16:55 Comment(0)
T
0

Thanks, it did convert the labels to integers, but there is a problem with the classification:

0
0   positive
1   negative
2   positive
3   notr
4   positive
... ...
4002    notr
4003    positive
4004    positive
4005    notr
4006    negative

The frame had this data. After the conversion to int,

0   0
1   1
2   2
3   3
4   4
... ...
4002    4002
4003    4003
4004    4004
4005    4005
4006    4006

it became like this. What I need is for all negatives, neutrals, and positives to be represented as 0 for negative, 1 for neutral, and 2 for positive.
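The requirement described here, one integer per class rather than one per row, could be sketched as follows. The sample list and the names label2id and id2label are illustrative and not taken from the original posts:

```python
# Hypothetical sample of the string labels shown above
labels = ["positive", "negative", "positive", "notr", "positive"]

# Fixed mapping: one integer per class, not one per row
label2id = {"negative": 0, "notr": 1, "positive": 2}

# Reverse mapping, to recover the string label from an integer
id2label = {idx: name for name, idx in label2id.items()}

# Look up each row's label; a KeyError here signals an unexpected label
int_labels = [label2id[label] for label in labels]

print(int_labels)
print([id2label[i] for i in int_labels])
```

The int_labels list can then be passed to torch.tensor in place of the per-row indices.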

Tailback answered 21/1, 2021 at 8:58 Comment(0)
H
0

Replace the label categories with numerical values to avoid the "too many dimensions 'str'" error:

data['labels'] = data['labels'].replace(['inattention to results', 'fear of conflict', 'lack of commitment',
       'avoidance of accountability', 'absence of trust'],[0,1,2,3,4])
Horticulture answered 15/8, 2023 at 10:30 Comment(0)
S
0

You cannot convert a list of strings to a Torch tensor.

You need to convert your strings to integers or floats first:

# my_list has strings in it
my_list = ['0','1','2','3','4']

# Items are strings
type(my_list[0])                    
# > str

# Fail to convert to Torch Tensor 
# torch.tensor(my_list)               
# > ValueError: too many dimensions 'str'

# Convert each item to integer
my_list = [int(item) for item in my_list]

# Now, items are integers
type(my_list[0])                    
# > int

# Success
torch.tensor(my_list)                  
# > tensor([0, 1, 2, 3, 4])
Silviasilviculture answered 23/8, 2023 at 11:18 Comment(0)
