I think you can ignore this message. I found it reported on several websites this year, and if I understand it correctly, this GitHub issue on the Hugging Face transformers repository (https://github.com/huggingface/transformers/issues/22387) shows that the warning can be safely ignored. In addition, batching or using `datasets` might not remove the warning or automatically use the resources in the best way. You can set `call_count = 0` here (https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/base.py#L1100) to suppress the warning, as explained by Martin Weyssow above.
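For example, rather than editing the library source, you can reset that counter on the pipeline object itself. This is a minimal sketch, assuming the `py_sentimiento` pipeline from the question and a hypothetical `texts` list of documents; `call_count` is the pipeline's internal attribute that triggers the warning once it exceeds a small threshold:

```python
# Reset the pipeline's internal call counter so the "sequential GPU use" warning never fires
for text in texts:                    # texts: hypothetical list of your documents
    result = py_sentimiento(text)
    py_sentimiento.call_count = 0     # keep the counter below the warning threshold
```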
How can I modify my code to batch my data and use parallel computing to make better use of my GPU resources?
You can add batching like this:

```python
from transformers import pipeline

py_sentimiento = pipeline("sentiment-analysis",
                          model="finiteautomata/beto-sentiment-analysis",
                          tokenizer="finiteautomata/beto-sentiment-analysis",
                          batch_size=8,
                          device=device,  # the device you already define in your code, e.g. 0 for the first GPU
                          truncation=True)
```

Most importantly, you can experiment with the batch size to find the one that gives the highest GPU usage possible on your device and for your particular task. Hugging Face provides some rules of thumb here to help users figure out how to batch: https://huggingface.co/docs/transformers/main_classes/pipelines#pipeline-batching. Getting the best resource/GPU usage might take some experimentation, and it depends on the use case you are working on.
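A rough way to run that experiment is a simple timing loop. This is only a sketch; `texts` is a hypothetical list of your documents and the candidate batch sizes are arbitrary:

```python
import time

texts = ["..."]  # hypothetical: your documents as a list of strings

for bs in (1, 8, 16, 32, 64):            # arbitrary candidate batch sizes
    start = time.perf_counter()
    _ = py_sentimiento(texts, batch_size=bs)   # call-time batch_size overrides the constructor value
    print(f"batch_size={bs}: {time.perf_counter() - start:.1f}s")
```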
What does this warning mean, and why should I use a dataset for efficiency?
This means the GPU utilization is not optimal, because the data is not grouped together and is therefore not processed efficiently. Using a dataset from the Hugging Face `datasets` library will utilize your resources more efficiently. However, it is not easy to tell exactly what is going on, especially since we don't know exactly what the data looks like, what the device is, and how the model handles the data internally. The warning might go away when you use the `datasets` library, but that does not necessarily mean that the resources are optimally used.
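To make the "grouped together" point concrete, here is an illustrative sketch of the two patterns (again assuming the hypothetical `texts` list and the `py_sentimiento` pipeline from above):

```python
# Calling the pipeline once per document: the "sequential" pattern the warning refers to
results = [py_sentimiento(text) for text in texts]

# Passing the whole collection (a list, generator, or datasets.Dataset) in one call
# lets the pipeline handle batching and streaming internally
results = py_sentimiento(texts, batch_size=8)
```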
What code, function, or library should be used with Hugging Face transformers?
Here is a code example with `pipelines` and the `datasets` library: https://huggingface.co/docs/transformers/v4.27.1/pipeline_tutorial#using-pipelines-on-a-dataset. It mentions that using iterables will fill your GPU as fast as possible and that batching might also improve computation time.
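Adapted to this model, the tutorial's pattern would look roughly like the sketch below; the CSV path, the "text" column name, and the batch size are assumptions:

```python
import pandas as pd
from datasets import Dataset
from transformers.pipelines.pt_utils import KeyDataset

df = pd.read_csv("file")                 # hypothetical CSV with a "text" column
ds = Dataset.from_pandas(df[["text"]])   # wrap the pandas frame as a datasets.Dataset

predictions = []
for out in py_sentimiento(KeyDataset(ds, "text"), batch_size=8):
    predictions.append(out)              # one prediction per row, streamed and batched on the GPU
```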
In your case it seems you are doing a relatively small POC (running inference on under 10,000 documents with a medium-sized model), so I don't think you need to use pipelines. I assume the sentiment analysis model is a classifier and that you want to keep using `pandas` as shown in the post, so here is how you can combine both. This is usually fast enough for my experiments and prints no warnings about the resources.
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch as t
import pandas as pd
from tqdm import tqdm

tqdm.pandas()  # registers .progress_apply() on pandas objects

model = AutoModelForSequenceClassification.from_pretrained("finiteautomata/beto-sentiment-analysis")
tokenizer = AutoTokenizer.from_pretrained("finiteautomata/beto-sentiment-analysis")


def classify_dataframe_row(
    example: pd.Series,
):
    # Tokenize one row's text and return the index of the highest-scoring class
    output = model(**tokenizer(example["text"], return_tensors="pt"))
    prediction = t.argmax(output[0]).detach().numpy()
    return prediction


dataset = pd.read_csv("file")
dataset = dataset.assign(
    prediction=dataset.progress_apply(classify_dataframe_row, axis=1)
)
```
As soon as your inference starts, either with this snippet or with the `datasets` library code, you can run `nvidia-smi` in a terminal, check what the GPU usage is, and play around with the parameters to optimize it. Beware that running the code on your local machine with a GPU versus running it on a larger machine, e.g., a Linux server with a possibly more powerful GPU, might lead to different performance and might require different tuning. If you want to run the code on larger document collections, you can split the data, either to avoid GPU memory errors locally or to speed up inference with concurrent runs on a server.
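One caveat: the `pandas` snippet above runs the model on the CPU as written, so `nvidia-smi` will show little activity for it. A rough sketch of how you could adapt it to use the GPU, by moving the model and the tokenized inputs to the same device (this is an illustration, not part of the original snippet):

```python
# Sketch: move the model to the GPU and send each row's tensors to the same device
device = t.device("cuda" if t.cuda.is_available() else "cpu")
model.to(device)
model.eval()


def classify_dataframe_row(example: pd.Series):
    # Tokenize the row's text and move the tensors to the model's device
    inputs = tokenizer(example["text"], return_tensors="pt", truncation=True).to(device)
    with t.no_grad():
        output = model(**inputs)
    return t.argmax(output.logits).item()
```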