How to load a huggingface pretrained transformer model directly to GPU?

I want to load a Hugging Face pretrained transformer model directly to the GPU (there is not enough CPU memory), e.g. loading BERT:

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bert-base-uncased")

loads the model onto the CPU; only after executing

model.to('cuda')

is the model moved to the GPU.

I want the model to be loaded directly onto the GPU when calling from_pretrained. Is this possible?

Purposive answered 5/10, 2023 at 13:57 Comment(2)
In the documentation you can see that there is no parameter that allows you to load a model on GPU using from_pretrained.Moslem
thanks! I thought there might be a way to indirectly do this by setting some parameter before loading the model or something like thisPurposive

I'm answering my own question

Hugging Face Accelerate can place the model on the GPU while it is being loaded, instead of first materializing it fully in CPU memory, so in my case it worked when
GPU memory > model size > CPU memory
by passing device_map='cuda'. First install Accelerate:

!pip install accelerate

then use

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bert-base-uncased", device_map = 'cuda')
Purposive answered 5/10, 2023 at 14:23 Comment(3)
❤️❤️❤️❤️❤️❤️❤️❤️Cenozoic
For multiple GPUs, use device_map='auto'; this will spread the workload across your GPUs.Cenozoic
@Cenozoic Note that device_map='auto' may cause an out-of-memory error on hosts with little RAM.Cymbiform
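
Following up on the comments above: a minimal sketch of device_map='auto' combined with an explicit max_memory budget so that loading does not exhaust host RAM; the memory figures are illustrative, not recommendations.

from transformers import AutoModelForCausalLM

# device_map="auto" spreads the layers across all visible GPUs and spills to CPU only if needed;
# max_memory caps each device (integer keys are GPU indices, "cpu" limits host RAM).
model = AutoModelForCausalLM.from_pretrained(
    "bert-base-uncased",
    device_map="auto",
    max_memory={0: "10GiB", "cpu": "4GiB"},  # illustrative budgets, adjust to your hardware
)

# shows which module was placed on which device
print(model.hf_device_map)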
