How to load a huggingface pretrained transformer model directly to GPU?

I want to load a Hugging Face pretrained transformer model directly to the GPU (there is not enough CPU memory), e.g. loading BERT:

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bert-base-uncased")

loads the model onto the CPU; only after executing

model.to('cuda')

is the model moved to the GPU.

I want the model to be loaded directly onto the GPU when calling from_pretrained. Is this possible?

Purposive answered 5/10, 2023 at 13:57 Comment(2)
In the documentation you can see that there is no parameter that allows you to load a model on GPU using from_pretrained.Moslem
thanks! I thought there might be a way to indirectly do this by setting some parameter before loading the model or something like thisPurposive

I'm answering my own question

Hugging Face Accelerate can place the model on the GPU while it is being loaded, instead of first materializing it fully in CPU memory, so in my case it worked when
GPU memory > model size > CPU memory
by passing device_map='cuda'. First install Accelerate:

!pip install accelerate

then use

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bert-base-uncased", device_map = 'cuda')
Purposive answered 5/10, 2023 at 14:23 Comment(3)
❤️❤️❤️❤️❤️❤️❤️❤️Cenozoic
For multiple GPUs, use device_map='auto'; this will spread the workload across your GPUs.Cenozoic
@Cenozoic Note that device_map='auto' may cause an out-of-memory error on hosts with little RAM.Cymbiform
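
Following up on the comments above: a minimal sketch of device_map='auto' combined with an explicit max_memory budget so that loading does not exhaust host RAM; the memory figures are illustrative, not recommendations.

from transformers import AutoModelForCausalLM

# device_map="auto" spreads the layers across all visible GPUs and spills to CPU only if needed;
# max_memory caps each device (integer keys are GPU indices, "cpu" limits host RAM).
model = AutoModelForCausalLM.from_pretrained(
    "bert-base-uncased",
    device_map="auto",
    max_memory={0: "10GiB", "cpu": "4GiB"},  # illustrative budgets, adjust to your hardware
)

# shows which module was placed on which device
print(model.hf_device_map)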
