I want to load a Hugging Face pretrained transformer model directly onto the GPU (there is not enough CPU RAM for it), e.g. loading BERT:
```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bert-base-uncased")
```
This loads the model onto the CPU until I execute
```python
model.to('cuda')
```
after which the model is on the GPU.
I want the model to be loaded directly onto the GPU when `from_pretrained` is executed. Is this possible?
– Moslem