I have access to six 24GB GPUs. When I try to load some HuggingFace models, for example the following:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("google/ul2")
model = AutoModelForSeq2SeqLM.from_pretrained("google/ul2")
I get an out-of-memory error, as the model only seems to load onto a single GPU. While the whole model cannot fit on a single 24GB card, I have six of them and would like to know whether there is a way to distribute the model across multiple cards so that I can run inference.
HuggingFace seems to have a webpage explaining how to do this, but it has no useful content as of today.
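For context, the kind of call I was hoping would work looks roughly like the sketch below; device_map="auto" (which, as far as I know, requires the accelerate package to be installed) is my guess at the intended mechanism, since I could not confirm it from that page:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Guessed multi-GPU loading: device_map="auto" asks transformers/accelerate to
# spread the checkpoint's layers across all visible GPUs instead of one card.
tokenizer = AutoTokenizer.from_pretrained("google/ul2")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/ul2",
    device_map="auto",   # shard layers across the six 24GB cards
    torch_dtype="auto",  # keep the checkpoint's native precision
)

# Inference: inputs go to the device holding the first layers (usually cuda:0).
inputs = tokenizer("A short test prompt", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

If device_map="auto" is indeed the right knob for this, confirmation (and any caveats for inference across cards) would be appreciated.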
TypeError: <MyTransformerModel>.__init__() got an unexpected keyword argument 'device'
For information, I'm on transformers==4.26.0.
– Diazo
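For what it's worth, the error quoted in the comment above looks like what happens when a device keyword is passed straight to from_pretrained; the exact call below is a reconstruction, not the commenter's code:

from transformers import AutoModelForSeq2SeqLM

# Hypothetical reproduction: `device` is not a from_pretrained argument in
# transformers 4.26.0, so it appears to be forwarded to the model's __init__,
# which rejects it with the TypeError shown above. The documented keyword for
# placement is `device_map`, not `device`.
model = AutoModelForSeq2SeqLM.from_pretrained("google/ul2", device="cuda")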