I'm trying to load a model with 8-bit quantization like this:
from transformers import LlamaForCausalLM, BitsAndBytesConfig

model_path = '/model/'
model = LlamaForCausalLM.from_pretrained(
    model_path,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
but I get this error:
ImportError: Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://test.pypi.org/simple/ bitsandbytes` or `pip install bitsandbytes`
But I've installed both packages and still get the same error, even after shutting down and restarting the Jupyter kernel I was running this on.
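This ImportError often means the packages were installed into a different Python environment than the one the Jupyter kernel is using. A minimal sketch to diagnose that from inside the kernel (the `diagnose` helper and the package names checked are assumptions, not part of the original question):

```python
import importlib.util
import sys

def diagnose(packages=("accelerate", "bitsandbytes")):
    """Report which interpreter this kernel runs and whether each
    required package is importable from *this* environment."""
    report = {"python": sys.executable}
    for name in packages:
        # find_spec returns None when the package is not importable here
        report[name] = importlib.util.find_spec(name) is not None
    return report

print(diagnose())
```

If either package shows `False`, install it with the same interpreter the kernel reports, e.g. `!{sys.executable} -m pip install accelerate bitsandbytes` from a notebook cell.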
Pinning transformers==4.30 (down from the default version 4.33) in the Python 3.10 env seems to fix it for now. – Coronel

I get: No GPU found. A GPU is needed for quantization. – Rubinrubina
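The "No GPU found" error in the comment above is a separate issue: 8-bit quantization via bitsandbytes requires a CUDA GPU. A quick hedged check (guarded so it also works where torch isn't installed):

```python
# Check whether a CUDA GPU is visible to PyTorch; bitsandbytes 8-bit
# quantization will not work without one.
try:
    import torch
    has_gpu = torch.cuda.is_available()
except ImportError:
    has_gpu = None  # torch not installed in this environment
print("CUDA GPU available:", has_gpu)
```

If this prints `False` on a machine that has a GPU, the installed torch build is likely CPU-only and needs to be replaced with a CUDA build.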