ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate`
I'm trying to fine-tune llama2-13b-chat-hf with an open-source dataset.

I've always used this template, but now I'm getting this error:

ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of `bitsandbytes`: `pip install -i https://pypi.org/simple/ bitsandbytes`

I installed all the packages required and these are the versions:

    accelerate @ git+https://github.com/huggingface/accelerate.git@97d2168e5953fe7373a06c69c02c5a00a84d5344
    bitsandbytes==0.42.0
    datasets==2.17.1
    huggingface-hub==0.20.3
    peft==0.8.2
    tokenizers==0.13.3
    torch==2.1.0+cu118
    torchaudio==2.1.0+cu118
    torchvision==0.16.0+cu118
    transformers==4.30.0
    trl==0.7.11

Does anyone know if this is a version issue? How did you fix it?

I tried installing other versions, but nothing worked.
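Before swapping versions, it can help to confirm what the running interpreter actually sees, since transformers raises this ImportError when it cannot find a usable accelerate install, and a mismatch between `pip list` and the interpreter you train in is a common cause. A minimal diagnostic sketch using only the standard library (the package names below are just the ones from this question):

```python
# Sketch: print the versions of the relevant packages as seen by
# *this* Python interpreter, using only the standard library.
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

for name in ("accelerate", "bitsandbytes", "transformers", "torch"):
    print(f"{name}: {installed_version(name) or 'NOT INSTALLED'}")
```

If a package prints NOT INSTALLED here despite appearing in `pip list`, you are likely installing into a different environment than the one running the training script.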

Sulfaguanidine answered 22/2 at 12:37 Comment(1)
I encountered the same issue. After some digging, I found it was due to a CUDA error. I deleted torch and reinstalled it using the command from the official website (pytorch.org), and it is working now. These are the package versions that work for me and differ from yours: tokenizers==0.15.2, torch==2.2.1+cu118, torchaudio==2.2.1+cu118, torchvision==0.17.1+cu118, transformers==4.38.1 - Expressway

Have you tried running accelerate test in your terminal? If your installation is successful, this command outputs a series of messages and a "test successful" at the end. If it fails, something is wrong with your PyTorch + Accelerate environment, and you should reinstall them following the official tutorials. If the command succeeds and you still can't do multi-GPU fine-tuning, report the issue on the bitsandbytes GitHub repo.
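You can also reproduce the spirit of the check transformers performs yourself (a sketch, assuming transformers detects optional dependencies by probing importability; run this in the same interpreter you train in):

```python
# Sketch: detect whether a module is importable in *this* interpreter,
# similar in spirit to the availability checks transformers runs before
# enabling 8-bit loading.
import importlib.util

def is_available(module_name):
    """True if module_name can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None

print("accelerate importable:", is_available("accelerate"))
print("bitsandbytes importable:", is_available("bitsandbytes"))
```

If either line prints False in the training interpreter, the ImportError above is expected regardless of what `pip list` says elsewhere.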

Here are some other potential causes.

  • Your CUDA version may be too old. Most tools are built against CUDA 12.0+ nowadays; you can update CUDA from NVIDIA's official download page.

  • Your Python version should be 3.10+; otherwise you won't be able to install the latest tools with pip.

  • Why do you want to train a quantized model? Quantization is meant to shrink a model for deployment, not for training, so this tooling is not designed for your purpose. If you fine-tune a model directly on quantized parameters, the gradient updates have almost no effect, because they are too small to represent with only 8 bits. If you want to fine-tune an LLM with limited GPU memory, try LoRA, or freeze some layers during SFT; both reduce VRAM usage.
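To illustrate the LoRA route, here is a sketch assuming the peft library; the model name and hyperparameters are placeholders, not a tested recipe (untested config fragment, so no expected output is shown):

```python
# Sketch (untested config): LoRA fine-tuning with peft.
# The base weights stay frozen; only small adapter matrices are trained,
# which is what reduces VRAM usage compared to full fine-tuning.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-chat-hf")

lora_config = LoraConfig(
    r=8,                                  # adapter rank (placeholder value)
    lora_alpha=16,                        # scaling factor (placeholder value)
    target_modules=["q_proj", "v_proj"],  # which projections get adapters
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction is trainable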

Soviet answered 28/2 at 9:20 Comment(0)
  1. Go to https://pytorch.org/
  2. Select your configuration
  3. In your environment, run the given command

For example, I chose Stable / Windows / Python / CUDA 11.8, and the website gave me this:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

(Thanks to Niraj Pahari; I just wanted to make this an official answer rather than only a comment.)
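After reinstalling, a quick sanity check helps confirm the new build actually sees your GPU (a sketch; it wraps the import so it also reports a missing torch instead of crashing):

```python
# Sketch: report whether this interpreter sees a CUDA-enabled torch build.
def cuda_status():
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if torch.cuda.is_available():
        return f"torch {torch.__version__}, CUDA {torch.version.cuda}"
    return f"torch {torch.__version__}, but CUDA is not available"

print(cuda_status())
```

If this reports that CUDA is not available, bitsandbytes 8-bit loading will fail no matter which accelerate version is installed.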

Jarrodjarrow answered 3/3 at 22:2 Comment(0)

If you're running the code on Google Colab, make sure to change the runtime type and set the Hardware Accelerator to GPU.

Banneret answered 2/5 at 9:34 Comment(0)

Yes, it has to do with CUDA. Make sure you have the latest version installed, and check it with nvcc -V

Outgroup answered 20/4 at 13:39 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.