Use Llama 2 7B with Python

I would like to use Llama 2 7B locally on my Windows 11 machine with Python. I have a conda venv set up with CUDA, PyTorch with CUDA support, and Python 3.10, so I am ready to go.

The files are already downloaded locally from Meta, in the folder llama-2-7b-chat:

  • checklist.chk
  • consolidated.00.pth
  • params.json

Now I would like to interact with the model, but I only find code snippets that download the model from Hugging Face, which is not needed in my case.

Can someone provide me with a few lines of code to interact with the model via Python?

Hammerskjold answered 5/8, 2023 at 13:51 Comment(2)
I found some additional info at this repository: github.com/facebookresearch/llama. I added the "tokenizer.model" and installed the additional dependencies, but I get several errors regarding NCCL, Kubernetes etc., so I guess that is not meant for my use case. – Hammerskjold
Read the readme of that repo again; you will find llama-recipes (under the title, 3rd paragraph), which is the code example. – Carolinacaroline

I know you mentioned Hugging Face is unnecessary in your case, but to download and use the model it's much easier to use their transformers library.

After you download the weights, you need to restructure the folder as follows (notice I moved three of the files under 7B):

├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── config.json
├── generation_config.json
├── LICENSE
├── tokenizer_checklist.chk
├── tokenizer.model
└── USE_POLICY.md

Next download the conversion script from here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py

And finally run this script:

python convert_llama_weights_to_hf.py --input_dir llama-2-7b/ --model_size 7B --output_dir model

Once it's finished, you can load the model as follows:

from transformers import LlamaForCausalLM, LlamaTokenizer
tokenizer = LlamaTokenizer.from_pretrained("./model")
model = LlamaForCausalLM.from_pretrained("./model")

You can then learn more on how to prompt the model here: https://huggingface.co/docs/transformers/v4.31.0/en/model_doc/llama2#transformers.LlamaForCausalLM.forward.example
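
For illustration, here is a minimal generation sketch in that style, assuming the "./model" output directory from the conversion step above; the prompt, the max_new_tokens value, and loading in float16 on the GPU are just placeholder choices:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Load the converted checkpoint and move it to the GPU in half precision
tokenizer = LlamaTokenizer.from_pretrained("./model")
model = LlamaForCausalLM.from_pretrained("./model", torch_dtype=torch.float16).to("cuda")

# Encode a prompt, generate a completion, and decode it back to text
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))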

Valvulitis answered 24/8, 2023 at 8:31 Comment(0)

Not all of the downloaded files are needed. I got it to work using the CUDA GPU on Windows 11, but in a slightly different way:

  1. First of all, I used this repo rather than the code provided by Meta itself (but I had to download the files via Hugging Face): https://github.com/oobabooga/text-generation-webui

  2. The CUDA installation via conda had some errors, even though everything looked fine at first (see the quick check below). I solved this by installing the stack as described here: https://github.com/jeffheaton/t81_558_deep_learning/blob/master/install/manual_setup2.ipynb
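
A quick sanity check from inside the venv, assuming PyTorch is installed there, to confirm the CUDA build is actually picked up:

import torch

# Should print True and the GPU name if the CUDA build of PyTorch is working
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))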

I hope that helps others as well ...

Hammerskjold answered 5/8, 2023 at 20:21 Comment(0)

Here is the official answer: https://huggingface.co/docs/transformers/main/model_doc/llama2

After filling out the form and gaining access to the model checkpoints, you should be able to use the already converted checkpoints. Otherwise, if you are converting your own model, feel free to use the conversion script. The script can be called with the following (example) command:

python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path
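
If you go the first route (the already converted checkpoints), they can be loaded straight from the Hub; a minimal sketch, assuming the gated repo id meta-llama/Llama-2-7b-hf and that you are logged in with an access token (huggingface-cli login):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated repo: requires accepting Meta's license on the Hub and a valid access token
model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
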
Mirador answered 29/7 at 8:7 Comment(0)
