Trying to install guanaco (pip install guanaco) for a text classification model but getting an error

I'm trying to install the Guanaco language model (https://arxiv.org/abs/2305.14314) with pip install guanaco to use it for a text classification model, but I'm getting this error:

Failed to build guanaco
ERROR: Could not build wheels for guanaco, which is required to install pyproject.toml-based projects

How do I install the language model and use it for classification?

Chazan answered 31/5, 2023 at 9:26
Comment: I assume the error output is huge, as wheel-building errors usually are. I've spent a lot of time helping with this in chats, so: if you're on Windows, look in the error for the mention of the Microsoft C++ Build Tools - it includes a link to what you need to install. If you're on *nix, you need to install the Python system package that ships the C headers (e.g. python3.10-dev on Debian/Ubuntu - the exact name depends on your package manager and your Python version). – Sika

The PyPI package that you've installed via pip install guanaco is not the Guanaco large language model supported by the Hugging Face tooling; it's an unrelated project: https://pypi.org/project/guanaco/

To use the Guanaco model, install its actual dependencies instead (pip install torch transformers peft accelerate sentencepiece) and see https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, LlamaTokenizer

model_name = "decapoda-research/llama-7b-hf"   # base LLaMA weights
adapters_name = "timdettmers/guanaco-7b"       # Guanaco LoRA adapters

print(f"Starting to load the model {model_name} into memory")

# Load the base model in bfloat16 on GPU 0 (device_map requires accelerate)
m = AutoModelForCausalLM.from_pretrained(
    model_name,
    #load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    device_map={"": 0}
)
# Attach the Guanaco adapters and merge them into the base weights
m = PeftModel.from_pretrained(m, adapters_name)
m = m.merge_and_unload()

tok = LlamaTokenizer.from_pretrained(model_name)
tok.bos_token_id = 1  # this checkpoint ships a broken tokenizer config, so set BOS explicitly

print(f"Successfully loaded the model {model_name} into memory")

Then to use the model:

prompt = "Today was an amazing day because"
inputs = tok(prompt, return_tensors="pt")

outputs = m.generate(**inputs, do_sample=True, num_beams=1, max_new_tokens=100)
tok.batch_decode(outputs, skip_special_tokens=True)

[out]:

['Today was an amazing day because I met M, my bestie from Bermuda in 2002.\nWe have not seen each other for 8 years and I was thrilled to meet her and her husband. We went out for lunch and then went for a walk in the park. We caught each other up on our lives and just laughed and laughed. I love her so much and I am so glad we are back in touch. It was like no time had passed at all. I am so']
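
Guanaco is an instruction-tuned chat model, so for instruction-style requests the "### Human: ... ### Assistant:" prompt template used in the QLoRA demos tends to work better than plain continuation. A small sketch (the exact wording of the request is just an example):

prompt = "### Human: Is the sentence 'Today was an amazing day' positive or negative?### Assistant:"
inputs = tok(prompt, return_tensors="pt").to(m.device)

outputs = m.generate(**inputs, do_sample=True, top_p=0.9, max_new_tokens=50)
print(tok.batch_decode(outputs, skip_special_tokens=True)[0])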

To use it for zero-shot classification:

from transformers import pipeline

tok.add_special_tokens({'pad_token': '[PAD]'})
m.resize_token_embeddings(len(tok))  # make room in the embedding matrix for the new [PAD] token

classifier = pipeline("zero-shot-classification", model=m, tokenizer=tok)

classifier("Today was an amazing day", candidate_labels=["negative", "positive"])

[out]:

{'sequence': 'Today was an amazing day',
 'labels': ['positive', 'negative'],
 'scores': [0.7662936449050903, 0.23370634019374847]}
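
The zero-shot-classification pipeline is built around NLI-style sequence-classification models, so if it refuses a causal LM like this one, a fallback is to score each candidate label by the language-model loss of a labelled prompt. A minimal sketch (the prompt template and scoring heuristic here are assumptions, not part of the pipeline API):

import torch

def zero_shot_classify(text, candidate_labels):
    # Score each label by the negated LM loss of "Review: ... Sentiment: <label>";
    # a lower loss over the full sequence means the model finds that label more plausible.
    scores = []
    for label in candidate_labels:
        enc = tok(f"Review: {text}\nSentiment: {label}", return_tensors="pt").to(m.device)
        with torch.no_grad():
            loss = m(**enc, labels=enc["input_ids"]).loss
        scores.append(-loss.item())
    best = max(range(len(candidate_labels)), key=lambda i: scores[i])
    return candidate_labels[best]

zero_shot_classify("Today was an amazing day", ["negative", "positive"])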
Allembracing answered 31/5, 2023 at 10:12
