Stuck at downloading shards when loading an LLM from Hugging Face
I am just using the Hugging Face example for their LLM, but it gets stuck at:

downloading shards:   0%|          | 0/5 [00:00<?, ?it/s]

(I am using a Jupyter notebook, Python 3.11, and all requirements are installed.)

from transformers import AutoTokenizer
import transformers
import torch

model = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
   "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

How can I fix it?

Sepal answered 17/7, 2023 at 19:57 Comment(0)
I don't think it's stuck. These are just very large models that take a while to download. tqdm only shows an estimate after the first iteration completes, so until then it looks like nothing is happening. I'm currently downloading the smallest version of Llama 2 (7B parameters), and it's downloading two shards. The first took over 17 minutes to complete, and I have a reasonably fast internet connection.
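To see why the bar sits at `0%` with `?it/s`: tqdm has no completed iteration to base a rate estimate on, so one long download looks frozen. A minimal sketch (using `time.sleep` as a stand-in for a shard download):

```python
import time
from tqdm import tqdm

# tqdm shows "?it/s" until the first iteration finishes, because it has
# nothing yet to base a rate estimate on. With multi-gigabyte shards,
# that first iteration can take many minutes.
for shard in tqdm(range(2), desc="Downloading shards"):
    time.sleep(0.5)  # stand-in for downloading one multi-gigabyte shard
```

Watching your network traffic (e.g. with Activity Monitor or `nload`) is the easiest way to confirm the download is actually progressing.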

Evonneevonymus answered 16/8, 2023 at 6:9 Comment(1)
Is there a recommended way to cache this so it doesn't take so long every time that cell needs to run? – Fishback
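On the caching question: `transformers` already caches downloaded files under the Hugging Face hub cache, so the shards are only fetched once; subsequent runs of the cell load from disk. A sketch of how the default location is resolved (the `HF_HOME` and `HF_HUB_CACHE` environment variables are the documented overrides; exact defaults may vary by `huggingface_hub` version):

```python
import os

# Rough resolution order for the Hugging Face cache directory:
# HF_HUB_CACHE wins, then HF_HOME, then ~/.cache/huggingface.
hf_home = os.environ.get(
    "HF_HOME", os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
)
cache_dir = os.environ.get("HF_HUB_CACHE", os.path.join(hf_home, "hub"))
print(cache_dir)  # e.g. /home/<user>/.cache/huggingface/hub
```

Passing `cache_dir=` to `from_pretrained` (or setting `HF_HOME`) to point at a persistent disk is useful on ephemeral machines such as Colab or cloud notebooks, where the default cache is wiped between sessions.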
