Running the below code downloads a model - does anyone know what folder it downloads it to?
!pip install -q transformers
from transformers import pipeline
model = pipeline('fill-mask')
Update 2023-05-02: The cache location has changed again and is now ~/.cache/huggingface/hub/, as reported by @Victor Yan. Notably, the subfolders in the hub/ directory are now named after the model path (e.g. models--distilroberta-base), instead of carrying a SHA hash as in previous versions.
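If you want to inspect this new-style cache programmatically, recent versions of huggingface_hub ship a scan_cache_dir() helper; a minimal sketch (the exact report fields may vary by version):
from huggingface_hub import scan_cache_dir

# Scan the default cache (~/.cache/huggingface/hub) and report
# each cached repo together with its on-disk size.
cache_info = scan_cache_dir()
for repo in cache_info.repos:
    print(repo.repo_id, repo.repo_type, repo.size_on_disk_str)
print("total:", cache_info.size_on_disk_str)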
Update 2021-03-11: The cache location has changed and is now ~/.cache/huggingface/transformers, as also detailed in the answer by @victorx.
This post should shed some light on it (plus some investigation of my own, since it is already a bit older).
As mentioned, the default location on a Linux system is ~/.cache/torch/transformers/ (I'm currently using transformers v2.7, but this is unlikely to change anytime soon). The cryptic folder names in this directory seemingly correspond to the Amazon S3 hashes.
Also note that the pipeline tasks are just a "rerouting" to other models. To know which one you are currently loading, see here. For your specific case, pipeline('fill-mask') actually utilizes a distilroberta-base model.
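If you want to confirm which checkpoint a pipeline actually resolved to, you can ask the loaded model itself; a small sketch (name_or_path is populated for models loaded via from_pretrained):
from transformers import pipeline

pipe = pipeline('fill-mask')
# Prints the underlying checkpoint the task was routed to,
# e.g. 'distilroberta-base' for fill-mask.
print(pipe.model.name_or_path)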
How can I open vocab.txt from this location? It doesn't seem to be a directory: -rw------- 1 root root 435778770 Jan 27 05:30 794538e7c825dc7be96d9fc3c73b79a9736da5f699fc50d31513dbca0740b349.f0d8b668347b3048f5b88e273fde3c3412366726bc99aa5935b7990944092092fb1 – Wellwisher
(With less or nano, you can see it). – Windcheater
You can also specify cache_dir, like: model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b", cache_dir="~/mycoolfolder"). I had to figure this out to use a fast external NVMe because I was running out of space. – Live
On Windows 10, replace ~ with C:\Users\username, or in cmd do cd /d "%HOMEDRIVE%%HOMEPATH%".
So the full path will be: C:\Users\username\.cache\huggingface\transformers
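If you'd rather not hard-code the per-OS path, the default location can be computed with the standard library; a minimal sketch, assuming the defaults have not been overridden via environment variables:
from pathlib import Path

# Path.home() is C:\Users\username on Windows and /home/username on Linux.
cache_root = Path.home() / ".cache" / "huggingface"
print(cache_root / "transformers")  # transformers 4.3 - 4.21 layout
print(cache_root / "hub")           # 4.22+ / huggingface_hub layout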
As of transformers 4.22, the path appears to be (tested on CentOS):
~/.cache/huggingface/hub/
As of Transformers version 4.3, the cache location has been changed.
The exact place is defined in this code section https://github.com/huggingface/transformers/blob/master/src/transformers/file_utils.py#L181-L187
On Linux, it is at ~/.cache/huggingface/transformers.
The file names there are basically SHA hashes of the original URLs from which the files were downloaded. The corresponding .json files can help you figure out what the original file names are.
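For example, you can map the hashes back to their source URLs by reading those sidecar files; a sketch, assuming the pre-4.22 layout where each cached blob sits next to a same-named .json metadata file with a url field:
import json
from pathlib import Path

cache = Path.home() / ".cache" / "huggingface" / "transformers"
for meta in cache.glob("*.json"):
    info = json.loads(meta.read_text())
    # Each metadata file records the original download URL (and its etag).
    print(meta.stem, "->", info.get("url"))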
Per this doc https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables :
HF_HUB_CACHE: To configure where repositories from the Hub will be cached locally (models, datasets and spaces). Defaults to "$HF_HOME/hub" (e.g. "~/.cache/huggingface/hub" by default).
If you want to change the location of the cache, you can set HF_HUB_CACHE or HF_HOME, for example:
import os
os.environ['HF_HOME'] = 'your path'
Note that this has to happen before transformers or huggingface_hub is imported, because the cache path is resolved at import time.
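A quick way to verify where things will go; a sketch assuming a recent huggingface_hub (older versions expose HUGGINGFACE_HUB_CACHE instead of HF_HUB_CACHE):
import os
os.environ["HF_HOME"] = "/tmp/hf-home"  # must be set before the import below

from huggingface_hub import constants

# The hub cache path is derived from HF_HOME at import time.
print(constants.HF_HUB_CACHE)  # /tmp/hf-home/hub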
You can list what is already in the hub cache with:
ls -la ~/.cache/huggingface/hub/
For example, after downloading a file from a repo:
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="sentence-transformers/all-MiniLM-L6-v2", filename="config.json")
the snapshot directory for that repo looks like this:
ls -lrth ~/.cache/huggingface/hub/models--sentence-transformers--all-MiniLM-L6-v2/snapshots/7dbbc90392e2f80f3d3c277d6e90027e55de9125/
total 4.0K
lrwxrwxrwx 1 alex alex 52 Jan 25 12:15 config.json -> ../../blobs/72b987fd805cfa2b58c4c8c952b274a11bfd5a00
lrwxrwxrwx 1 alex alex 76 Jan 25 12:15 pytorch_model.bin -> ../../blobs/c3a85f238711653950f6a79ece63eb0ea93d76f6a6284be04019c53733baf256
lrwxrwxrwx 1 alex alex 52 Jan 25 12:30 vocab.txt -> ../../blobs/fb140275c155a9c7c5a3b3e0e77a9e839594a938
lrwxrwxrwx 1 alex alex 52 Jan 25 12:30 special_tokens_map.json -> ../../blobs/e7b0375001f109a6b8873d756ad4f7bbb15fbaa5
lrwxrwxrwx 1 alex alex 52 Jan 25 12:30 tokenizer_config.json -> ../../blobs/c79f2b6a0cea6f4b564fed1938984bace9d30ff0
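Each of those entries is a symlink into the content-addressed blobs/ store; you can resolve one to its real target, for example:
import os

snapshot_file = os.path.expanduser(
    "~/.cache/huggingface/hub/models--sentence-transformers--all-MiniLM-L6-v2"
    "/snapshots/7dbbc90392e2f80f3d3c277d6e90027e55de9125/config.json")
# realpath follows the symlink into the blobs/ directory.
print(os.path.realpath(snapshot_file))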
I have some code on my Fedora Linux box, like:
from langchain.embeddings import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    model_kwargs={"device": "cuda"})
which, when executed, downloads the model to ~/.cache/torch/sentence_transformers, since it is a sentence_transformers model.
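If you want that cache somewhere else, sentence_transformers honors its own environment variable; a sketch (set it before the library is imported):
import os

# Overrides the default ~/.cache/torch/sentence_transformers location.
os.environ["SENTENCE_TRANSFORMERS_HOME"] = "/mnt/bigdisk/st-cache"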
You may want to list the cache directories and sort them by size:
cd ~/.cache; du -sk -- * | sort -nr
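For the Hub cache specifically, recent huggingface_hub releases also bundle a CLI report that prints each cached repo with its size and last-accessed time:
huggingface-cli scan-cache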