Running the below code downloads a model - does anyone know what folder it downloads it to?
!pip install -q transformers
from transformers import pipeline
model = pipeline('fill-mask')
Update 2023-05-02: The cache location has changed again and is now ~/.cache/huggingface/hub/, as reported by @Victor Yan. Notably, the subfolders in the hub/ directory are now named after the model path (e.g. models--distilroberta-base), instead of carrying a SHA hash as in previous versions.
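If you want to inspect this new-style cache programmatically, recent versions of huggingface_hub ship a scan_cache_dir() helper; a minimal sketch (the exact report fields may vary by version):
from huggingface_hub import scan_cache_dir

# Scan the default cache (~/.cache/huggingface/hub) and report
# each cached repo together with its on-disk size.
cache_info = scan_cache_dir()
for repo in cache_info.repos:
    print(repo.repo_id, repo.repo_type, repo.size_on_disk_str)
print("total:", cache_info.size_on_disk_str)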
Update 2021-03-11: The cache location has changed and is now ~/.cache/huggingface/transformers, as also detailed in the answer by @victorx.
This post should shed some light on it (plus some investigation of my own, since it is already a bit older).
As mentioned, the default location on a Linux system is ~/.cache/torch/transformers/ (I'm currently using transformers v2.7, but this is unlikely to change anytime soon). The cryptic folder names in this directory seemingly correspond to the Amazon S3 hashes.
Also note that the pipeline tasks are just a "rerouting" to other models. To know which one you are currently loading, see here. For your specific case, pipeline('fill-mask') actually utilizes a distilroberta-base model.
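If you want to confirm which checkpoint a pipeline actually resolved to, you can ask the loaded model itself; a small sketch (name_or_path is populated for models loaded via from_pretrained):
from transformers import pipeline

pipe = pipeline('fill-mask')
# Prints the underlying checkpoint the task was routed to,
# e.g. 'distilroberta-base' for fill-mask.
print(pipe.model.name_or_path)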
How can I open vocab.txt from this location? It doesn't seem to be a directory: -rw------- 1 root root 435778770 Jan 27 05:30 794538e7c825dc7be96d9fc3c73b79a9736da5f699fc50d31513dbca0740b349.f0d8b668347b3048f5b88e273fde3c3412366726bc99aa5935b7990944092092fb1 – Wellwisher
(With less or nano, you can see it). – Windcheater
You can also specify cache_dir, like: model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b", cache_dir="~/mycoolfolder"). I had to figure this out to use a fast external NVMe because I was running out of space. – Live
On Windows 10, replace ~ with C:\Users\username, or in cmd do cd /d "%HOMEDRIVE%%HOMEPATH%".
So the full path will be: C:\Users\username\.cache\huggingface\transformers
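If you'd rather not hard-code the per-OS path, the default location can be computed with the standard library; a minimal sketch, assuming the defaults have not been overridden via environment variables:
from pathlib import Path

# Path.home() is C:\Users\username on Windows and /home/username on Linux.
cache_root = Path.home() / ".cache" / "huggingface"
print(cache_root / "transformers")  # transformers 4.3 - 4.21 layout
print(cache_root / "hub")           # 4.22+ / huggingface_hub layout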
As of transformers 4.22, the path appears to be (tested on CentOS):
~/.cache/huggingface/hub/
As of Transformers version 4.3, the cache location has been changed.
The exact place is defined in this code section https://github.com/huggingface/transformers/blob/master/src/transformers/file_utils.py#L181-L187
On Linux, it is at ~/.cache/huggingface/transformers.
The file names there are basically SHA hashes of the original URLs from which the files were downloaded. The corresponding .json files can help you figure out what the original file names are.
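For example, you can map the hashes back to their source URLs by reading those sidecar files; a sketch, assuming the pre-4.22 layout where each cached blob sits next to a same-named .json metadata file with a url field:
import json
from pathlib import Path

cache = Path.home() / ".cache" / "huggingface" / "transformers"
for meta in cache.glob("*.json"):
    info = json.loads(meta.read_text())
    # Each metadata file records the original download URL (and its etag).
    print(meta.stem, "->", info.get("url"))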
Per this doc https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables :
HF_HUB_CACHE: To configure where repositories from the Hub will be cached locally (models, datasets and spaces). Defaults to "$HF_HOME/hub" (e.g. "~/.cache/huggingface/hub" by default).
If you want to change the location of the cache, you can set HF_HUB_CACHE or HF_HOME, for example:
import os
os.environ['HF_HOME'] = 'your path'
Note that this has to happen before transformers or huggingface_hub is imported, because the cache path is resolved at import time.
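A quick way to verify where things will go; a sketch assuming a recent huggingface_hub (older versions expose HUGGINGFACE_HUB_CACHE instead of HF_HUB_CACHE):
import os
os.environ["HF_HOME"] = "/tmp/hf-home"  # must be set before the import below

from huggingface_hub import constants

# The hub cache path is derived from HF_HOME at import time.
print(constants.HF_HUB_CACHE)  # /tmp/hf-home/hub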
You can list what is already in the hub cache with:
ls -la ~/.cache/huggingface/hub/
For example, after downloading a file from a repo:
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="sentence-transformers/all-MiniLM-L6-v2", filename="config.json")
the snapshot directory for that repo looks like this:
ls -lrth ~/.cache/huggingface/hub/models--sentence-transformers--all-MiniLM-L6-v2/snapshots/7dbbc90392e2f80f3d3c277d6e90027e55de9125/
total 4.0K
lrwxrwxrwx 1 alex alex 52 Jan 25 12:15 config.json -> ../../blobs/72b987fd805cfa2b58c4c8c952b274a11bfd5a00
lrwxrwxrwx 1 alex alex 76 Jan 25 12:15 pytorch_model.bin -> ../../blobs/c3a85f238711653950f6a79ece63eb0ea93d76f6a6284be04019c53733baf256
lrwxrwxrwx 1 alex alex 52 Jan 25 12:30 vocab.txt -> ../../blobs/fb140275c155a9c7c5a3b3e0e77a9e839594a938
lrwxrwxrwx 1 alex alex 52 Jan 25 12:30 special_tokens_map.json -> ../../blobs/e7b0375001f109a6b8873d756ad4f7bbb15fbaa5
lrwxrwxrwx 1 alex alex 52 Jan 25 12:30 tokenizer_config.json -> ../../blobs/c79f2b6a0cea6f4b564fed1938984bace9d30ff0
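Each of those entries is a symlink into the content-addressed blobs/ store; you can resolve one to its real target, for example:
import os

snapshot_file = os.path.expanduser(
    "~/.cache/huggingface/hub/models--sentence-transformers--all-MiniLM-L6-v2"
    "/snapshots/7dbbc90392e2f80f3d3c277d6e90027e55de9125/config.json")
# realpath follows the symlink into the blobs/ directory.
print(os.path.realpath(snapshot_file))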
I have some code on my Fedora Linux box, like:
from langchain.embeddings import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    model_kwargs={"device": "cuda"})
which, when executed, downloads the model to ~/.cache/torch/sentence_transformers, since it is a sentence_transformers model.
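If you want that cache somewhere else, sentence_transformers honors its own environment variable; a sketch (set it before the library is imported):
import os

# Overrides the default ~/.cache/torch/sentence_transformers location.
os.environ["SENTENCE_TRANSFORMERS_HOME"] = "/mnt/bigdisk/st-cache"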
You may want to list the cache directories and sort them by size:
cd ~/.cache; du -sk -- * | sort -nr
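For the Hub cache specifically, recent huggingface_hub releases also bundle a CLI report that prints each cached repo with its size and last-accessed time:
huggingface-cli scan-cache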