I would like to remove TensorFlow and Hugging Face models from my laptop. I did find one link, https://github.com/huggingface/transformers/issues/861, but is there no command that can remove them? As mentioned in that link, deleting them manually can cause problems, because we don't know which other files are linked to those models, which files expect a model to be present at that location, or whether it may simply cause some error.
Use
pip install "huggingface_hub[cli]"
Then
huggingface-cli delete-cache
You should now see a list of revisions that you can select/deselect.
See this link for details.
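If you prefer to do this from Python instead of the interactive prompt, the same library exposes a cache-scanning API. This is a minimal sketch (assuming a recent huggingface_hub release that ships scan_cache_dir); the revision hash is a placeholder you would replace with one printed by the scan:
from huggingface_hub import scan_cache_dir

cache_info = scan_cache_dir()                 # inspect the Hugging Face cache
for repo in cache_info.repos:                 # list what is cached and how big it is
    print(repo.repo_id, repo.size_on_disk_str)
    for revision in repo.revisions:
        print("   ", revision.commit_hash)

# Replace the placeholder with a commit hash printed above.
strategy = cache_info.delete_revisions("<revision-commit-hash>")
print("Will free", strategy.expected_freed_size_str)
strategy.execute()                            # actually deletes those revisions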
The transformers library stores the downloaded files in your cache. As far as I know, there is no built-in method to remove certain models from the cache, but you can code something yourself. The files are stored under a cryptic name, alongside two additional files that have .json (.h5.json in the case of TensorFlow models) and .lock appended to that cryptic name. The JSON file contains some metadata that can be used to identify the file. The following is an example of such a file:
{"url": "https://cdn.huggingface.co/roberta-base-pytorch_model.bin", "etag": "\"8a60a65d5096de71f572516af7f5a0c4-30\""}
We can now use this information to create a list of your cached files as shown below:
import glob
import json
import re
from collections import OrderedDict
from transformers import TRANSFORMERS_CACHE

# Every cached file has a companion .json file holding its download metadata.
metaFiles = glob.glob(TRANSFORMERS_CACHE + '/*.json')
modelRegex = r"huggingface\.co\/(.*)(pytorch_model\.bin$|resolve\/main\/tf_model\.h5$)"

cachedModels = {}
cachedTokenizers = {}
for file in metaFiles:
    with open(file) as j:
        data = json.load(j)
        isM = re.search(modelRegex, data['url'])
        if isM:
            # Model weights: key is the model name taken from the URL.
            cachedModels[isM.group(1)[:-1]] = file
        else:
            # Everything else (tokenizer/config files): key is the URL path.
            cachedTokenizers[data['url'].partition('huggingface.co/')[2]] = file

cachedTokenizers = OrderedDict(sorted(cachedTokenizers.items(), key=lambda k: k[0]))
Now all you have to do is check the keys of cachedModels and cachedTokenizers and decide whether you want to keep them. If you want to delete one, look up the value in the dictionary and delete that file from the cache. Don't forget to also delete the corresponding *.json and *.lock files.
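As a rough sketch of that last step (assuming the cache layout described above, where the metadata file is the blob name plus .json and the lock file is the blob name plus .lock), deleting one entry could look like this:
import os

# Sketch: remove one cached model chosen from the cachedModels dict built above.
key = next(iter(cachedModels))              # replace with the key you want to delete
metaFile = cachedModels[key]                # path of the .json metadata file
weightFile = metaFile[:-len(".json")]       # the cached weight blob itself

for path in (weightFile, metaFile, weightFile + ".lock"):
    if os.path.exists(path):
        os.remove(path)
        print("deleted", path)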
From a comment in a transformers GitHub issue, you can use the following to find the cache directory so that you can clean it:
from transformers import file_utils
print(file_utils.default_cache_path)
You can run this code to delete all models in the transformers cache:
from transformers import TRANSFORMERS_CACHE
print(TRANSFORMERS_CACHE)          # shows which directory is about to be removed

import shutil
shutil.rmtree(TRANSFORMERS_CACHE)  # deletes every cached model and tokenizer
pip uninstall tensorflow
pip uninstall tensorflow-gpu
pip uninstall transformers
and find where you have saved the GPT-2 model, e.g.
model.save_pretrained("./english-gpt2")
where english-gpt2 is the name of the directory you saved your model to. From that path you can delete it manually.
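For illustration only: this removes the locally saved copy produced by save_pretrained, not the files in the transformers cache (as the comments below point out). The directory name is just the example used above:
import shutil

saved_dir = "./english-gpt2"                  # wherever you called model.save_pretrained(...)
shutil.rmtree(saved_dir, ignore_errors=True)  # deletes only that local copy, not the cache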
… transformers library. That's why I said that you are not answering the question of the OP. I also just tested what you said, and calling save_pretrained does not clear the cache (which is correct, in my opinion). – Ita
Run from transformers import TRANSFORMERS_CACHE and print(TRANSFORMERS_CACHE). Check the printed directory; it will be full of files when you have loaded models with from_pretrained before. This is the link to the official documentation confirming this behaviour. – Ita