I would like to remove TensorFlow and Hugging Face models from my laptop. I did find one link, https://github.com/huggingface/transformers/issues/861, but is there no command that can remove them? As mentioned in that link, deleting them manually can cause problems, because we don't know which other files are linked to those models, which files expect a model to be present at that location, or whether it may simply cause some error.
Use
pip install "huggingface_hub[cli]"
Then
huggingface-cli delete-cache
You should now see a list of revisions that you can select/deselect.
See this link for details.
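If you prefer to do this from Python instead of the interactive prompt, the same library exposes a cache-scanning API. This is a minimal sketch (assuming a recent huggingface_hub release that ships scan_cache_dir); the revision hash is a placeholder you would replace with one printed by the scan:
from huggingface_hub import scan_cache_dir

cache_info = scan_cache_dir()                 # inspect the Hugging Face cache
for repo in cache_info.repos:                 # list what is cached and how big it is
    print(repo.repo_id, repo.size_on_disk_str)
    for revision in repo.revisions:
        print("   ", revision.commit_hash)

# Replace the placeholder with a commit hash printed above.
strategy = cache_info.delete_revisions("<revision-commit-hash>")
print("Will free", strategy.expected_freed_size_str)
strategy.execute()                            # actually deletes those revisions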
The transformers library stores the downloaded files in your cache. As far as I know, there is no built-in method to remove certain models from the cache, but you can code something yourself. The files are stored under a cryptic name, alongside two additional files that have .json (.h5.json in the case of TensorFlow models) and .lock appended to that cryptic name. The JSON file contains some metadata that can be used to identify the file. The following is an example of such a file:
{"url": "https://cdn.huggingface.co/roberta-base-pytorch_model.bin", "etag": "\"8a60a65d5096de71f572516af7f5a0c4-30\""}
We can now use this information to create a list of your cached files as shown below:
import glob
import json
import re
from collections import OrderedDict
from transformers import TRANSFORMERS_CACHE

# Every cached file has a companion .json file holding its download metadata.
metaFiles = glob.glob(TRANSFORMERS_CACHE + '/*.json')
modelRegex = r"huggingface\.co\/(.*)(pytorch_model\.bin$|resolve\/main\/tf_model\.h5$)"

cachedModels = {}
cachedTokenizers = {}
for file in metaFiles:
    with open(file) as j:
        data = json.load(j)
        isM = re.search(modelRegex, data['url'])
        if isM:
            # Model weights: key is the model name taken from the URL.
            cachedModels[isM.group(1)[:-1]] = file
        else:
            # Everything else (tokenizer/config files): key is the URL path.
            cachedTokenizers[data['url'].partition('huggingface.co/')[2]] = file

cachedTokenizers = OrderedDict(sorted(cachedTokenizers.items(), key=lambda k: k[0]))
Now all you have to do is check the keys of cachedModels and cachedTokenizers and decide whether you want to keep them. If you want to delete one, look up the value in the dictionary and delete that file from the cache. Don't forget to also delete the corresponding *.json and *.lock files.
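As a rough sketch of that last step (assuming the cache layout described above, where the metadata file is the blob name plus .json and the lock file is the blob name plus .lock), deleting one entry could look like this:
import os

# Sketch: remove one cached model chosen from the cachedModels dict built above.
key = next(iter(cachedModels))              # replace with the key you want to delete
metaFile = cachedModels[key]                # path of the .json metadata file
weightFile = metaFile[:-len(".json")]       # the cached weight blob itself

for path in (weightFile, metaFile, weightFile + ".lock"):
    if os.path.exists(path):
        os.remove(path)
        print("deleted", path)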
From a comment in a transformers GitHub issue, you can use the following to find the cache directory so that you can clean it:
from transformers import file_utils
print(file_utils.default_cache_path)
You can run this code to delete all models in the transformers cache:
from transformers import TRANSFORMERS_CACHE
print(TRANSFORMERS_CACHE)          # shows which directory is about to be removed

import shutil
shutil.rmtree(TRANSFORMERS_CACHE)  # deletes every cached model and tokenizer
pip uninstall tensorflow
pip uninstall tensorflow-gpu
pip uninstall transformers
and find where you have saved the GPT-2 model, e.g.
model.save_pretrained("./english-gpt2")
where english-gpt2 is the name of the directory you saved your model to. From that path you can delete it manually.
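For illustration only: this removes the locally saved copy produced by save_pretrained, not the files in the transformers cache (as the comments below point out). The directory name is just the example used above:
import shutil

saved_dir = "./english-gpt2"                  # wherever you called model.save_pretrained(...)
shutil.rmtree(saved_dir, ignore_errors=True)  # deletes only that local copy, not the cache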
… transformers library. That's why I said that you are not answering the question of the OP. I also just tested what you said, and calling save_pretrained does not clear the cache (which is correct, in my opinion). – Ita
Run from transformers import TRANSFORMERS_CACHE and print(TRANSFORMERS_CACHE). Check the printed directory; it will be full of files when you have loaded models with from_pretrained before. This is the link to the official documentation confirming this behaviour. – Ita