Can you convert an existing git repository to be a "blobless" one?

Today, git offers "partial clone" options that download the commits and trees of a repository while fetching blobs on demand, saving network bandwidth and disk space.

This can be enabled during the initial git clone by passing --filter=blob:none. However, is there a way to convert an already existing local repository to the "blobless" format? This should save some disk space by deleting any local blobs known to be available from the "promisor" remote.
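To see the end state the question is after, the following throwaway sketch (it only assumes git is installed) builds a small repository in a temporary directory, makes a blobless clone of it over file://, and prints the promisor settings that the clone records. The paths, file names and demo identity are made up for the example; uploadpack.allowAnySHA1InWant is set defensively so that fetching missing blobs by object ID also works over the older wire protocol.

```shell
#!/bin/sh
# Throwaway demo: build a small repository and make a blobless clone of it.
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A "full" source repository with three versions of one file.
git init -q "$work/full"
for i in 1 2 3; do
  echo "version $i" > "$work/full/file.txt"
  git -C "$work/full" add file.txt
  git -C "$work/full" commit -q -m "commit $i"
done

# The source acts as the server, so it must allow filtered fetches
# (and, defensively, fetching blobs by object ID).
git -C "$work/full" config uploadpack.allowFilter true
git -C "$work/full" config uploadpack.allowAnySHA1InWant true

# Partial clone over file://: commits and trees now, blobs on demand.
git clone -q --filter=blob:none "file://$work/full" "$work/blobless"

# The clone records the remote as a promisor, together with the filter.
git -C "$work/blobless" config --get-regexp '^remote\.origin\.(promisor|partialclonefilter)$'
```

Running git -C "$work/blobless" log -p afterwards should make Git fetch the older blobs from the source on demand.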

Hibbitts asked 17/8, 2021 at 4:58 Comment(4)
Would a local clone be an option for you? (e.g. git clone --filter=blob:none /path/to/full/repo.git /path/to/blobless.git) – Peril
@Peril I think that would result in the original path being the promisor remote for the newly cloned one, so I couldn't delete the original? Or maybe I could fix that up after the fact. – Hibbitts
Nathan, afterwards, change the URL of the origin remote to point back to the original upstream. – Peril
There's no existing, user-oriented, convenient way. The method @Peril outlined will work. Be aware that partial clone itself is still not really ready for The Masses to use: you'll run into sharp edges now and then. – Offertory

While I'm not aware of a dedicated in-place command, you can still make a local clone and then replace your original folder with the blobless copy.
(solution inspired by knittl's comment)

# The existing repo acts as the server for this clone, so it must allow
# filtered fetches. (--global enables this for all your repositories.)
git config --global uploadpack.allowFilter true

# Filter your existing repo into a new one.
# The `file://` protocol followed by a full path is required: with a plain
# path, Git takes the local-clone shortcut and ignores --filter.
git clone --filter=blob:none file:///full_path_to_existing_repo/.git path_to_new_blobless_copy

# Point the new repo's origin back at the real upstream.
# You can retrieve its URL with `git remote -v` in your existing repo.
cd path_to_new_blobless_copy
git remote set-url origin remote_path_to_origin.git
cd -

# Replace your existing repo with the new one.
# Destructive operation that will free up the space of the blobs,
# but will also destroy your local stashes, branches and tags that you didn't clone!
# Push any local commits first: their blobs exist only in the original.
rm -rf /full_path_to_existing_repo
mv path_to_new_blobless_copy /full_path_to_existing_repo
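Since the rm -rf cannot be undone, a check along these lines may help before running it. A minimal sketch, assuming the real upstream remote is named origin (pass another name as the first argument); the function name list_local_only is made up for this example. It runs inside the original repository, refreshes the remote-tracking branches, and only lists things a fresh clone would not bring back; it deletes nothing, and it does not look at hooks or local settings inside .git.

```shell
#!/bin/sh
# list_local_only (hypothetical helper): run inside the ORIGINAL repository.
# Lists what exists only locally and would be lost with it; deletes nothing.
list_local_only() {
  remote=${1:-origin}   # assumption: the real upstream remote is "origin"

  # Refresh remote-tracking branches so the comparison is current.
  git fetch -q "$remote"

  echo "Commits on local branches that are not on $remote:"
  git log --oneline --branches --not --remotes="$remote"

  echo "Commits reachable from tags that are not on $remote:"
  git log --oneline --tags --not --remotes="$remote"

  echo "Stashes (a clone never copies refs/stash):"
  git stash list

  echo "Untracked and ignored files in the working tree:"
  git status --short --ignored
}
```

Usage: cd /full_path_to_existing_repo && list_local_only origin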
Moises answered 26/11, 2021 at 11:51 Comment(0)

I believe the only difference in the .git/config between a clone with and without --filter=blob:none is the following configuration:

...
[remote "origin"] # or whatever your remote is named
promisor = true
partialclonefilter = blob:none
...

You can add these settings with the following commands:

# replace origin with the name of your remote
git config remote.origin.promisor true
git config remote.origin.partialclonefilter blob:none

According to the docs, changing this only affects fetches of new commits, but I believe git gc --prune=now should clean up the unnecessary objects.
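Whether this actually frees space can be measured directly. A minimal sketch, assuming you run it in a disposable copy of the repository, since it edits .git/config and prunes objects; the function name try_config_conversion is made up for this example. It prints the relevant git count-objects -vH figures before and after applying the two settings and running git gc --prune=now.

```shell
#!/bin/sh
# try_config_conversion (hypothetical helper): apply the promisor settings
# from this answer and report object-store size before and after.
# Run it in a throwaway copy: it edits .git/config and prunes objects.
try_config_conversion() {
  remote=${1:-origin}   # assumption: the promisor remote is "origin"

  echo "Before:"
  git count-objects -vH | grep -E '^(count|size|size-pack):'

  git config "remote.$remote.promisor" true
  git config "remote.$remote.partialclonefilter" blob:none
  git gc --quiet --prune=now

  echo "After:"
  git count-objects -vH | grep -E '^(count|size|size-pack):'
}
```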

Valeric answered 18/4, 2023 at 9:36 Comment(2)
Didn't work. Using git@github.com:baumgarr/nixnote2.git as a test repo, du -sk . actually INCREASED from 75428 to 75500. 😥 --aggressive only reduced it to 67832, whereas a true blobless clone was 13612. – Tirrell
I wonder what would happen if one unpacked everything and deleted all the blobs (maybe trees too?). I didn't compare space savings, but my client re-downloaded branch blobs after I used --aggressive. – Lists

First, make sure every blob in the repository also exists on the remote by pushing all of your commits.

upstream=origin
gitdir=$(git rev-parse --absolute-git-dir)
url=$(git remote get-url "$upstream")

mv "$gitdir"/objects "$gitdir"/objects.bak  # Create backup
mkdir "$gitdir"/objects  # Required, else git thinks it's not a repository

# Tell git this repo is blobless
git config remote."$upstream".promisor true
git config remote."$upstream".partialclonefilter blob:none

# Refetch commits, trees, and tags; the filter above leaves out blobs
# (--refetch needs Git 2.36 or later)
git fetch --refetch --tags --no-auto-gc "$upstream"

# Now fetch the blobs of the files at HEAD, as a blobless clone does at checkout
# (git would otherwise fetch them lazily the first time it needs them).
# Object IDs are passed on stdin to avoid an over-long command line.
git ls-tree -r HEAD | awk '$2 == "blob" { print $3 }' |
  git fetch-pack --stdin --refetch --keep --quiet "$url" > /dev/null

git fsck --full && echo "Remember to: rm -rf '$gitdir/objects.bak'"
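To see what the conversion actually left on disk, a small report can count the objects that are present locally versus only promised by the remote. A sketch, run inside the converted repository; the function name report_promised_objects is made up for this example. It relies on git rev-list --missing=print, which prefixes each missing object with "?" instead of fetching it.

```shell
#!/bin/sh
# report_promised_objects (hypothetical helper): run inside a partial clone.
# Counts objects reachable from all refs that are present locally versus
# only promised by the remote; --missing=print does not fetch anything.
report_promised_objects() {
  git rev-list --objects --all --missing=print |
    awk 'substr($0, 1, 1) == "?" { missing++; next }
         { present++ }
         END { printf "present: %d\npromised (missing locally): %d\n", present, missing }'

  # Blobs of the checked-out tree are the ones a blobless clone fetches at checkout.
  head_missing=$(git rev-list --objects --missing=print 'HEAD^{tree}' | grep -c '^?' || true)
  echo "missing at HEAD: $head_missing"
}
```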

Regarding local stashes and annotated tags that are not on the remote (the final git fsck --full will complain if you have any):

  • Create a pack of the local-only objects with git pack-objects (reading them from objects.bak), then copy that single, manually created pack into the new objects/pack directory

  • Get hashes of stashes:

    git for-each-ref refs/stash --format='%(objectname)'
    
  • Get hashes of annotated tags:

    git for-each-ref --format="%(if:equals=tag)%(objecttype)%(then)%(objectname)%(else)%(end)" --sort=taggerdate refs/tags
    
  • Make the pack even smaller by listing only the objects in X that are not in its parent:

    git rev-list --objects 68268d5 '^68268d5^'

    (The leading ^ means "not", and the trailing ^ means "first parent"; 68268d5^^ would be the grandparent instead.)
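The first bullet can be sketched as a small helper, assuming the objects.bak backup created by the script above is still in place and that the base you exclude (for example the stash's first parent) is already on the remote. restore_local_only is a made-up name; it reads objects from the backup via GIT_OBJECT_DIRECTORY and writes a single pack into the live object store.

```shell
#!/bin/sh
# restore_local_only (hypothetical helper): after the conversion, pack the
# objects reachable from $1 but not from $2 out of objects.bak into the
# live object store. $2 is assumed to be on the remote already.
restore_local_only() {
  tip=$1    # e.g. refs/stash, or an annotated tag's object ID
  base=$2   # e.g. "refs/stash^" (first parent of the stash)
  gitdir=$(git rev-parse --absolute-git-dir)
  mkdir -p "$gitdir/objects/pack"

  # Read from the backup object store; write the pack into the live one.
  GIT_OBJECT_DIRECTORY="$gitdir/objects.bak" \
    git rev-list --objects "$tip" "^$base" |
  GIT_OBJECT_DIRECTORY="$gitdir/objects.bak" \
    git pack-objects --quiet "$gitdir/objects/pack/pack"
}
```

Usage: restore_local_only refs/stash 'refs/stash^' && git fsck --full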

Thanks to Oded Niv's answer for the config tips.

Tirrell answered 9/8, 2023 at 11:06 Comment(0)
