This sets up a git repo "myrepo" with a related S3 bucket that holds all of the big files you don't want in the git repository itself.
Set up the repo:
# Clone your repo "myrepo"
git clone git@github.com:me/myrepo.git
cd myrepo
# Initialize it to work with git-annex.
# This creates the .git/annex directory in the repo,
# and a `git-annex` metadata branch the tools use behind the scenes.
git annex init
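# Only once per repo, the first person to use it with git-annex must link it to S3.
# Be sure to have the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY env vars set.
# Choose a remote name that also works as the start of an S3 bucket name.
# This creates the bucket s3://myrepo-annexfiles-SOME_UUID.
# git-annex requires choosing an encryption mode for S3 remotes;
# encryption=none stores the files as plain objects named by their keys.
git annex initremote myrepo-annexfiles type=S3 encryption=none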
# Save the repo updates related to attaching your git-annex remote.
# Warning: this commits, then pushes both this branch and the git-annex branch to origin.
# It will ALSO commit any other pending changes to tracked files, so commit
# or stash those first to keep them out of this commit.
git annex sync
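The setup steps above have a few prerequisites that are easy to miss. Here is a minimal bash sketch of a pre-flight check, assuming you run it from the repo root. The function names are hypothetical; the bucket-name check assumes the bucket is named `<remote>-<UUID>` with a standard 36-character UUID (as in the `s3://myrepo-annexfiles-SOME_UUID` comment above) and applies a conservative subset of S3's naming rules (63 characters maximum).

```shell
# Hypothetical pre-flight checks for the setup steps above (bash).

# git-annex reads S3 credentials from these environment variables.
aws_creds_set() {
  [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]
}

# Assumes the bucket will be named "<remote>-<36-char UUID>" and checks it
# against a conservative subset of S3's rules: at most 63 chars of lowercase
# letters, digits and hyphens, starting and ending with a letter or digit
# (dots are legal in S3 but are left out here).
remote_name_ok() {
  local bucket="$1-00000000-0000-0000-0000-000000000000"
  local re='^[[:lower:][:digit:]][[:lower:][:digit:]-]*[[:lower:][:digit:]]$'
  [ "${#bucket}" -le 63 ] && [[ $bucket =~ $re ]]
}

# git annex sync commits pending changes to tracked files, so require none.
# Fails (rather than passing) when run outside a git repository.
worktree_clean() {
  local out
  out=$(git status --porcelain --untracked-files=no) || return 1
  [ -z "$out" ]
}

# Run all checks, reporting every failure rather than stopping at the first.
preflight() {
  local remote=$1 status=0
  aws_creds_set ||
    { echo "set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY" >&2; status=1; }
  remote_name_ok "$remote" ||
    { echo "'$remote' will not make a valid bucket name" >&2; status=1; }
  worktree_clean ||
    { echo "commit or stash your changes first" >&2; status=1; }
  return "$status"
}
```

For example, source the file and run `preflight myrepo-annexfiles` before the `initremote` and `sync` steps. With a 37-character suffix, the remote name itself can be at most 26 characters.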
Add some files to the annex:
# These example files are tiny, just for the demo.
mkdir mybigfiles
cd mybigfiles
echo 123 > file1
echo 456 > file2
# This is the git-annex alternative to `git add`.
# It replaces the files with symlinks into .git/annex/.../SOME_SHA256.
# It also does `git add` on the symlinks, but not the targets.
git annex add file*
# Look at the symlinks with wonder.
ls -l file*
# This uploads the content, keyed by SHA256, to the S3 bucket attached to your
# "special remote", and removes the local copies:
git annex move file* --to myrepo-annexfiles
# Again, this does a lot of committing and pushing, so be prepared.
git annex sync
With git-annex, the git repo holds only symlinks whose targets name each file's real content by its SHA256 hash. Those symlinks are dead (dangling) until the tooling brings down the big files.
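Because each symlink target names the content by hash, you can check a file against its key with ordinary tools. A minimal bash sketch, assuming the default `SHA256E` backend, whose keys look like `SHA256E-s<size>--<hash><extension>`, and GNU `sha256sum` (on macOS, `shasum -a 256` is the usual substitute). The function names are hypothetical; `git annex fsck` is the real tool for verifying content.

```shell
# Hypothetical helpers (bash), assuming the SHA256E key format
# SHA256E-s<size>--<64 hex digits><extension> at the end of each symlink target.

# Print the SHA256 hash embedded in an annexed file's symlink target.
annex_key_hash() {
  local target key
  target=$(readlink "$1") || return 1   # fails if $1 is not a symlink
  key=${target##*/}                      # last path component is the key
  key=${key#*--}                         # drop the "SHA256E-s<size>--" prefix
  printf '%s\n' "${key:0:64}"            # drop any extension after the hash
}

# Succeed if the content behind the symlink hashes to the value in its key.
annex_verify() {
  local expected actual
  expected=$(annex_key_hash "$1") || return 1
  if [ ! -e "$1" ]; then                 # dangling: content not downloaded here
    echo "$1: content not present locally" >&2
    return 1
  fi
  actual=$(sha256sum < "$1") || return 1
  [ "${actual%% *}" = "$expected" ]
}
```

For example, `annex_verify mybigfiles/file1` succeeds only when the content is present locally and matches the hash in its symlink.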
Later, when someone else clones the repo and wants the files:
git clone git@github.com:me/myrepo.git
cd myrepo
# Enable access to the S3 annex files. The AWS_* env vars must be set here too.
# NOTE: This may print a warning about origin, because origin above is an ssh
# remote that can't hold annexed content. It ONLY means git-annex can't store
# the big files there, which is exactly what we want in this example. git-annex
# marks origin with annex-ignore so that it stays out of content storage.
git annex enableremote myrepo-annexfiles
# Get all of the file content from S3:
git annex get mybigfiles/*
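To confirm that origin was excluded from storing annexed content, you can read the `remote.origin.annex-ignore` git config setting that git-annex sets in that case. A small bash sketch; the function name is hypothetical.

```shell
# Hypothetical helper (bash): succeed if git-annex has been told to ignore a
# remote, i.e. remote.<name>.annex-ignore is true in this repo's git config.
annex_ignored() {
  local remote=${1:-origin}
  # --bool normalizes values such as "yes" or "1" to "true".
  [ "$(git config --bool --get "remote.${remote}.annex-ignore")" = true ]
}
```

For example, `annex_ignored origin && echo "origin holds no annexed content"`.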
When done with the files, get your disk space back:
git annex drop mybigfiles/*
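`git annex info` reports local annex statistics, but if you just want to see the disk space come back, a rough bash sketch can measure the object store directly. It assumes the default `.git/annex/objects` location under the repo root, and the function name is hypothetical.

```shell
# Rough sketch (bash): disk usage, in KiB, of annexed content in this clone.
# Assumes the default object directory .git/annex/objects under the repo root.
annex_local_kib() {
  local dir=${1:-.git/annex/objects}
  if [ -d "$dir" ]; then
    du -sk "$dir" | cut -f1   # du prints "<KiB><TAB><dir>"
  else
    echo 0                    # no object store yet
  fi
}
```

Run `annex_local_kib` before and after `git annex drop` to compare.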
Check where each file's content actually lives, and which clones have it downloaded:
git annex whereis mybigfiles/file*
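`git annex whereis` is the authoritative answer (and `git annex find` lists the files whose content is present in this clone). Since a file without local content is just a dangling symlink, a rough local approximation needs only the shell. A bash sketch, assuming annexed files are symlinks as in the example above; the function name is hypothetical.

```shell
# Rough sketch (bash): for each symlink given, report whether its target
# content is present in this clone. A dangling symlink means the content is
# elsewhere (for example only in S3) until you run git annex get.
annex_local_status() {
  local f
  for f in "$@"; do
    [ -L "$f" ] || continue   # skip anything that is not a symlink
    if [ -e "$f" ]; then      # -e follows the link to its target
      printf 'present %s\n' "$f"
    else
      printf 'missing %s\n' "$f"
    fi
  done
}
```

For example, after `git annex drop mybigfiles/*`, `annex_local_status mybigfiles/file*` should report every file as missing.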
Note that git-annex is a super flexible tool. I found that distilling a simpler recipe for the common case took a bit of study of the docs. Hope this helps others.