How do Git LFS and git-annex differ?

git-annex has been around for quite some time, but never really gained momentum.
Git LFS is rather young and is already supported by GitHub, Bitbucket and GitLab.

Both tools handle binary files in Git repositories. GitLab, however, seems to have replaced git-annex with Git LFS within one year.

  • What are the technical differences?
  • Do they solve the same problem?
Felafel answered 5/9, 2016 at 20:52 Comment(1)
Here's quite a nice article about the two: Large files with Git: LFS and git-annex (LWN.net) Means

They do solve the same problem.

Let me start off with pro/con, then I'll move into technical differences.

git-annex

Pros:

  • Supports multiple remotes on which you can store the binaries.
  • Can be used without support from the hosting provider (for more details see here).

Cons:

  • Windows support is in beta, and has been for a long time.
  • Users need to learn separate commands for day-to-day work.
  • Not supported by GitHub and Bitbucket.

git-lfs

Pros:

  • Supported by GitHub, Bitbucket and GitLab.
  • Supported on all major operating systems.
  • Easy to use.
  • Automated, based on filters.

Cons:

Technical

git-annex

git-annex works by creating a symlink in your repo that gets committed. The actual data is stored in a separate backend (S3, rsync, and MANY others). It is written in Haskell. Since it uses symlinks, Windows users are forced to use git-annex in a much different manner, which makes the learning curve steeper.
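To make the symlink idea concrete, here is a minimal Python sketch. It is a simplified model, not git-annex's real implementation: the function name `annex_like_add`, the `store` directory and the plain SHA-256 file name are assumptions for illustration (git-annex has its own key format and layout under .git/annex/objects, and it uses relative links).

```python
import hashlib
import os
import tempfile


def annex_like_add(repo_dir, filename, store_dir):
    """Move a file's content into a store and leave a symlink in its place.

    Simplified model only: real git-annex uses its own key format and
    directory layout, so treat every name here as hypothetical.
    """
    path = os.path.join(repo_dir, filename)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    os.makedirs(store_dir, exist_ok=True)
    stored = os.path.join(store_dir, digest)
    os.replace(path, stored)   # the content now lives in the store
    os.symlink(stored, path)   # the symlink is what would get committed
    return stored


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        with open(os.path.join(repo, "big.iso"), "wb") as f:
            f.write(b"pretend this is a large binary")
        annex_like_add(repo, "big.iso", os.path.join(repo, "store"))
        link = os.path.join(repo, "big.iso")
        print(os.path.islink(link), os.readlink(link))
```

Reading big.iso still works because the operating system follows the link, which is also why platforms without good symlink support need a different mode of operation.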

git-lfs

git-lfs writes small pointer files into the repository in place of the large files. The actual BLOBs are uploaded to LFS storage through the Git LFS API, which is why a special LFS server is required. Git LFS uses filters, so you only have to set it up once, and then again whenever you want to specify additional types of files to push to LFS.
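For a feel of what a pointer file contains, here is a small Python sketch that builds one in the version 1 pointer layout (a `version` line, then `oid sha256:<hash>` and `size <bytes>`). The helper names `lfs_pointer` and `parse_pointer` are made up for this example; in practice the git-lfs filter does this for you and you never write pointers by hand.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Return pointer-file text for the given content (pointer spec v1 layout)."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


def parse_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into a dict entry."""
    return dict(line.split(" ", 1) for line in text.splitlines())


if __name__ == "__main__":
    # Git would store this short text; the real bytes go to the LFS server.
    print(lfs_pointer(b"pretend this is a large binary"), end="")
```

The pointer stays a few lines long no matter how big the tracked file is, which is what keeps the Git history small.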

Gallaway answered 5/9, 2016 at 22:22 Comment(9)
Great summary! I have two more questions. Do Windows users of git-annex lose some of the functionality of git-annex? Can there be several LFS servers (comparable to multiple backends in git-annex)? Felafel
There could be. LFS works a LOT like the actual Git servers work. You would simply add another remote and push the branch to both remotes. Gallaway
“automated based on filters” It appears git-annex can do the same too: git-annex.branchable.com/tips/largefiles Itagaki
I welcome any modification to this list if anybody would like to add pros/cons. Gallaway
Was all set to use git-lfs (as I use GitHub to host my repos currently) and then found out the pricing structures for those providers are different for LFS repos. Would probably need to pay at least $5-10 per month for a repository with any file you could deem large in it (although might be able to do something with GitLab's free 10GB). Not a deal breaker for industry users but typically not suitable for research software that is meant to be published indefinitely. Barbera
@TomClose: if it is research software, you'd better get your library to help you with long-term storage. No other institution can give you the 10-year guarantee you need. Zenodo is another option if your library cannot do much. Salmons
@JulienColomb Yes, a university-run GitLab instance is probably the best option. However, I wasn't keen to move away from GitHub, and since the large files are only required for running tests I have ended up using git-annex with a special remote back to my uni's infrastructure. Barbera
@TomClose Sounds great, you can still use the GitHub-Zenodo integration, although I do not know how it would work with git-annex. Would love to have a look, do you have a link to the repo? Salmons
@JulienColomb The repo is github.com/MonashBI/banana, the git-annex integration is still a WIP though. The plan is to hook it up to CircleCI and get a container there to pull from my special remote (at this stage just my uni GDrive but once I get it working I plan to move it to dedicated research infrastructure) before running the tests. Barbera

A major advantage of git annex is that you can choose which file you want to download.

You still know which files are available thanks to the symlinks.

For example, suppose that you have a directory full of ISO files. You can list the files, then decide which one you want to download by typing: git annex get my_file.
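As a rough illustration of why the listing still works: in a symlink-based checkout, a file whose content has not been fetched appears as a symlink whose target does not exist yet. The Python sketch below checks for that. The function name `annex_status` and the file names are assumptions for this example, and it only models the idea rather than calling git-annex itself.

```python
import os
import tempfile


def annex_status(directory):
    """Map each symlinked file name to True if its content is present locally."""
    status = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.islink(path):
            # os.path.exists follows the link; False means the target is missing
            status[name] = os.path.exists(path)
    return status


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "content")
        with open(target, "wb") as f:
            f.write(b"iso bytes")
        # One link with content present, one pointing at content not fetched yet.
        os.symlink(target, os.path.join(d, "have.iso"))
        os.symlink(os.path.join(d, "not-downloaded"), os.path.join(d, "missing.iso"))
        print(annex_status(d))
```

Every name shows up in the listing either way; only the entries marked False would need a git annex get before they can be read.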

Another advantage is that the files are not duplicated in your checkout. With LFS, each file is present both in .git/lfs/objects and in your working tree. So if you have 20 GB of LFS files, you need 40 GB on your disk. With git-annex, files are symlinked, so in this case only 20 GB is required.

Dreadnought answered 7/4, 2017 at 11:32 Comment(4)
Thanks for this answer! I'm still trying to understand things, but wouldn't GVFS (github.com/Microsoft/gvfs) paired with git/git-lfs address the issue of downloading individual files? From their readme... "GVFS virtualizes the file system beneath your git repo so that git and all tools see what appears to be a normal repo, but GVFS only downloads objects as they are needed" Humility
GVFS seems to be Windows-only for the moment. Dreadnought
Thanks for mentioning the data duplication of LFS, I didn't see that mentioned anywhere else. I'd rather not duplicate my media directory disk usage for no good reason. Snappy
I found this article (writequit.org/articles/getting-started-with-git-annex.html) by Lee Hinman useful in understanding Karl Forner's answer because it clearly separates two workflows that you can use with git-annex: 1) tracking file metadata without moving large files, and 2) moving and copying large files. Thunell
