Remove git-annex repository from file tree

I tried installing git-annex yesterday to back up my files. I ran git annex add . in the root of my repository tree, followed by a git commit. So far, so good.

What I didn't know was that git-annex was turning my entire file tree into a whole bunch of symlinks. Every single file in my tree is now a symlink into .git/annex/objects! This is messing up my application, which depends on files not being symlinks.

My question is, how do I get rid of git-annex and restore my file system to its original state? For a normal git repo I could do rm -r .git, but I'm afraid that won't do the job in git-annex. Thanks in advance.

Wilhelmstrasse answered 27/6, 2014 at 8:13 Comment(1)
A little bit late to the party, but in case you need hard links instead of soft ones, git-annex supports a "git config annex.hardlink" option ... - Autochthonous

Okay, so I stumbled upon some docs for git-annex, and they give two commands that achieve what I wanted to do:

unannex [path ...]

Use this to undo an accidental git annex add command. You can use git annex unannex to move content out of the annex at any point, even if you've already committed it. This is not the command you should use if you intentionally annexed a file and don't want its contents any more. In that case you should use git annex drop instead, and you can also git rm the file.

uninit

Use this to stop using git annex. It will unannex every file in the repository, and remove all of git-annex's other data, leaving you with a git repository plus the previously annexed files.
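As a rough illustration of the difference between the two commands quoted above, here is a minimal sketch. The function names undo_annex_add and leave_annex are made up for this example; only git annex unannex and git annex uninit come from the docs. Both are assumed to run from inside the annexed repository.

```shell
#!/bin/sh
# Hypothetical wrappers; only the git annex subcommands are real.

# Undo an accidental "git annex add" for the given paths only.
# The repository stays annexed; other files are left alone.
# Usage: undo_annex_add PATH...
undo_annex_add() {
    git annex unannex "$@"
}

# Stop using git-annex entirely: unannexes every file and removes
# git-annex's own data, leaving a plain git repository.
leave_annex() {
    git annex uninit
}
```

For a handful of files, something like undo_annex_add big.iso may be enough; leave_annex is the all-or-nothing variant.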

I started running git annex uninit, but my god was it slow. It took about 5 minutes to "unannex" just a single file. My filesystem tree is about 200,000 files, so that was just unacceptable.

What I ended up doing was surprisingly simple and worked well. I used cp -rL to copy my file tree, which follows every symlink and writes the real file contents into the copy. It was blazing fast: around 30 seconds for my entire file tree. The only problem was that the original file permissions were not retained, so I needed to run some chmod and chcon commands to fix them up.

This second method worked for me because there were no other symlinks in my tree. If you do have symlinks beyond those created by git-annex, they will be replaced by copies of their targets too, so my little shortcut probably isn't the right choice for you, and you should consider sticking with git annex uninit.
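The copy trick can be tried safely on a throwaway tree first. The sketch below is only an illustration: the directory layout, file names, and the symlink are made up to imitate what git-annex produces, and -p is added (an assumption on top of the plain cp -rL used here) to keep modes and timestamps.

```shell
#!/bin/sh
# Hypothetical demo layout; none of these paths come from a real repository.
set -eu
work=$(mktemp -d)
mkdir -p "$work/tree/.git/annex/objects"
printf 'payload\n' > "$work/tree/.git/annex/objects/blob"
# Imitate git-annex: the working file is a symlink into the object store.
ln -s .git/annex/objects/blob "$work/tree/data.txt"

# -r recurse, -L follow symlinks so the copy holds real files,
# -p preserve mode and timestamps where possible.
cp -rLp "$work/tree" "$work/restored"
# The copy no longer needs the repository metadata.
rm -rf "$work/restored/.git"

if [ -L "$work/restored/data.txt" ]; then
    echo "still a symlink"
else
    echo "regular file"
fi
```

Because the original tree is left untouched, the copy can be inspected before anything is deleted.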

Wilhelmstrasse answered 28/6, 2014 at 6:39 Comment(1)
I ran into the same problem today (preemptively: just to make sure I can escape from annex and/or get myself out of trouble) and ended up using cp -rLlp annex copy, which (1) hard-links to go faster and avoid using any disk space for the copied files, and (2) preserves permissions, ownership, and timestamps. I also describe a second solution here that preserves non-annexed or untracked symlinks: superuser.com/a/1391272/116912 - Upstart

If you have a v6 repository, you can do the following:

git annex unannex . --fast

which replaces the symlinks with hard links instead of slowly copying the original files back into place.

Only v6 repositories can run git annex unannex on uncommitted changes, so it may be necessary to upgrade the git-annex repo to a v6 repository.

See the Official Upgrade Guide.

In my case I had to upgrade from v5 to v6. All it took was running git annex upgrade, which finished in a few seconds, and I was done.
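Put together, the sequence from this answer might look like the sketch below. fast_unannex is a made-up wrapper name; the two git annex invocations are the ones mentioned above, and the function is assumed to be run from the root of the annexed repository.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands from this answer.
fast_unannex() {
    # Upgrade the repository version first (v5 to v6 in the answer's case).
    git annex upgrade || return 1
    # --fast hard-links each file to its annexed content instead of copying it.
    git annex unannex . --fast
}
```

Since the hard-linked files share their data with the objects in the annex, it is probably wise not to edit them until the annex itself is gone.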

Wheelbarrow answered 16/12, 2018 at 9:10 Comment(0)

I would like to include my own experience of using git annex uninit, in addition to OP's answer.

I didn't have the full repository annexed, only about 40 larger files. After deciding that git-annex gave me no particular benefit, I tried unannexing several files, which took a few seconds per file. Then I ran git annex uninit; it took more than a minute only for really huge files (more than a few GB). Overall, it was done in about 20 minutes, which was acceptable in my case.

So it seems that the time needed for unannexing grows with the size of the annexed file tree.
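If run time depends on how much is annexed, it can help to count the annexed files before choosing a method. A minimal sketch, assuming the symlink layout described in the question (targets under .git/annex/objects) and a find that supports -lname, which GNU and BSD find do but POSIX does not require. count_annexed is a made-up helper, not a git-annex command.

```shell
#!/bin/sh
# count_annexed DIR: print how many symlinks under DIR point into
# .git/annex/objects. The .git directory itself is skipped.
# Assumes file names without newlines, since lines are counted.
count_annexed() {
    find "$1" -path "$1/.git" -prune -o \
        -type l -lname '*.git/annex/objects/*' -print | wc -l | tr -d ' '
}
```

Calling count_annexed . in the repository root gives a rough idea of how many files an uninit run would have to process.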

Landlubber answered 14/7, 2014 at 12:13 Comment(0)

Have you tried to use git-annex in direct mode?

Just change your repository with

git annex direct

This will not use symlinks any longer, but some git commands do not work with such annex repositories. Check out the explanations on their website to see if this scheme fits your purposes.

Maybe the conversion process is faster than the previously mentioned tips. I haven't tried it myself with big repositories.

Tifanytiff answered 12/4, 2015 at 14:43 Comment(2)
In v6, you will need to use git annex unlock; while direct mode is still available, it's deprecated. - Yeager
And in v7, there is no direct mode any more. Interestingly, I just noticed that git annex unlock . on the root of my (v7) test repo uses cp -a --reflink=auto, which will try to use copy-on-write. I think that will only actually give copy-on-write behaviour on btrfs at the moment, so for everything else it will do real copies rather than CoW or hard linking (somewhat slow, but likely much faster than git annex uninit or git annex unannex .). - Upstart
