How to remove a dangling commit from GitHub?

Yesterday, I pushed to my fork of ConnectBot on GitHub. I pushed once, realized that I hadn't made the change the way I wanted, redid the commit and pushed again.

Now, GitHub has both commits:

My master branch is only tracking the second commit, but the first commit is still available and is still in my activity feed. How can I remove it to make sure no one accidentally pulls that commit instead of the corrected version?

Fahy answered 6/12, 2010 at 15:32 Comment(4)
After 8 years the commits are both still there - accepted answer is obviously wrong..Tarango
The good news after all these years is that GitHub now has a virtual assistant that will create a ticket automatically for you. Very smooth process.Lichen
@Lichen from where could we create a ticket automatically?Excellence
@Excellence GitHub support.Lichen
I
12

Delete the repo or contact GitHub

Deleting the repo and recreating it without the bad commit seems to work if you can afford losing all issues. The data also disappears from the commit API (although push events are still visible). See also: https://mcmap.net/q/12313/-remove-sensitive-files-and-their-commits-from-git-history

If you can't afford to lose issue data, GitHub Support can manually delete dangling commits. For example, when I uploaded all GitHub commit emails to a repo, they asked me to take it down, so I did, and they ran a gc. Pull requests that contain the data have to be deleted as well, however: because of this, that repo's data remained accessible for up to one year after the initial takedown.

Their current help page says:

you can permanently remove all of your repository's cached views and pull requests on GitHub by contacting GitHub Support.

Making the repo private might also keep the issues around and get rid of the commit, but I'm not sure. You definitely lose stars and forks, though, and I don't know whether the commits will be gone after restoring it. At least you might be able to keep a private backup of the issues.

Injector answered 29/9, 2015 at 9:24 Comment(1)
Note, you will also permanently lose all stars and forks if you make your repo private!Ilion
S
14

If you really need it to be removed immediately, you would probably have to contact GitHub Support.

Pulling should generate a pack that contains only referenced objects, so no one should get that commit as a result of a clone or a pull. For example:

$ git clone git://github.com/nylen/connectbot.git
Cloning into connectbot...
remote: Counting objects: 6261, done.
remote: Compressing objects: 100% (1900/1900), done.
remote: Total 6261 (delta 3739), reused 5980 (delta 3520)
Receiving objects: 100% (6261/6261), 3.04 MiB | 3.40 MiB/s, done.
Resolving deltas: 100% (3739/3739), done.
$ git cat-file -t 1cd775d
fatal: Not a valid object name 1cd775d
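
The same behavior can be reproduced without touching GitHub. The following is a minimal sketch, assuming throwaway repositories under a temporary directory stand in for the local checkout and for the GitHub remote. The paths, identity, and file names are made up for illustration. It uses a file:// URL for the clone because cloning from a plain local path copies the whole object directory instead of negotiating a pack.

```shell
#!/bin/sh
# Sketch only: a local bare repo plays the role of the GitHub remote.
set -e
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/master
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"

# First push: the commit that later turns out to be wrong.
echo first > file.txt
git add file.txt
git commit -q -m "first attempt"
old=$(git rev-parse HEAD)
git push -q "$work/remote.git" HEAD:refs/heads/master

# Redo the commit and force-push; the old commit is now unreferenced.
echo fixed > file.txt
git commit -q -a --amend -m "corrected"
new=$(git rev-parse HEAD)
git push -q --force "$work/remote.git" HEAD:refs/heads/master

# The remote still stores the old object...
on_remote=$(git -C "$work/remote.git" cat-file -t "$old")

# ...but a fresh clone over a real transport only receives reachable objects.
git clone -q "file://$work/remote.git" "$work/clone"
if git -C "$work/clone" cat-file -e "$old" 2>/dev/null; then
    in_clone=yes
else
    in_clone=no
fi
echo "old commit on remote: $on_remote; present in fresh clone: $in_clone"
```

This only shows what a clone transfers. It says nothing about whether GitHub's web UI or API keeps serving the old commit by SHA.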
Sanitarian answered 6/12, 2010 at 16:34 Comment(10)
After four years, both OP's commits are still available on Github.Blur
The old commit is still available by SHA, but it doesn't show up in the commit list.Okra
I'd be interested in whether it would show up elsewhere after all this time. For instance, do old commits which reference issues show up in the issue view forever?Okra
What's the basis for you saying that github will periodically garbage collect commits that aren't referenced? I've heard this said before but can't find anything more credible than answers like this.Alexipharmic
@Quantum7, Atlassian has a nice article on when gc is run. As I understand it, any git repo will check itself during a push, and if things are 'too big', gc will run. Also, FYI, I ran git gc and my dangling commits were not removed.Cupp
You need to contact GitHub support (as indicated here) to remove cached commits which are no longer referenced.Professionalism
In my case, I contacted GitHub and they replied that they "cleared the cache on our end and run garbage collection", and I was able to confirm that the commits were no longer accessible.Vassar
@Okra Jira can find these dangling commits that reference tickets, so they will probably show up there forever.Shaunna
11 years now and OP's dangling commit is still available on Github ( 1cd775d )Calamus
After 12.5 years it's STILL there...Channelize
T
0

How can I remove it to make sure no one accidentally pulls that commit instead of the corrected version?

There's no need to do this: anybody using the master branch on your repo will get the correct commit (i.e., whatever you happen to have the master branch pointing to at the time).

The other commit hasn't been garbage-collected because there's still a reference to it somewhere.

In local repos this is usually the reflog. The commit will be GC'd once the reflog entry recording that HEAD and/or master once pointed to it either ages out and is GC'd, or is explicitly deleted.
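
For a local repository, the reflog behavior can be checked with a small experiment. This is a minimal sketch, assuming a throwaway repository in a temporary directory with a made-up identity and file name. It applies only to a repository you control locally and does not affect what GitHub stores.

```shell
#!/bin/sh
# Sketch only: show that the reflog keeps an amended-away commit alive.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"

echo first > file.txt
git add file.txt
git commit -q -m "first attempt"
old=$(git rev-parse HEAD)

echo fixed > file.txt
git commit -q -a --amend -m "corrected"

# The reflog still mentions the old commit, so gc keeps it.
git gc -q --prune=now
if git cat-file -e "$old" 2>/dev/null; then before=kept; else before=gone; fi

# Explicitly drop all reflog entries, then prune unreachable objects.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$old" 2>/dev/null; then after=kept; else after=gone; fi

echo "before reflog expiry: $before; after reflog expiry: $after"
```

Expiring the reflog discards the usual safety net for recovering lost commits, so this is best tried on a scratch repository first.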

GitHub is a bit more complex because there are plenty of things outside of a particular repo that can reference commits in a repo. This includes PRs, issues, and apparently even references in other repos, as the current message at the top of https://github.com/nylen/connectbot/commit/1cd775d indicates:

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

I tried to track down where this reference might be by checking issues and PRs in the upstream repo and another fork repo mentioned in a PR for the "good" commit above. I didn't manage to track it down, but the upstream repo currently has 624 active forks, each with its own set of commits, PRs and issues (and whatever else GitHub has that references commits), so the reference is no doubt in there somewhere.

But again, there's no need to worry about this. Anybody looking at your master branch will always get the "correct" commit as of the time they last fetched your repo on the tracking branch. If they happen to have a local branch built on the older version of that commit, they'll have to resolve things in the usual way. (In situations like this one, usually a simple git rebase will "replace" the old commit with the new one on the local branch.)
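
One explicit way to do that replacement is git rebase --onto. The following is a minimal sketch, assuming a throwaway repository in which a hypothetical branch named topic was built on the old commit before master was amended. The --onto form is chosen here for the illustration because it names both commits explicitly. A plain rebase onto the updated branch may also work, depending on the changes involved.

```shell
#!/bin/sh
# Sketch only: move downstream work from the old commit onto its replacement.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"

echo first > file.txt
git add file.txt
git commit -q -m "first attempt"
old=$(git rev-parse HEAD)

# Downstream work that was started on top of the old commit.
git checkout -q -b topic
echo extra > other.txt
git add other.txt
git commit -q -m "downstream work"

# Meanwhile master is corrected by amending the commit.
git checkout -q master
echo fixed > file.txt
git commit -q -a --amend -m "corrected"
new=$(git rev-parse HEAD)

# Replay everything after $old on topic onto $new.
git rebase -q --onto "$new" "$old" topic
parent=$(git rev-parse topic~1)
echo "topic now sits on: $parent"
```

After the rebase, topic no longer has the old commit in its history, and file.txt on topic contains the corrected content.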

Tamtama answered 4/2 at 3:33 Comment(0)
