git log --cherry-pick A..B - what am I doing wrong?

Asked 16/12, 2020 at 8:44 Answered 28/1, 2021 at 22:39

To find commits on branch working that have not yet been merged or cherry-picked to master, I'm running:

git log --format="%h %aN %ai %f" --cherry-pick --no-merges master..working

as learned here.

But, as described here, I still get commits that exist on both master and working and differ only in their commit IDs because of cherry-picking.

Neither working has been merged to master nor vice versa.

Even when I manually cherry-pick a commit from working to master it will show up running the above command.

Actually, --cherry-pick seems to have no effect at all, as piping the output through wc shows.

What am I doing wrong?

Update:

Actually, both ElpieKay and torek are right, and my initial command would have worked, too. Unfortunately, I was not aware that we're using some "cherry-pickish" magic that alters commits after cherry-picking.

Alcheringa answered 16/12, 2020 at 8:44 Comment(4)

Try master...working. I'm afraid you missed one dot. – Marshall 16/12, 2020 at 9:31

A..B is shorthand for ^A B (see git-scm.com/docs/git-log) - ... seems to be for symmetrical diffs, ie it also shows commits on master which are not on working. Nevertheless it shows cherry-picked commits as well. – Alcheringa 16/12, 2020 at 10:37

@ElpieKay: strangely, when I try to come up with a minimal reproducible example using ..., cherry-picked commits indeed don't show up. But unfortunately, those which are only on master show up, too. Nevertheless, in my actual project checkout, cherry-picked commits do show up (using ...) even when git patch-id tells me they are the same. – Alcheringa 16/12, 2020 at 11:1

FWIW : I found I had a better view of what is marked as cherry-pickable or not using : git log --graph --cherry-mark --boundary --format="..." A...B (cherry-mark will still mention commits that are cherry-picked, and indicate them with a =, the --graph --boundary pair makes clearer what commit is on what side of the graph) – Helaine 16/12, 2020 at 11:12

As ElpieKay mentioned in a comment, you need the three-dot notation. However, just adding the three-dot notation is not sufficient: you will also want to add --left-only or --right-only (depending on which side of A...B holds the branch whose commits you want to see).

Note:

In order to find commits on branch working not merged/picked to master yet [I used] master..working

So here, you'd want --right-only master...working. You can keep --no-merges as well. If the merges only show up on master you don't actually need --no-merges, but it's probably harmless. Note, however, that --no-merges completely eliminates all merges, regardless of their patch-IDs.
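To illustrate why the two-dot form gives --cherry-pick nothing to compare against, here is a toy model in Python. It is only a sketch and not how Git is implemented: the commit names, patch IDs and helper functions are all hypothetical, and commits are reduced to a (sha, patch_id) pair.

```python
from typing import NamedTuple


class Commit(NamedTuple):
    sha: str       # commit ID (changes when a commit is cherry-picked)
    patch_id: str  # identifies the diff (unchanged by a clean cherry-pick)


# Hypothetical history: w2 on working was cherry-picked to master as m2.
master_only = [Commit("m1", "p-fix-typo"), Commit("m2", "p-add-feature")]
working_only = [Commit("w1", "p-refactor"), Commit("w2", "p-add-feature")]


def cherry_pick_filter(left, right):
    """Drop commits whose patch ID also appears on the other side."""
    left_ids = {c.patch_id for c in left}
    right_ids = {c.patch_id for c in right}
    kept_left = [c for c in left if c.patch_id not in right_ids]
    kept_right = [c for c in right if c.patch_id not in left_ids]
    return kept_left, kept_right


def two_dot():
    # Models master..working: master's commits are excluded up front,
    # so there is no "other side" and nothing is recognised as a duplicate.
    _, right = cherry_pick_filter([], working_only)
    return [c.sha for c in right]


def three_dot(right_only=False):
    # Models master...working, optionally restricted to the right side.
    left, right = cherry_pick_filter(master_only, working_only)
    kept = right if right_only else left + right
    return [c.sha for c in kept]


print(two_dot())                   # ['w1', 'w2']
print(three_dot())                 # ['m1', 'w1']
print(three_dot(right_only=True))  # ['w1']
```

In this model the plain three-dot form still lists m1, which only exists on master; restricting the output to the right-hand side leaves just the commit on working that has no equivalent on master.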

Dar answered 16/12, 2020 at 11:8 Comment(0)

Before Git 2.30.1 (Q1 2021), when more than one commit with the same patch ID appears on one side, "git log --cherry-pick A...B" did not exclude them all when a commit with the same patch ID appears on the other side.
Now it does (again, with Git 2.31, Q1 2021).

See commit c9e3a4e (12 Jan 2021) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit b69bed2, 25 Jan 2021)

patch-ids: handle duplicate hashmap entries

Reported-by: Arnaud Morin
Signed-off-by: Jeff King

This fixes a bug introduced in dfb7a1b ("patch-ids: stop using a hand-rolled hashmap implementation", 2016-07-29, Git v2.10.0-rc0 -- merge) in which
git rev-list --cherry-pick A...B
will fail to suppress commits reachable from A even if a commit with matching patch-id appears in B.

Around the time of that commit, the algorithm for "--cherry-pick" looked something like this:

0. Traverse all of the commits, marking them as being on the left or right side of the symmetric difference.

1. Iterate over the left-hand commits, inserting a patch-id struct for each into a hashmap, and pointing commit->util to the patch-id struct.

2. Iterate over the right-hand commits, checking which are present in the hashmap.
If so, we exclude the commit from the output and we mark the patch-id as "seen".

3. Iterate again over the left-hand commits, checking whether commit->util->seen is set; if so, exclude them from the output.

At the end, we'll have eliminated commits from both sides that have a matching patch-id on the other side.
But there's a subtle assumption here: for any given patch-id, we must have exactly one struct representing it.
If two commits from A both have the same patch-id and we allow duplicates in the hashmap, then we run into a problem:

a. In step 1, we insert two patch-id structs into the hashmap.

b. In step 2, our lookups will find only one of these structs, so only one "seen" flag is marked.

c. In step 3, one of the commits in A will have its commit->util->seen set, but the other will not. We'll erroneously output the latter.
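The failure in a. to c. can be reproduced with a small Python model of steps 1 to 3. This is a sketch under simplifying assumptions, not Git's C code: the class and function names are hypothetical, commits are (name, patch_id) pairs, and the toy hashmap mimics a duplicate-allowing add combined with a lookup that returns a single entry.

```python
from collections import defaultdict


class PatchId:
    """Toy stand-in for the patch-id struct with its 'seen' flag."""

    def __init__(self, commit, patch_id):
        self.commit = commit
        self.patch_id = patch_id
        self.seen = False


class DupHashmap:
    """Toy hashmap that keeps duplicate keys but looks up only one entry."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def add(self, entry):
        self.buckets[entry.patch_id].append(entry)

    def get(self, patch_id):
        entries = self.buckets.get(patch_id)
        return entries[0] if entries else None


def cherry_pick_buggy(left, right):
    """Return the commits that are still shown after filtering."""
    table = DupHashmap()
    util = {}  # plays the role of commit->util
    # Step 1: one patch-id struct per left-hand commit, duplicates allowed.
    for commit, pid in left:
        entry = PatchId(commit, pid)
        table.add(entry)
        util[commit] = entry
    # Step 2: right-hand commits mark at most ONE matching struct as seen.
    shown_right = []
    for commit, pid in right:
        entry = table.get(pid)
        if entry is not None:
            entry.seen = True
        else:
            shown_right.append(commit)
    # Step 3: left-hand commits are dropped only if their own struct was seen.
    shown_left = [c for c, _ in left if not util[c].seen]
    return shown_left + shown_right


# a1 and a2 share patch ID "P", which also appears on the right as b1.
left = [("a1", "P"), ("a2", "P"), ("a3", "Q")]
right = [("b1", "P"), ("b2", "R")]
print(cherry_pick_buggy(left, right))  # ['a2', 'a3', 'b2']
```

Here a2 is printed even though b1 carries the same patch ID, which is the erroneous output described in c.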

Prior to dfb7a1b, our hashmap did not allow duplicates.
Afterwards, it used hashmap_add(), which explicitly does allow duplicates.

At that point, the solution would have been easy: when we are about to add a duplicate, skip doing so and return the existing entry which matches.
But it gets more complicated.

In 683f17e ("patch-ids: replace the seen indicator with a commit pointer", 2016-07-29, Git v2.10.0-rc0 -- merge), our step 3 goes away entirely.
Instead, in step 2, when the right-hand side finds a matching patch_id from the left-hand side, we can directly mark the left-hand patch_id->commit to be omitted.
Solving that would be easy, too; there's a one-to-many relationship of patch-ids to commits, so we just need to keep a list.

But there's more.
Commit b3dfeeb ("rebase: avoid computing unnecessary patch IDs", 2016-07-29, Git v2.10.0-rc0 -- merge) built on that by lazily computing the full patch-ids.
So we don't even know when adding to the hashmap whether two commits truly have the same id.
We'd have to tentatively assign them a list, and then possibly split them apart (possibly into N new structs) at the moment we compute the real patch-ids.
This could work, but it's complicated and error-prone.

Instead, let's accept that we may store duplicates, and teach the lookup side to be more clever.
Rather than asking for a single matching patch-id, it will need to iterate over all matching patch-ids.
This does mean examining every entry in a single hash bucket, but the worst-case for a hash lookup was already doing that.

We'll keep the hashmap details out of the caller by providing a simple iteration interface.
We can retain the simple has_commit_patch_id() interface for the other callers, but we'll simplify its return value into an integer, rather than returning the patch_id struct.
That way they won't be tempted to look at the "commit" field of the return value without iterating.
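The approach chosen in the commit message, keeping duplicates and making the lookup visit every matching entry, can be sketched in Python as follows. This is an illustrative model only: the function name and data layout are hypothetical, and the real change is in Git's C sources, with lazily computed patch IDs that this sketch ignores.

```python
from collections import defaultdict


def cherry_pick_fixed(left, right):
    """Return the commits still shown when every duplicate is honoured."""
    # Duplicates are stored: each patch ID maps to ALL left-hand commits.
    buckets = defaultdict(list)
    for commit, pid in left:
        buckets[pid].append(commit)

    omitted = set()
    shown_right = []
    for commit, pid in right:
        matches = buckets.get(pid, [])
        if matches:
            # Iterate over every match instead of stopping at the first one.
            omitted.update(matches)
        else:
            shown_right.append(commit)

    shown_left = [c for c, _ in left if c not in omitted]
    return shown_left + shown_right


# a1 and a2 share patch ID "P", which also appears on the right as b1.
left = [("a1", "P"), ("a2", "P"), ("a3", "Q")]
right = [("b1", "P"), ("b2", "R")]
print(cherry_pick_fixed(left, right))  # ['a3', 'b2']
```

Both a1 and a2 are now suppressed, because the lookup for b1 marks every left-hand commit with patch ID "P" rather than a single one.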

Renny answered 28/1, 2021 at 22:39 Comment(0)
