Get the diff details of the first commit in GitPython

In GitPython, I can iterate over the diff information for each change in the tree by calling the diff() method between two commit objects. If I call diff() with the create_patch=True keyword argument, a patch string is created for every change (addition, deletion, rename), which I can access through the resulting diff object and dissect for the changes.

However, the first commit has no parent to compare against.

import git
from git.compat import defenc
repo = git.Repo("path_to_my_repo")

commits = list(repo.iter_commits('master'))
commits.reverse()

for i in commits:

    if not i.parents:
        # First commit, don't know what to do
        continue
    else:
        # Has a parent
        diff = i.diff(i.parents[0], create_patch=True)

    for k in diff:
        try:
            # Get the patch message
            msg = k.diff.decode(defenc)
            print(msg)
        except UnicodeDecodeError:
            continue

I can use the method

diff = repo.git.diff_tree(i.hexsha, '--', root=True)

But this runs git diff-tree on the whole tree with the given arguments and returns a single string, so I cannot get the information for each file separately.

Maybe there is a way to create a root object of some sort. How can I get the first changes in a repository?

EDIT

A dirty workaround seems to be comparing to the empty tree by directly using its hash:

EMPTY_TREE_SHA = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

....

    if not i.parents:
        diff = i.diff(EMPTY_TREE_SHA, create_patch=True, **diffArgs)
    else:
        diff = i.diff(i.parents[0], create_patch=True, **diffArgs)

But this hardly seems like a real solution. Other answers are still welcome.

Ruddle answered 25/11, 2015 at 12:33 Comment(1)
Unlike native Git, GitPython has no special handling for the first commit, so your solution seems like a very viable one. - Penitentiary

The short answer is that you can't, at least not through GitPython's object-level diff API, which does not seem to have a special case for the root commit.

Running git show on the commit would work, but GitPython only offers that as a raw command call (repo.git.show), which returns plain text rather than diff objects.

You can, however, use the stats functionality in GitPython to get per-file information for the first commit:

import git

repo = git.Repo(".")

commits = list(repo.iter_commits('master'))
commits.reverse()
print(commits[0])
print(commits[0].stats.total)
print(commits[0].stats.files)

This might solve your problem. If it does not, you would probably be better off with pygit2, which is based on libgit2, the library that VSTS, Bitbucket and GitHub use to handle Git on their backends. It is probably more feature-complete. Good luck.

Braggadocio answered 4/4, 2018 at 10:28 Comment(0)

The OP's proposed solution works, but it has the disadvantage that the diff is inverted (added files are marked as deleted, and so on). However, you can simply reverse the direction by diffing from the empty tree to the commit:

import git
from gitdb.util import to_bin_sha

# SHA-1 ID of Git's empty tree; diffing from it to the commit reports additions as additions
empty_tree = git.Tree(repo, to_bin_sha("4b825dc642cb6eb9a060e54bf8d69288fbee4904"))
diff = empty_tree.diff(i)

Be aware that in a SHA-256 repository, the empty tree ID is 6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321.
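
These IDs are not arbitrary: a Git object ID is the hash of the header "<type> <size>" followed by a NUL byte and the content, and the empty tree has no content. A small sketch using only the standard library reproduces them (the function name empty_tree_id is made up for illustration):

```python
import hashlib


def empty_tree_id(algorithm="sha1"):
    # An empty tree object consists only of its header: "tree 0" plus a NUL byte.
    return hashlib.new(algorithm, b"tree 0\0").hexdigest()


print(empty_tree_id("sha1"))    # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
print(empty_tree_id("sha256"))  # the SHA-256 counterpart quoted above
```

Deriving the ID this way avoids hard-coding one constant per object format.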

You can check the type of the repo with GitPython like so:

def is_sha1(repo):
    # Needs a Git version that supports rev-parse --show-object-format
    object_format = repo.git.rev_parse("--show-object-format")
    return object_format == "sha1"
Showbread answered 19/7, 2022 at 8:35 Comment(0)
