List the content of a directory for a specific git commit using GitPython

Using GitPython, I'm trying to list the content of a directory at a given commit (i.e. a "snapshot" of the directory at the time).

In the terminal, what I'd do is:

git ls-tree --name-only 4b645551aa82ec55d1794d0bae039dd28e6c5704

How can I do the same in GitPython?

Based on the answers I've found to a similar question (GitPython get tree and blob object by sha) I've tried recursively traversing base_commit.tree and its .trees, but I don't seem to get anywhere.

Any ideas?

Commencement answered 14/7, 2017 at 12:24 Comment(0)

Indeed, traversing the trees/subtrees is the right approach. However, the built-in traverse method can have issues with submodules. Instead, we can do the traversal ourselves iteratively and collect all the blob objects (the files in the repo at the given commit). There's no need to use execute.

def list_files_in_commit(commit):
    """
    Lists all the files in a repo at a given commit

    :param commit: A gitpython Commit object
    """
    file_list = []
    dir_list = []
    stack = [commit.tree]
    while stack:
        tree = stack.pop()
        # blobs are the files directly inside this tree
        for b in tree.blobs:
            file_list.append(b.path)
        # subtrees are directories: record them and visit them later
        for subtree in tree.trees:
            dir_list.append(subtree.path)
            stack.append(subtree)
    # return dir_list as well if you want directories too
    return file_list

If you want the files affected by a given commit, this is available via commit.stats.files.
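To make the difference from the snapshot listing concrete, here is a minimal sketch of reading commit.stats.files. The helper name files_changed_in_commit is made up for illustration; the only GitPython behaviour it relies on is that commit.stats.files is a dict keyed by path whose values carry a "lines" count.

```python
def files_changed_in_commit(commit):
    """Return (path, changed_line_count) pairs for the files a commit touched.

    `commit` is expected to be a GitPython Commit object. Only
    `commit.stats.files` is used: a dict mapping each changed path to a
    dict with "insertions", "deletions" and "lines" entries.
    """
    return sorted(
        (path, info["lines"]) for path, info in commit.stats.files.items()
    )


# Hypothetical usage, assuming GitPython is installed and the path is a repo:
#   import git
#   repo = git.Repo("path/to/my/repo")
#   for path, lines in files_changed_in_commit(repo.head.commit):
#       print(path, lines)
```

This only reports what the commit changed, so a file that exists at the commit but was not modified by it will not show up here.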

Schlemiel answered 6/6, 2019 at 0:14 Comment(0)

I couldn't find a more elegant way than actually calling execute. This is the end result:

# splitlines() rather than split(), so names containing spaces stay intact
configFiles = repo.git.execute(
    ['git', 'ls-tree', '--name-only', commit.hexsha, path]).splitlines()

where commit is a git.Commit object and path is the path I'm interested in.
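GitPython also exposes git subcommands as methods on repo.git (underscores in the method name become dashes), so the same call can be written without building the argument list by hand. A minimal sketch under that assumption; the function name ls_tree_names is made up for illustration:

```python
def ls_tree_names(repo, commit, *paths):
    """Return the lines printed by `git ls-tree --name-only <commit> [paths...]`.

    `repo` is expected to be a git.Repo and `commit` a git.Commit.
    `repo.git.ls_tree(...)` is GitPython's dynamic wrapper around the
    `git ls-tree` command; it returns the command's stdout as a string.
    """
    output = repo.git.ls_tree("--name-only", commit.hexsha, *paths)
    # One name per line; splitlines() keeps names with spaces in one piece.
    return output.splitlines()


# Hypothetical usage, assuming GitPython is installed:
#   import git
#   repo = git.Repo("path/to/my/repo")
#   print(ls_tree_names(repo, repo.head.commit, "conf/"))
```

As far as I know, git ls-tree treats a directory path without a trailing slash as the directory entry itself, so pass "conf/" rather than "conf" to get the contents.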

Commencement answered 26/7, 2017 at 14:57 Comment(0)

If you know the path to the directory (say foo/bar/baz) and you have a GitPython Commit object (call it commit), you can access the blobs in that directory with commit.tree['foo']['bar']['baz'].blobs. Taking the name of each blob (file) gives you the list of files in that directory as of that commit.

import git

repo = git.Repo('path/to/my/repo')
commit = next(repo.iter_commits(max_count=1))
files_in_dir = [b.name for b in commit.tree['foo']['bar']['baz'].blobs]
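
The snippet above only returns files. `git ls-tree --name-only` also prints subdirectories, so here is a rough sketch that includes both. The function name list_dir_at_commit is hypothetical, and the sketch assumes that subscripting a tree with a missing name raises KeyError and that the given path names a directory rather than a file.

```python
def list_dir_at_commit(commit, directory=""):
    """Return sorted repo-relative paths of the direct children of `directory`.

    `commit` is expected to be a GitPython Commit object and `directory`
    a path such as "foo/bar/baz" ("" means the repository root).
    Returns an empty list if the directory is not present at that commit.
    """
    tree = commit.tree
    try:
        # Walk down one component at a time, like commit.tree['foo']['bar'].
        for part in directory.split("/"):
            if part:
                tree = tree[part]
    except KeyError:
        # Assumption: a name that does not exist in the tree raises KeyError.
        return []
    # .trees are the subdirectories, .blobs the files, one level deep only.
    return sorted([t.path for t in tree.trees] + [b.path for b in tree.blobs])


# Hypothetical usage, assuming GitPython is installed:
#   import git
#   repo = git.Repo("path/to/my/repo")
#   print(list_dir_at_commit(repo.head.commit, "foo/bar/baz"))
```

Use .name instead of .path if you want bare file names rather than paths relative to the repository root.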
Planimeter answered 3/12, 2020 at 23:52 Comment(0)
