Checking if an object is in a repo in gitpython
C

4

5

I'm working on a program that will be adding and updating files in a git repo. Since I can't be sure if a file that I am working with is currently in the repo, I need to check its existence - an action that seems to be harder than I thought it would be.

The 'in' operator doesn't seem to work below the root level of a tree in GitPython. For example:

>>> from git import Repo
>>> repo = Repo(path)
>>> hct = repo.head.commit.tree
>>> 'A' in hct['documents']
False
>>> hct['documents']['A']
<git.Tree "8c74cba527a814a3700a96d8b168715684013857">

So I'm left to wonder: how do people check that a given file is in a git tree before trying to work on it? Trying to access an object for a file that is not in the tree raises a KeyError, so I can use try/except. But that feels like a poor use of exception handling for a routine existence check.

Have I missed something really obvious? How does one check for the existence of a file in a commit tree using GitPython (or really any library/method in Python)?

Self Answer

OK, I dug around in the Tree class to see what __contains__ does. It turns out that when searching in subfolders, one has to check for a file using its full relative path from the repo's root. So a working version of the check I did above is:

>>> 'documents/A' in hct['documents']
True
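
A small helper can hide that detail. This is only a sketch: it assumes the behaviour shown in the session above, plus the `path` attribute that GitPython tree objects carry (the tree's own path from the repository root). The name `name_in_tree` is made up for illustration.

```python
import posixpath


def name_in_tree(tree, name):
    """Return True if `name` is a direct child of `tree`.

    `tree` is expected to be a GitPython Tree. Its `in` check compares
    against paths from the repository root, so the tree's own path is
    prepended first. Git tree paths always use forward slashes.
    """
    full_path = posixpath.join(tree.path, name)
    return full_path in tree
```

With the repository from the question, `name_in_tree(hct['documents'], 'A')` should then behave like the `'documents/A' in hct['documents']` check.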
Chine answered 5/5, 2012 at 22:17 Comment(1)
You can add your own answer and accept it - it will be more prominent for other users like that. - Schluter
F
6

EricP's answer has a bug. Here's a fixed version:

import os

def fileInRepo(repo, filePath):
    '''
    repo is a GitPython Repo object
    filePath is the full path to the file from the repository root
    returns True if the file is found in the repo at the specified path, False otherwise
    '''
    pathdir = os.path.dirname(filePath)

    # Build up a reference to the tree that should contain the file
    rsub = repo.head.commit.tree

    for path_element in pathdir.split(os.path.sep):
        # dirname is '' for a file at the repository root; nothing to descend into
        if not path_element:
            continue

        # If a directory on the file path is not in the repo, neither is the file
        try:
            rsub = rsub[path_element]
        except KeyError:
            return False

    return filePath in rsub

Usage:

file_found = fileInRepo(repo, 'documents/A')

This is very similar to EricP's code, but handles the case where the folder containing the file is not in the repo. EricP's function raises a KeyError in that case. This function returns False.

(I offered to edit EricP's code but was rejected.)

Frontlet answered 21/9, 2014 at 16:7 Comment(1)
git.Tree.join effectively already does what this fileInRepo function re-implements. I think it is better to use the implementation already in git.Tree.join. I've added another answer using git.Tree.join. - Ossification
G
3

Expanding on Bill's solution, here is a function that determines whether a file is in a repo:

import os

def fileInRepo(repo, path_to_file):
    '''
    repo is a GitPython Repo object
    path_to_file is the full path to the file from the repository root
    returns True if the file is found in the repo at the specified path, False otherwise
    '''
    pathdir = os.path.dirname(path_to_file)

    # Build up a reference to the tree that should contain the file
    rsub = repo.head.commit.tree
    for path_element in pathdir.split(os.path.sep):
        rsub = rsub[path_element]
    return path_to_file in rsub

Example usage:

file_found = fileInRepo(repo, 'documents/A')
Guinness answered 16/1, 2013 at 17:46 Comment(0)
O
2

Tree already has a method that does what fileInRepo re-implements in Lucidity's answer. The method is Tree.join:

https://gitpython.readthedocs.io/en/3.1.29/reference.html#git.objects.tree.Tree.join

A less redundant implementation of fileInRepo is:

def fileInRepo(repo, filePath):
    try:
        repo.head.commit.tree.join(filePath)
        return True
    except KeyError:
        return False
Ossification answered 28/10, 2022 at 12:1 Comment(2)
This is a better answer. I needed to pass in a pathlib.Path, though, so I used repo.head.commit.tree.join(filePath.relative_to(repo.working_dir)). - Spaceband
On Windows, you may need git_path = str(pathlib.PurePosixPath(filePath.relative_to(repo.working_dir))) - Spaceband
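
The Tree.join approach and the path conversion from the comments can be combined in one function. What follows is a sketch under a few assumptions, not a definitive implementation: the repository is non-bare (so `repo.working_dir` is set), the revision has at least one commit, and only blobs should count as files. The name `file_in_commit` and its `rev` parameter are hypothetical; `Repo.commit`, `Tree.join` and the `type` attribute are existing GitPython API.

```python
from pathlib import Path


def file_in_commit(repo, file_path, rev="HEAD"):
    """Return True if `file_path` is a file (blob) in the tree of `rev`.

    `repo` is expected to be a GitPython Repo; `file_path` may be a str
    or a pathlib path, absolute or relative to the working tree.
    """
    path = Path(file_path)
    if path.is_absolute():
        try:
            path = path.relative_to(repo.working_dir)
        except ValueError:
            # The path lies outside the working tree
            return False
    git_path = path.as_posix()  # git tree paths always use forward slashes
    try:
        entry = repo.commit(rev).tree.join(git_path)
    except KeyError:
        return False
    # Tree.join also finds directories; only blobs count as files here
    return entry.type == "blob"
```

Usage would look like `file_in_commit(repo, 'documents/A')` or `file_in_commit(repo, pathlib.Path(repo.working_dir) / 'documents' / 'A')`. Dropping the final type check makes it accept directories as well, matching the shorter version above.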
S
1

If you want to avoid try/except, you can check whether an object is in the repo with:

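import os

def fileInRepo(repo, path_to_file):
    dir_path = os.path.dirname(path_to_file)
    rsub = repo.head.commit.tree
    # Drop the empty element that dirname yields for a file at the repo root
    path_elements = [el for el in dir_path.split('/') if el]
    for el_id, element in enumerate(path_elements):
        # Tree membership is tested against the full path from the repo root,
        # and git tree paths always use forward slashes
        sub_path = '/'.join(path_elements[:el_id + 1])
        if sub_path in rsub:
            rsub = rsub[element]
        else:
            return False
    return path_to_file in rsub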

Or you can iterate through all items in the repo, though that will certainly be slower:

def isFileInRepo(repo, path_to_file):
    rsub = repo.head.commit.tree
    for element in rsub.traverse():
        if element.path == path_to_file:
            return True
    return False
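
If many paths have to be checked against the same commit, the traversal cost can be paid once by collecting the paths into a set first. This is a sketch along the lines of the function above, not a tested recipe: `tracked_file_paths` and its `rev` parameter are names made up for illustration, and it assumes that only blobs (not directories) are of interest.

```python
def tracked_file_paths(repo, rev="HEAD"):
    """Collect the paths of all files (blobs) in the tree of `rev`.

    `repo` is expected to be a GitPython Repo. Paths are relative to the
    repository root and use forward slashes.
    """
    tree = repo.commit(rev).tree
    return {item.path for item in tree.traverse() if item.type == "blob"}
```

After `tracked = tracked_file_paths(repo)`, each existence check is a plain set lookup such as `'documents/A' in tracked`. The set reflects the commit at the time it was built, so it needs rebuilding after new commits.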
Sclerenchyma answered 14/6, 2020 at 22:10 Comment(0)
