GitPython: How can I access the contents of a file in a commit?

I am new to GitPython and am trying to get the contents of a file within a commit. I can get each file from a specific commit, but I get an error every time I run my command on one of them. I know that the file exists in the repository, but each time I run my program I get the following error:

 returned non-zero exit status 1

I am using Python 2.7.6 and Ubuntu Linux 14.04.

I know that the file exists, since I also go directly into Git from the command line, check out the respective commit, search for the file, and find it. I also run cat on it, and the file contents are displayed. Often, when the error shows up, it says that the file in question does not exist. I am trying to go through each commit with GitPython, get every blob (file) from each commit, and run an external Java program on the content of that file. The Java program is designed to return a string to Python, which I capture with subprocess.check_output. Any help will be greatly appreciated.

I tried passing in the command as a list:

cmd = ['java', '-classpath', '/home/rahkeemg/workspace/CSCI499_Java/bin/:/usr/local/lib/*:', 'java_gram.mainJava','absolute/path/to/file']
subprocess.check_output(cmd, stderr=subprocess.STDOUT, shell=False)

And I have also tried passing the command as a string:

subprocess.check_output('java -classpath /home/rahkeemg/workspace/CSCI499_Java/bin/:/usr/local/lib/*: java_gram.mainJava {file}'.format(file=entry.abspath.strip()), shell=True)

Is it possible to access the contents of a file from GitPython? For example, say there is a commit that has one file, foo.java, and that file contains the following lines of code:

foo.java

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class foo{
    public static void main(String[] args) throws Exception{}
}

I want to access everything in the file and run an external program on it. Below is a piece of the code I am using to do so:

 #!/usr/bin/env python

 __author__ = 'rahkeemg'

 from git import *
 import git, json, subprocess, re

 
 git_dir = '/home/rahkeemg/Documents/GitRepositories/WhereHows'


 # make an instance of the repository from specified path
 repo = Repo(path=git_dir)

 heads = repo.heads  # obtain the branch heads of the repository
 master = heads.master  # get the master branch

 print master
 
 # get all of the commits on the master branch
 commits = list(repo.iter_commits(master))

 cmd = ['java', '-classpath', '/home/rahkeemg/workspace/CSCI499_Java/bin/:/usr/local/lib/*:', 'java_gram.mainJava']

 # start at the oldest commit (commit 0) and walk forward to the newest
 for i in range(len(commits) - 1, -1, -1):
     commit = commits[i]
     commit_num = len(commits) - 1 - i
     print commit_num, ": ", commit.hexsha, '\n', commit.message, '\n'

     for entry in commit.tree.traverse():
         if re.search(r'\.java', entry.path):
                             
            current_file = str(entry.abspath.strip())
            
            # add the current file or blob to the list for the command to run
            cmd.append(current_file) 
            print entry.abspath

            try:
             
                # This is the scenario where I pass arguments into command as a string
                print subprocess.check_output('java -classpath /home/rahkeemg/workspace/CSCI499_Java/bin/:/usr/local/lib/*: java_gram.mainJava {file}'.format(file=entry.abspath.strip()), shell=True)
            
  
                # scenario where I pass arguments into command as a list
                j_response = subprocess.check_output(cmd, stderr=subprocess.STDOUT, shell=False)
            
            except subprocess.CalledProcessError as e:
                 print "Error on file: ", current_file
         
            # Use pop on list to remove the last string, which is the selected file at the moment, to make place for the next file.  
            cmd.pop()
Laniferous answered 5/4, 2016 at 14:37 Comment(3)
"I also go directly into git from the command line, check out the respective commit, search for the file, and find it" - Your problem is that as you iterate over the commits no checkout is performed by GitPython. - Weisler
Tip 1: since Python is magic, you can do for commit_num, commit in reversed(list(enumerate(commits))): instead of using i, len, commits[i] and so on... - Gonidium
Tip 2: re.search(r'\.java', entry.path) will also match a string like 'my.name.is.java.txt', which doesn't have to be a Java source file at all. Why not simply do entry.path.endswith('.java')? - Gonidium
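
The two tips above can be sketched as follows. One caveat on Tip 1: reversed(list(enumerate(commits))) pairs each commit with its index in the newest-first list, so the oldest commit gets the highest number. To number the oldest commit 0, as the question's code does, enumerate(reversed(commits)) is the closer match. The helper names oldest_first and is_java_source are hypothetical, and plain strings stand in for commit objects.

```python
def oldest_first(commits):
    """Yield (commit_num, commit) pairs, numbering the oldest commit 0.

    `commits` is assumed to be newest-first, which is the order
    list(repo.iter_commits(...)) produces.
    """
    return enumerate(reversed(commits))


def is_java_source(path):
    """True only when the path ends in '.java'."""
    return path.endswith('.java')


# Stand-in strings instead of real commit objects, for illustration only.
commits = ['newest', 'middle', 'oldest']
for commit_num, commit in oldest_first(commits):
    print(commit_num, commit)
```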

First of all, when you traverse the commit history like this, no files are checked out. All you get is a filename, which may or may not point to an existing file in the working tree, and which never points to the file as it was in any revision other than the one currently checked out.

However, there is a solution to this. Remember that in principle, anything you could do with some git command, you can do with GitPython.

To get the contents of a file at a specific revision, you can use the following command, which I've taken from that page:

git show <treeish>:<file>

Therefore, in GitPython:

file_contents = repo.git.show('{}:{}'.format(commit.hexsha, entry.path))
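
Combined with the loop from the question, this might look like the sketch below. The helper name iter_java_sources is hypothetical. It assumes repo is a git.Repo instance and uses only the calls already shown on this page (repo.iter_commits, commit.tree.traverse, repo.git.show), plus the entry.type attribute, which GitPython sets to 'blob' for files.

```python
def iter_java_sources(repo, rev='master'):
    """Yield (hexsha, path, contents) for every .java blob in every commit.

    Hypothetical helper: `repo` is assumed to be a git.Repo instance.
    Commits are visited oldest first.
    """
    for commit in reversed(list(repo.iter_commits(rev))):
        for entry in commit.tree.traverse():
            # traverse() also yields sub-trees; keep only file blobs
            if entry.type != 'blob' or not entry.path.endswith('.java'):
                continue
            # equivalent to: git show <hexsha>:<path>
            contents = repo.git.show('{}:{}'.format(commit.hexsha, entry.path))
            yield commit.hexsha, entry.path, contents
```

With GitPython installed, something like `for sha, path, text in iter_java_sources(Repo(git_dir)):` should then hand you each file's contents without checking anything out.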

However, that still wouldn't make the file appear on disk. If you need a real path for the file, you can use tempfile:

import os
import tempfile

f = tempfile.NamedTemporaryFile(delete=False)
f.write(file_contents)
f.close()

# at this point file with name f.name contains contents of
#   the file from path entry.path at revision commit.hexsha
# your program launch goes here, use f.name as filename to be read

os.unlink(f.name) # delete the temp file
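
As a sketch of how the temp-file step might be combined with subprocess.check_output from the question: the helper run_on_contents and its parameters are hypothetical, and the try/finally is there so the temp file is removed even when the external program fails. On Python 3, repo.git.show returns text while NamedTemporaryFile opens in binary mode by default, so the sketch encodes text first (UTF-8 is an assumption).

```python
import os
import subprocess
import sys
import tempfile


def run_on_contents(file_contents, cmd_prefix, suffix='.java'):
    """Write file_contents to a temp file, run cmd_prefix + [path], return output.

    Hypothetical helper; cmd_prefix is the command list without the file argument.
    """
    if not isinstance(file_contents, bytes):
        # the temp file is opened in binary mode; UTF-8 is an assumption
        file_contents = file_contents.encode('utf-8')
    f = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        f.write(file_contents)
        f.close()
        return subprocess.check_output(cmd_prefix + [f.name],
                                       stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as e:
        # e.output holds whatever the program printed before failing
        sys.stderr.write('command failed (%d): %r\n' % (e.returncode, e.output))
        raise
    finally:
        f.close()  # harmless if the file is already closed
        os.unlink(f.name)  # delete the temp file
```

With the question's command list, the call would be roughly `run_on_contents(file_contents, cmd)`, where cmd ends at 'java_gram.mainJava' and the temp file path is appended as the last argument.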
Gonidium answered 29/5, 2016 at 15:26 Comment(1)
I am using version 3.x. The following code worked for me: repo.git.execute(["git", "show", f"{commit.hexsha}:{file}"]) - Clamorous
