Programmatically `git checkout .` with dulwich

I have this code:

from time import time

from dulwich.objects import Blob, Tree, Commit, parse_timezone
from dulwich.repo import Repo

repo = Repo.init("myrepo", mkdir=True)
# dulwich works on bytes, hence the b"" literals
blob = Blob.from_string(b"my file content\n")
tree = Tree()
# 0o100644 = regular file with rw-r--r-- permissions
tree.add(b"spam", 0o100644, blob.id)
commit = Commit()
commit.tree = tree.id

author = b"Flav <[email protected]>"
commit.author = commit.committer = author
commit.commit_time = commit.author_time = int(time())
# parse_timezone returns (offset_in_seconds, negative_utc_flag)
tz = parse_timezone(b'+0200')[0]
commit.commit_timezone = commit.author_timezone = tz
commit.encoding = b"UTF-8"
commit.message = b"initial commit"

# store the three objects, then point HEAD at the new commit
o_sto = repo.object_store
o_sto.add_object(blob)
o_sto.add_object(tree)
o_sto.add_object(commit)

repo.refs[b"HEAD"] = commit.id

I end up with the commit in the history, but git status reports the created file as deleted.

A git checkout . fixes it.

My question is: how do I do git checkout . programmatically with dulwich?

Liponis asked 10/7, 2011 at 10:28 Comment(2)
Your code doesn't set HEAD to the current commit; fixed it. - Eustache
Yeah, it was there, but my copy/paste skills truncated the code :| - Liponis
E
3

This has been possible since release 0.8.4, with the function dulwich.index.build_index_from_tree().

It writes a tree to both the index file and the filesystem (working copy), which is a very basic form of checkout.

See the note:

existing index is wiped and contents are not merged in a working dir. Suitable only for fresh clones

I got it to work with the following code:

from dulwich import index
from dulwich.repo import Repo

# open the repository in the current directory
repo = Repo('.')
indexfile = repo.index_path()
# we want to check out HEAD
tree = repo[b"HEAD"].tree

# rewrite the index and the working copy from that tree
index.build_index_from_tree(repo.path, indexfile, repo.object_store, tree)
Eustache answered 17/9, 2012 at 18:33 Comment(0)
E
9

Git status says the file is deleted because it doesn't exist in the working copy; that's why checking it out fixes the status.

It looks like there's no support for high-level working copy classes and functions in dulwich yet. You'd have to deal with trees and blobs and unpacking objects.

OK, I took the challenge and managed a basic checkout with Dulwich:

import os

import dulwich.file
from dulwich.repo import Repo

# open the repository in the current directory
repo = Repo('.')
# get the tree corresponding to the HEAD commit
tree_id = repo[b"HEAD"].tree
# iterate over the tree contents, yielding each entry's path, mode and blob sha
for entry in repo.object_store.iter_tree_contents(tree_id):
    # entry paths are bytes, so join them onto a bytes version of the repo path
    path = entry.in_path(os.fsencode(repo.path)).path
    dulwich.file.ensure_dir_exists(os.path.split(path)[0])
    with open(path, 'wb') as f:
        # write the blob's content to the file
        f.write(repo[entry.sha].as_raw_string())

It won't delete files that must be deleted, won't care about your index, etc.
See also Mark Mikofski's github project for more complete code based on this.
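
The loop above also ignores each entry's mode. A tree-entry mode such as the 100644 used in the question combines a file-type part with the usual permission bits, and the standard library's stat module can split the two. The sketch below uses only the standard library; the helper name describe_mode is made up for illustration.

```python
import stat


def describe_mode(mode):
    """Split a Git tree-entry mode into (kind, permission bits)."""
    if stat.S_ISREG(mode):
        kind = "file"
    elif stat.S_ISDIR(mode):
        kind = "directory"
    elif stat.S_ISLNK(mode):
        kind = "symlink"
    else:
        # e.g. 0o160000, which Git uses for submodule entries
        kind = "other"
    return kind, stat.S_IMODE(mode)


kind, perms = describe_mode(0o100644)
print(kind, oct(perms))  # prints "file 0o644"
```

If file modes matter in your setup, something along the lines of os.chmod(path, stat.S_IMODE(entry.mode)) after writing each file would apply just the permission bits.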

Eustache answered 10/7, 2011 at 11:20 Comment(11)
+1 for using with open ... instead of f.close()! Also, you can call in_path(<path>) on the entry, which returns a TreeEntry named tuple with <path> joined onto the front of its path. See the dulwich API doc. - Impersonality
I will if I come up with anything better. FYI, unlike get_object, d.repo.BaseRepo.get_blob(sha) raises a NotBlob error; otherwise the two are exactly the same. Also, d.file.ensure_dir_exists(os.path.split(entry.in_path(repo.path).path)[0]) does a nice job of making your directories if they don't already exist. Finally, d.GitFile(path, mode) does the same thing as file. Do you know what the difference between as_raw_string and as_pretty_string is? They seem the same. I started a dulwich porcelain repo for more of these snippets on GitHub. - Impersonality
This doesn't set the mode, so git status still says deleted or untracked; use chmod entry.mode entry.in_path(repo.path).path. Just one thing, I'm not sure about this from the dulwich introduction on the tree: "The file mode is like the octal argument you could give to the chmod command. Except it is in extended form to tell regular files from directories and other types." - Impersonality
ignore file mode - Impersonality
Last comment, I swear: Blob.data is the same as as_raw_string() (see d.o.Blob). - Impersonality
Your edit suggestion was rejected but shouldn't have been. Can you submit it again? I'll approve. - Eustache
I added dulwich_checkout.py to the dulwich porcelain repo for these snippets on GitHub. - Impersonality
My first edit, which uses ensure_dir_exists(...) to create folders, was accepted. However, I tried to edit it again to use os.chmod(entry.in_path(repo.path).path, entry.mode), but that edit was rejected. The os.chmod() call is important if filemode=true in the git config. - Impersonality
One more thing: dulwich complains that get_blob(sha) and get_object(sha) are deprecated (dulwich-0.8.5) and says to use repo[sha] instead, which works fine. Also, the Blob.data attribute works just as well as Blob.as_string(). - Impersonality
@MarkMikofski edited; can you attribute the origin somewhere on your project? - Eustache
@MarkMikofski noticed that the feature request for repo checkout was fixed! It's now in the library; see the corresponding merge. - Eustache
H
1

In case you want to check out an existing branch from a remote repository, this is how I finally managed to do it:

from dulwich import porcelain
gitlab_server_address = 'gitlab.example.com/foo/my_remote_repo.git'
username = '[email protected]'
password = 'mocraboof'

repo = porcelain.clone(gitlab_server_address, target='myrepo', username=username, password=password)

# or if repo already exists: 
# repo = porcelain.open_repo('myrepo')

branch_name = 'thebranch'
porcelain.branch_create(repo, branch_name)
porcelain.update_head(repo, target=branch_name, detached=False, new_branch=None)

porcelain.pull(repo, gitlab_server_address, refspecs=f'refs/heads/{branch_name}', username=username, password=password)

The problem was that when you clone a repository with dulwich, it only fetches the main/master branch, and I couldn't find another way to fetch the others. So I create the branch as a new branch from main/master and then pull from the remote.

(This might not work if your main branch is ahead of the initial commit that started your remote branch.)

Heterologous answered 10/11, 2021 at 7:23 Comment(0)
F
0
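from dulwich.repo import Repo

repo = Repo.init('myrepo', mkdir=True)
# write the file into the working copy first
with open('myrepo/spam', 'w') as f:
    f.write('my file content\n')
# stage it, then commit; the message and identity are passed as bytes
repo.stage(['spam'])
repo.do_commit(b'initial commit', b'Flav <[email protected]>')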

Found by looking at dulwich/tests/test_repository.py:371. dulwich is powerful but the docs are a bit lacking, unfortunately.

You may also want to consider using GitFile instead of open.

Freddie answered 10/7, 2011 at 11:29 Comment(7)
-1: looking at the source, it does not check out the commit; it is a wrapper around the OP's code. - Eustache
Actually, it works because you write the file to the working copy, so there's no need to check out, but it doesn't answer the OP's question. - Eustache
It's a wrapper around the OP's code that produces the end result he wants in fewer lines of code. It's not merely that it writes the file to the working directory; it uses it to perform the commit. This is the "correct" way to use dulwich to do what the OP is doing. - Freddie
Sure, it's a nicer way to do a commit, but it doesn't say how to check out, which I found to be an interesting problem. - Eustache
This solution is OK for me too, as I'll have the data first, then I'll add it. It doesn't answer the question though, and I'm still curious how to check out a branch. - Liponis
@Flavius: to check out a branch, replace repo["HEAD"].tree with repo['refs/heads/yourbranch'].tree in my answer (repo.refs[...] alone only returns the commit's sha, which has no .tree). - Eustache
Is it possible that repo.do_commit() only allows committing to refs/heads/master? It looks like one needs to be able to set Commit.parents before committing, which can't be done with this method. So apparently the simplification made in this answer is quite limiting; it may be better to extend the original code. - Bozuwa
