Methods for using Git with Google Colab

16

144

Are there any recommended methods to integrate Git with Colab?

For example, is it possible to work off code from Google Source Repositories or the like?

Neither google drive nor cloud storage can be used for git functionality.

So I was wondering if there is a way to still do it?

Infatuated answered 19/1, 2018 at 22:6 Comment(0)
C
122

If you want to clone a private repository, the quickest way is to create a personal access token and grant it only the privileges your application needs. The clone command for GitHub then looks like this (replace <personal_access_token> with your token):

!git clone https://<personal_access_token>@github.com/username/repository.git
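A minimal sketch of building that URL in Python, assuming the token is held in a variable. The function name token_clone_url and the example values are hypothetical; only urllib.parse.quote from the standard library is used.

```python
from urllib.parse import quote


def token_clone_url(token, owner, repo):
    """Build an HTTPS clone URL that carries a personal access token."""
    # Percent-encode the token so characters such as '@' or '/' cannot break the URL.
    safe_token = quote(token, safe="")
    return f"https://{safe_token}@github.com/{owner}/{repo}.git"


# Hypothetical values, for illustration only.
url = token_clone_url("example@token", "username", "repository")
print(url)
```

In a Colab cell the result could then be passed to the shell with !git clone {url}, since IPython substitutes Python expressions in braces.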
Centro answered 20/7, 2018 at 11:55 Comment(8)
That's close to repeating another answer. But welcome.Peyter
is there a way to dynamically pass username and password. using variables ?Shackle
u = 'user0'; p = 'pass0'; !git clone https://$u:$p@github.com/$u/repository.git This should work, {p} {u} also work. (!... should start a new line though.)Phototypography
If you have @ in your password, replace it with %40.Shewchuk
Note that the personal access token gives FULL access to private repos. Github doesn't allow you to restrict the token only to read-access.Calcutta
The safest way I could find is to use tokens as described above and read them into variables, so that their traces are gone after use. from IPython.display import clear_output; u = input(); key = input(); clear_output() run !git clone https://$u:$key@github.com/rohitdavas/reponame.gitWallenstein
The first solution suggested in this answer does not work anymore as of 13 August 2021. You now have to use personal access tokens.Sheelah
What if I want to pip install?Singlehandedly
P
69

git is installed on the machine, and you can use ! to invoke shell commands.

For example, to clone a git repository:

!git clone https://github.com/fastai/courses.git

Here's a complete example that clones a repository and loads an Excel file stored therein. https://colab.research.google.com/notebook#fileId=1v-yZk-W4YXOxLTLi7bekDw2ZWZXWW216

Phonic answered 21/1, 2018 at 19:3 Comment(4)
Is there a way to enter the username and password when cloning private repos ?Bilbrey
How about getting the results back from Colab into GitHub? I am thinking mostly of checkpoint files, to retrieve the models for inference on a local machine.Vaulting
Create a token in the Dev tab in your GitHub settings and use this: ! git clone https://<token>@github.com/username/repository.git.Yseulte
@AbhaiKollara you can do u = 'user'; p = 'pass'; !git clone https://$u:$p@github.com/$u/repository.gitArris
B
49

A simple way to clone your private GitHub repo in Google Colab is shown below.

  1. Your password won't be exposed in the notebook
  2. It works even if your password contains special characters
  3. Just run the snippet below in a Colab cell; it prompts for each value interactively
import os
from getpass import getpass
import urllib.parse  # importing urllib alone does not guarantee that urllib.parse is loaded

user = input('User name: ')
password = getpass('Password: ')
password = urllib.parse.quote(password) # percent-encode the password for use in a URL
repo_name = input('Repo name: ')

cmd_string = 'git clone https://{0}:{1}@github.com/{0}/{2}.git'.format(user, password, repo_name)

os.system(cmd_string)
cmd_string, password = "", "" # removing the password from the variable
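If the clone fails, git may echo the full URL, credential included, in its error message. A hedged sketch of one way to scrub that output before it reaches the notebook follows; redact and clone_quietly are hypothetical helper names, and the secret passed in should be the form that actually appears in the URL (that is, after percent-encoding).

```python
import subprocess


def redact(text, secret, mask="***"):
    """Replace every occurrence of secret in text with mask."""
    return text.replace(secret, mask) if secret else text


def clone_quietly(url, secret):
    """Run git clone and print only redacted output. Returns git's exit code."""
    result = subprocess.run(["git", "clone", url], capture_output=True, text=True)
    print(redact(result.stdout + result.stderr, secret))
    return result.returncode


# Illustration with a made-up error message; no network access is needed here.
message = "fatal: unable to access 'https://user:s3cret@github.com/user/repo.git/'"
print(redact(message, "s3cret"))
```

This only affects what is printed in the notebook. The credential is still visible to anything that can inspect the process arguments while git is running.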
Brussels answered 17/8, 2019 at 19:7 Comment(4)
you say password won't be exposed but if one has access to the bash statements executed by colab notebook, they can easily find the password, isn't it? and if you assume that no-one has the access to such bash statements, then how exactly other methods exposes the password?Jenette
@Jenette the passwords won't be saved on the notebookSaad
@Saad this assumes no one is reading the log files...Cretan
GitHub removed password authentication on August 13, 2021. This is the error which you will get if you use this: "remote: Support for password authentication was removed on August 13, 2021. Please use a personal access token instead." Please see github.blog/… for more informationDichromic
D
19

You can use the SSH protocol to connect your private repository with Colab.

  1. Generate an SSH key pair on your local machine; don't forget to keep
    the passphrase empty (check this tutorial).

  2. Upload it to Colab:

    from google.colab import files
    uploaded = files.upload()

  3. Move the SSH key pair to /root and connect to Git

    • remove any previous SSH files
      ! rm -rf /root/.ssh/*
      ! mkdir -p /root/.ssh
    • uncompress your SSH files
      ! tar -xvzf ssh.tar.gz
    • copy them to root
      ! cp ssh/* /root/.ssh && rm -rf ssh && rm -rf ssh.tar.gz
      ! chmod 700 /root/.ssh
    • add your Git server, e.g. GitLab, as an SSH known host
      ! ssh-keyscan gitlab.com >> /root/.ssh/known_hosts
      ! chmod 644 /root/.ssh/known_hosts
    • set your git account
      ! git config --global user.email "email"
      ! git config --global user.name "username"
    • finally connect to your git server
      ! ssh git@gitlab.com
  4. Authenticate your private repository; see the documentation on per-repository deploy keys.

  5. Use ! git clone git@gitlab.com:{account}/{projectName}.git
    Note: to push, you have to give write access to
    the public SSH key that you authenticate to the Git server with.
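The unpack-and-copy part of step 3 can also be sketched in plain Python. This is an illustration under assumptions, not the answer's method: install_ssh_archive is a hypothetical helper, and it assumes the uploaded archive is a .tar.gz that contains the key files.

```python
import os
import tarfile


def install_ssh_archive(archive_path, ssh_dir):
    """Copy the regular files from a .tar.gz archive into ssh_dir with strict permissions."""
    os.makedirs(ssh_dir, exist_ok=True)
    os.chmod(ssh_dir, 0o700)
    installed = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and devices
            # Keep only the base name so an archive entry cannot escape ssh_dir.
            target = os.path.join(ssh_dir, os.path.basename(member.name))
            with open(target, "wb") as handle:
                handle.write(archive.extractfile(member).read())
            # Public keys may be world-readable; everything else is owner-only.
            os.chmod(target, 0o644 if target.endswith(".pub") else 0o600)
            installed.append(target)
    return installed
```

In Colab this might be called as install_ssh_archive("ssh.tar.gz", "/root/.ssh") after the upload step. ssh refuses private keys whose permissions are too open, which is why the modes are set explicitly.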

Delagarza answered 20/4, 2018 at 3:39 Comment(2)
See also: medium.com/@ashkanpakzad/data-into-google-colab-5ddeb4f4e8Presumably
The "per-repository deploy keys" is dead. It would be great if instructions could be added to this answer so that this answer does not depend on external linksAnastigmat
A
18

To protect your account username and password, you can read them with getpass and pass them to the shell command through an environment variable:

from getpass import getpass
import os

user = getpass('BitBucket user')
password = getpass('BitBucket password')
os.environ['BITBUCKET_AUTH'] = user + ':' + password

!git clone https://$BITBUCKET_AUTH@bitbucket.org/{user}/repository.git
Arrear answered 1/11, 2018 at 1:45 Comment(3)
This protection is really weak, in the sense that the plain text password is displayed in the log/output.Keturahkeung
@Keturahkeung It only shows the password if you are not able to clone the repository because you get some error.Skimmer
In my case, I was getting a 400 (bad request) because of {user}, which does not translate to the actual user name.Skimmer
I
13

Update September 2021: For security reasons, GitHub no longer accepts account passwords for Git operations. Use a Personal Access Token instead: go to github.com -> Settings -> Developer Settings -> Personal Access Tokens and generate a token for the required purpose. Use it in place of your password for all tasks mentioned in this tutorial!

For more details you can also see my article on Medium : https://medium.com/geekculture/using-git-github-on-google-colaboratory-7ef3b76fe61b

None of the answers provide a straight and direct answer like this one :

GitColab

This is probably the answer you are looking for.

Works on Colab for both public and private repositories. Don't change or skip any step. (Replace all {vars}.)

TL;DR Complete Process:

!git clone https://{your_username}:{your_password}@github.com/{destination_repo_username}/{destination_repo_projectname}.git
%cd /content/{destination_repo_projectname}

!git config --global user.name "{your_username}"
!git config --global user.email "{your_email_id}"
!git config --global user.password "{your_password}"

Make Your Changes and then run :

!git add .
!git commit -m "{Message}"
!git push

Cloning a Repository :

!git clone https://{your_username}:{your_password}@github.com/{destination_repo_username}/{destination_repo_projectname}.git

Change the directory

Change the directory to {destination_repo_projectname}, the folder that git clone created, using the line magic command %cd for Jupyter notebooks.

%cd /content/{destination_repo_projectname}
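By default git clone names the new folder after the last component of the repository URL, with any .git suffix removed. A small sketch that derives that name from a URL; clone_dir_name is a hypothetical helper, and the URL is the public one used elsewhere on this page.

```python
import posixpath
from urllib.parse import urlparse


def clone_dir_name(url):
    """Return the directory name that `git clone <url>` creates by default."""
    path = urlparse(url).path.rstrip("/")
    name = posixpath.basename(path)
    return name[:-4] if name.endswith(".git") else name


print(clone_dir_name("https://github.com/fastai/courses.git"))  # courses
```

The result can be used to build the %cd target, for example %cd /content/{clone_dir_name(url)} in a Colab cell.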

Verify!

Pull

Sanity Check to see if everything works perfectly!

!git pull

If no changes were made to the remote git repo after cloning, the following should be the displayed output :

Already up to date.

Status

Similarly check the status of the staged/unstaged changes.

!git status

It should display this, with the default branch selected :

On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

Check Older Logs

Check the previous commits you have made on the repo :

!git log -n 4

Outputs Git Commit IDs with Logs :

commit 18ccf27c8b2d92b560e6eeab2629ba0c6ea422a5 (HEAD -> main, origin/main, origin/HEAD)
Author: Farhan Hai Khan <[email protected]>
Date:   Mon May 31 00:12:14 2021 +0530

    Create README.md

commit bd6ee6d4347eca0e3676e88824c8e1118cfbff6b
Author: khanfarhan10 <[email protected]>
Date:   Sun May 30 18:40:16 2021 +0000

    Add Zip COVID

commit 8a3a12863a866c9d388cbc041a26d49aedfa4245
Author: khanfarhan10 <[email protected]>
Date:   Sun May 30 18:03:46 2021 +0000

    Add COVID Data

commit 6a16dc7584ba0d800eede70a217d534a24614cad
Author: khanfarhan10 <[email protected]>
Date:   Sun May 30 16:04:20 2021 +0000

    Removed sample_data using colab (testing)

Make changes in the local repo

Make changes from the local repo directory.

These might include additions, deletions, and edits.

Pro Tip: If you want, you can copy things from Drive into a Git repo as follows:

Mount Google Drive:

from google.colab import drive
drive.mount('/content/gdrive')

Copy contents using shutil :

import shutil

# For a folder:
shutil.copytree(src_folder,des_folder)

# For a file:
shutil.copy(src_file,des_file)

# Create a ZipFile
shutil.make_archive(archive_name, 'zip', directory_to_zip)
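shutil.copytree raises an error when the destination already exists, which is the usual case for a cloned repo. A hedged sketch of a merge-style copy follows; copy_into_repo is a hypothetical helper, and dirs_exist_ok requires Python 3.8 or newer.

```python
import shutil


def copy_into_repo(src_folder, repo_folder):
    """Copy src_folder into repo_folder, merging with existing files.

    Any directory named .git inside src_folder is skipped so the
    repository's own metadata is never overwritten.
    """
    return shutil.copytree(
        src_folder,
        repo_folder,
        dirs_exist_ok=True,
        ignore=shutil.ignore_patterns(".git"),
    )
```

Files with the same relative path in both folders are overwritten by the copy from src_folder.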

Set Git Credentials

Tell Git who you are:

!git config --global user.name "{your_username}"
!git config --global user.email "{your_email_id}"
!git config --global user.password "{your_password}"

Check Remote Again

Check if the remote url is set and configured correctly :

!git remote -v

If configured properly it should output the following :

origin  https://{your_username}:{your_password}@github.com/{destination_repo_username}/{destination_repo_projectname}.git (fetch)
origin  https://{your_username}:{your_password}@github.com/{destination_repo_username}/{destination_repo_projectname}.git (push)
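Note that this output shows the credential in plain text, and the same URL is stored in the repository's .git/config. A minimal sketch for producing a credential-free version of a remote URL, using only urllib.parse; strip_credentials is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url):
    """Return url without the user:password@ part of its network location."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


print(strip_credentials("https://user:token@github.com/owner/project.git"))
# https://github.com/owner/project.git
```

The cleaned URL could be written back with git remote set-url origin <url> before sharing the notebook or the folder; a later push would then need the credential supplied again.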

Add, Commit, Push

You know what to do.

!git add .
!git commit -m "{Message}"
!git push
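When these three commands run from one cell, a failed step (for example a commit with nothing staged, which exits with a non-zero status) does not stop the ones after it. A hedged sketch that runs them in order and stops at the first failure; run_git_steps and its runner parameter are hypothetical names, and the runner defaults to subprocess.run.

```python
import subprocess


def run_git_steps(message, runner=subprocess.run):
    """Run add, commit, and push in order, stopping at the first failure.

    Returns the argument list of the step that failed, or None on success.
    """
    steps = [
        ["git", "add", "."],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]
    for argv in steps:
        result = runner(argv)
        if result.returncode != 0:
            return argv
    return None
```

Calling run_git_steps("{Message}") from inside the repository folder is meant to behave like the three shell lines above, except that git push is skipped when the commit fails.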

Enjoy!

Injector answered 30/5, 2021 at 21:13 Comment(0)
N
10

You can mostly follow this link: https://qiita.com/Rowing0914/items/51a770925653c7c528f9

As a summary of the above link, follow these steps:

1- Connect your Google Colab runtime to your Google Drive using these commands:

from google.colab import drive
drive.mount('/content/drive')

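This starts an authentication process. Complete the steps it asks for.

2- Set the current directory to the path where you want to clone the Git project:

In my example:

path_clone = "drive/My Drive/projects"
%cd {path_clone}

Use the %cd magic here rather than !cd: !cd runs in a temporary subshell, so it does not change the notebook's working directory. The braces are needed so that the value of path_clone is substituted.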

3- Clone the Git project:

!git clone <Git project URL address>

Now you have the cloned Git project in the projects folder in your Google Drive (which is also connected to your Google Colab runtime machine).

4- Go to your Google Drive (using browser or etc) and then go to the "projects" folder and open the .ipynb file that you want to use in Google Colab.

5- Now you have Google Colab runtime with the .ipynb that you wanted to use which is also connected to your Google Drive and all cloned git files are in the Colab runtime's storage.

Note:

1- Check that your Colab runtime is connected to Google Drive. If it's not connected, just repeat the step #1 above.

2- Double check by using "pwd" and "cd" commands that the current directory is related to the cloned git project in google Drive (step #2 above).
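The directory check in note 2 can also be done programmatically. A minimal sketch, assuming a cloned project is identified by the .git entry at its top level; find_repo_root is a hypothetical helper.

```python
from pathlib import Path


def find_repo_root(start="."):
    """Return the nearest ancestor of start that contains a .git entry, or None."""
    current = Path(start).resolve()
    for candidate in [current, *current.parents]:
        if (candidate / ".git").exists():
            return candidate
    return None


print(find_repo_root())
```

If this prints None, the current directory is not inside the cloned project and step 2 should be repeated.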

Nailhead answered 24/2, 2019 at 17:28 Comment(0)
P
9

Three steps to use git to sync colab with github or gitlab.

  1. Generate a private-public key pair. Copy the private key to the system clipboard for use in step 2. Paste the public key to GitHub or GitLab as appropriate.

    In Linux, ssh-keygen can be used to generate the key pair in ~/.ssh. The resulting private key is in the file id_rsa, and the public key is in the file id_rsa.pub.

  2. In Colab, execute

    key = \
    '''
    paste the private key here 
    (your id_rsa or id_ecdsa file in the .ssh directory), e.g.
    -----BEGIN EC PRIVATE KEY-----
    M..............................................................9
    ...............................................................J
    ..................................==
    -----END EC PRIVATE KEY-----
    '''
    ! mkdir -p /root/.ssh
    with open(r'/root/.ssh/id_rsa', 'w', encoding='utf8') as fh:
        fh.write(key)
    ! chmod 600 /root/.ssh/id_rsa
    ! ssh-keyscan github.com >> /root/.ssh/known_hosts 
    # test setup
    ! ssh -T [email protected]
    # if you see something like "Hi ffreemt! You've successfully 
    # authenticated, but GitHub does not provide shell access."
    # you are all set. You can tweak .ssh/config for multiple github accounts
    
  3. Use git to pull/push as usual.
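A key pasted into a triple-quoted string usually picks up a leading blank line and sometimes indentation, and a key file that is malformed or does not end in a newline can be rejected by ssh. A hedged sketch that normalizes the text before writing it; write_private_key is a hypothetical helper, and it assumes no line of the key has meaningful leading or trailing whitespace.

```python
import os


def write_private_key(key_text, path):
    """Write a pasted private key to path with owner-only permissions."""
    # Drop surrounding blank lines and per-line indentation from the paste.
    lines = [line.strip() for line in key_text.strip().splitlines()]
    normalized = "\n".join(lines) + "\n"  # end with exactly one newline
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, mode=0o700, exist_ok=True)
    # Create the file as 0o600 from the start instead of chmod-ing it afterwards.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf8") as handle:
        handle.write(normalized)
    os.chmod(path, 0o600)  # also covers a file that already existed
    return normalized
```

In the Colab cell from step 2, write_private_key(key, '/root/.ssh/id_rsa') would stand in for the open/write/chmod lines.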

The same idea can be used for rsync (or ssh) between Colab and HostA with minor changes:

  1. Generate a private-public key pair. Copy the private key to the system clipboard for use in step 2. Paste the public key to authorized_keys in .ssh in HostA.

  2. In Colab, execute

    key = \
    '''
    paste the private key here
    '''
    ! mkdir -p /root/.ssh
    with open(r'/root/.ssh/id_rsa', 'w', encoding='utf8') as fh:
        fh.write(key)
    ! chmod 600 /root/.ssh/id_rsa
    # ssh-keyscan HostA >> /root/.ssh/known_hosts does not seem to work with IP.
    ! ssh -oStrictHostKeyChecking=no root@HostA hostname

  3. Use rsync to sync files between Colab and HostA as usual.
Phototypography answered 1/9, 2018 at 17:32 Comment(0)
S
9

Cloning a private repo to google colab :

Generate a token:

Settings -> Developer settings -> Personal access tokens -> Generate new token

Copy the token and clone the repo (replace username and token accordingly)

!git clone https://username:token@github.com/username/repo_name.git
Sikh answered 28/8, 2020 at 8:41 Comment(0)
S
4

The solution https://mcmap.net/q/158721/-methods-for-using-git-with-google-colab did not work for me because the expression {user} was not being converted to the actual username (I was getting a 400 bad request), so I slightly changed that solution to the following one.

from getpass import getpass
import os

os.environ['USER'] = input('Enter the username of your Github account: ')
os.environ['PASSWORD'] = getpass('Enter the password of your Github account: ')
os.environ['REPOSITORY'] = input('Enter the name of the Github repository: ')
os.environ['GITHUB_AUTH'] = os.environ['USER'] + ':' + os.environ['PASSWORD']

!rm -rf $REPOSITORY # To remove the previous clone of the Github repository
!git clone https://$GITHUB_AUTH@github.com/$USER/$REPOSITORY.git

os.environ['USER'] = os.environ['PASSWORD'] = os.environ['REPOSITORY'] = os.environ['GITHUB_AUTH'] = ""

If you are able to clone your-repo, you should not see any password in the output of this command. If you get an error, the password could be displayed to the output, so make sure you do not share your notebook whenever this command fails.
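The snippet above overwrites USER, which the system normally sets itself, and leaves the variables behind as empty strings. One possible alternative is a context manager that sets the variables only for the duration of a block and then restores whatever was there before. This is a sketch under assumptions: temporary_env is a hypothetical helper, not part of any library.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(**variables):
    """Set environment variables inside a with-block, then restore the previous state."""
    previous = {name: os.environ.get(name) for name in variables}
    os.environ.update(variables)
    try:
        yield
    finally:
        for name, old_value in previous.items():
            if old_value is None:
                os.environ.pop(name, None)  # the variable did not exist before
            else:
                os.environ[name] = old_value
```

Commands started inside the with-block inherit the variables; once the block exits, even after an error, the credential is no longer in the notebook's environment.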

Skimmer answered 15/10, 2019 at 13:29 Comment(0)
T
4

I tried some of the methods here and they all worked well, but I found it difficult to handle all the git commands and other related commands (for example, version control with DVC) within notebook cells. So I turned to this nice solution, Kora. It is a terminal emulator that can be run within Colab, which makes it about as easy to use as a terminal on a local machine. The notebook stays alive, and we can edit files and cells as usual. Since this console is temporary, no information is exposed. GitHub login and other commands can be run as usual.

Kora: https://pypi.org/project/kora/

Usage:

!pip install kora
from kora import console
console.start()
Turkestan answered 7/11, 2020 at 6:59 Comment(0)
P
3

I finally pulled myself together and wrote a python package for this.

pip install clmutils  # colab-misc-utils

Create a dotenv or .env in /content/drive/MyDrive (if google drive is mounted to drive) or /content/drive/.env with

# for git 
user_email = "your-email"
user_name = "your-github-name"
gh_key = "-----BEGIN EC PRIVATE KEY-----
...............................................................9
your github private key........................................J
..................................==
-----END EC PRIVATE KEY-----
"

In a Colab cell

from clmutils import setup_git, Settings

config = Settings()
setup_git(
    user_name=config.user_name,
    user_email=config.user_email,
    priv_key=config.gh_key
)

You are then all set to do all the git clone, amend code, git push stuff as if it were on your own lovely computer at home or at work.

clmutils also has a function called setup_ssh_tunnel to set up a reverse ssh tunnel to Colab. It also reads various keys, the username, and the hostname from the .env file. It's a bit involved, but if you know how to manually set up a reverse ssh tunnel to Colab, you should have no problem figuring out what they are used for. Details are available on the GitHub repo (google clmutils pypi).

Phototypography answered 1/1, 2021 at 12:11 Comment(0)
L
0

Mount the drive using:

from google.colab import drive
drive.mount('/content/drive/')

Then:

%cd /content/drive/MyDrive

To clone the repo into your Drive (the mount point /content/drive itself is not writable, so change into MyDrive first):

!git clone <github repo url> 

Access other files from the repo (for example, helper.py is another file in the repo):

import types
helper = types.ModuleType('helper')  # replaces imp.new_module; the imp module was removed in Python 3.12
exec(open("drive/path/to/helper.py").read(), helper.__dict__)
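As an alternative to exec-ing the file into an empty module, the standard library's importlib can load a module directly from a file path. A minimal sketch follows; load_module_from_path is a hypothetical helper name, and the path in the usage note is the same placeholder as above.

```python
import importlib.util
import sys


def load_module_from_path(name, path):
    """Import the Python file at path as a module called name."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # lets later `import name` statements find it
    spec.loader.exec_module(module)
    return module
```

For example, helper = load_module_from_path('helper', 'drive/path/to/helper.py') gives a module object whose functions are available as helper.some_function.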
Lauzon answered 18/11, 2018 at 7:15 Comment(1)
fatal: could not create work tree dir 'example': Operation not supportedScrofula
M
0

This works if you want to share your repo and colab. Also works if you have multiple repos. Just throw it in a cell.

import ipywidgets as widgets
from IPython.display import display
import subprocess

class credentials_input():
    def __init__(self, repo_name):
        self.repo_name = repo_name
        self.username = widgets.Text(description='Username', value='')
        self.pwd = widgets.Password(description = 'Password', placeholder='password here')

        self.username.on_submit(self.handle_submit_username)
        self.pwd.on_submit(self.handle_submit_pwd)        
        display(self.username)

    def handle_submit_username(self, text):
        display(self.pwd)
        return

    def handle_submit_pwd(self, text):
        cmd = f'git clone https://{self.username.value}:{self.pwd.value}@{self.repo_name}'
        process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
        output, error = process.communicate()
        print(output, error)
        self.username.value, self.pwd.value = '', ''

get_creds = credentials_input('github.com/username/reponame.git')
get_creds
Macknair answered 26/8, 2019 at 13:47 Comment(0)
E
0

Another solution based on the answer from @Marafon Thiago:

ATTENTION: If the password contains special characters, use the percent-encoded form of each character.

For example, for passwd = '@123' you should type passwd = '%40123'

from getpass import getpass
user = getpass('BitBucket user')
password = getpass('BitBucket password')

!git init
!git clone https://{user}:{password}@bitbucket.org/aqtechengenharia/aqtlibpy.git 

Expositor answered 23/8, 2021 at 14:50 Comment(0)
A
0

I've recently made a script to automate the steps to clone private repo on https://github.com/tsunrise/colab-github/

You can run the following in colab

!wget -q https://raw.githubusercontent.com/tsunrise/colab-github/main/colab_github.py
import colab_github
colab_github.github_auth(persistent_key=True)

And then clone your repo using SSH method:

!git clone git@github.com:<your_username>/<your_private_repo>.git
Alexaalexander answered 11/1, 2023 at 22:37 Comment(0)
