How do I install a library permanently in Colab?
W

5

99

In Google Colaboratory, I can install a new library using !pip install package-name. But when I reopen the notebook the next day, I have to reinstall it.

Is there a way to install a library permanently, so that I don't have to spend time reinstalling it every time I use the notebook?

Wuhan answered 20/3, 2019 at 4:13 Comment(0)
W
19

If you want a no-authorization solution, you can mount a bucket with gcsfuse, using a service-account key embedded in your notebook. Like this:

%%capture
# First install gcsfuse (%%capture has to be the first line of the cell)
!echo "deb http://packages.cloud.google.com/apt gcsfuse-bionic main" > /etc/apt/sources.list.d/gcsfuse.list
!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
!apt update
!apt install -y gcsfuse

Then get your service-account credential from the Google Cloud console and embed it in the notebook:

%%writefile /key.json
{
  "type": "service_account",
  "project_id": "kora-id",
  "private_key_id": "xxxxxxx",
  "private_key": "-----BEGIN PRIVATE KEY-----\nxxxxxxx==\n-----END PRIVATE KEY-----\n",
  "client_email": "colab-7@kora-id.iam.gserviceaccount.com",
  "client_id": "100380920993833371482",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/colab-7%40kora-id.iam.gserviceaccount.com"
}

Then set the environment variable that tells Google's client tools where to find this credential file:

%env GOOGLE_APPLICATION_CREDENTIALS=/key.json
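
A quick sanity check on the key file can save a confusing mount failure later. The following is only a sketch: the helper names and the REQUIRED_FIELDS list are assumptions made for illustration (the list is taken from the fields in the example key above, not from an official schema).

```python
import json
import os

# Assumption: fields picked from the example key above, not an official schema.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email", "token_uri")


def missing_key_fields(path):
    """Return the expected fields that are absent or empty in the key file."""
    with open(path) as f:
        key = json.load(f)
    return [field for field in REQUIRED_FIELDS if not key.get(field)]


def use_credentials(path):
    """Point GOOGLE_APPLICATION_CREDENTIALS at the key file after a basic check."""
    missing = missing_key_fields(path)
    if missing:
        raise ValueError(f"key file {path} is missing fields: {missing}")
    # Same effect as the %env line above, but only for this Python process
    # and the child processes it starts.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path
```

Calling use_credentials('/key.json') in place of the %env line would then fail early if the pasted JSON is incomplete.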

Next, create a GCS bucket (or use one you already have) and mount it at a directory of your choice.

!mkdir -p /content/my-bucket
!gcsfuse my-bucket /content/my-bucket
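
The mount does not survive a runtime reset, so these two commands have to run again in every new session. A minimal sketch of a re-runnable version is below; mount_bucket and its run parameter are hypothetical names introduced here, and the only external command used is the gcsfuse invocation shown above.

```python
import os
import subprocess


def mount_bucket(bucket_name, mount_point, run=subprocess.check_call):
    """Mount a GCS bucket with gcsfuse unless something is already mounted there.

    Returns True if gcsfuse was invoked, False if the mount already existed.
    `run` is injectable so the function can be exercised without gcsfuse installed.
    """
    os.makedirs(mount_point, exist_ok=True)  # same role as `mkdir -p`
    if os.path.ismount(mount_point):
        return False
    run(["gcsfuse", bucket_name, mount_point])
    return True
```

For example, mount_bucket('my-bucket', '/content/my-bucket') could sit in the first cell of the notebook and be executed safely more than once.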

Finally, install the library there, in the same way as in my Google Drive answer.

import sys
nb_path = '/content/my-bucket'
sys.path.insert(0, nb_path)
# Do this just once
!pip install --target=$nb_path jdc

Next time, you can import jdc without running !pip install again.

Wuhan answered 29/8, 2019 at 10:47 Comment(4)
It takes a few steps to get the JSON credential. Read this: cloud.google.com/iam/docs/… - Wuhan
@KorakotChaovavanich, I followed all the steps successfully, but the folder my-bucket is not there in a new runtime. What am I doing wrong? - Skyjack
You must create a new GCS bucket here (don't use my-bucket, use your own name): console.cloud.google.com/storage/browser - Wuhan
Please add to the answer a link or a quick guide on how to create that JSON. It's definitely not trivial. - Unimpeachable
W
106

Yes. You can install the library in Google Drive. Then add the path to sys.path.

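import os, sys
from google.colab import drive
drive.mount('/content/drive')
nb_path = '/content/notebooks'
# Create the link only if it is missing; os.symlink raises FileExistsError on a re-run
if not os.path.islink(nb_path):
    os.symlink('/content/drive/My Drive/Colab Notebooks', nb_path)
sys.path.insert(0, nb_path)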

Then you can install a library, for example, jdc, and specify the target.

!pip install --target=$nb_path jdc

Later, when you run the notebook again, you can skip the !pip install line. You can just import jdc and use it. Here's an example notebook.

https://colab.research.google.com/drive/1KpMDi9CjImudrzXsyTDAuRjtbahzIVjq

BTW, I really like jdc's %%add_to. It makes working with a big class much easier.
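
If you would rather not remember whether the install cell has already been run, the check can be automated. This is a rough sketch, not part of the original notebook: ensure_package is a hypothetical helper, and it assumes the pip distribution name and the import name are the same (true for jdc, not for every package).

```python
import importlib
import importlib.util
import subprocess
import sys


def ensure_package(name, target_dir):
    """Install `name` into `target_dir` only if it is not importable yet.

    Returns True if pip was run, False if the package was already available.
    """
    if target_dir not in sys.path:
        sys.path.insert(0, target_dir)
    importlib.invalidate_caches()  # make sure the new path entry is scanned
    if importlib.util.find_spec(name) is not None:
        return False
    # Equivalent to: !pip install --target=$target_dir name
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "--target", target_dir, name]
    )
    importlib.invalidate_caches()
    return True
```

With the Drive mounted as above, ensure_package('jdc', nb_path) would install on the first run and only adjust sys.path on later runs.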

Wuhan answered 20/3, 2019 at 4:13 Comment(13)
Your idea is very interesting and useful. However, I have to authorize every time, and I would like to do that only once. Do you know a solution for this as well? If so, I will ask it in a separate post. - Sammiesammons
@Sammiesammons I guess it's possible. Instead of mounting GDrive, you need to use gcsfuse to mount a Google Cloud Storage bucket using a credential embedded in your notebook (using %%writefile). I have never tried it, though. - Wuhan
What is jdc, and what does the %%add_to cell magic do? - Oversight
@Oversight jdc is a library. It provides %%add_to, which adds a new method to an existing class. See the notebook I linked. - Wuhan
It might be better to just store the wheel on GDrive and pip install from there. For a 1.5GB package, the difference in performance is huge. - Cynar
@MaosiChen Hi, I just found that this method does not work for PyTorch. Do you have any thoughts on this? - Barger
If I want to install another package later in the same notebook, how do I do that? Do I need to change the path? - Swane
It doesn't work when I try to use the module from the imported dependency. - Cortese
Hi, I keep getting the error No such file or directory: '/content/notebooks/...'. Does anyone have an idea how to solve it? - Tull
@Tull I have not tried to implement this yet, but as I understand from the following link, ayoolafelix.hashnode.dev/…, you have to add the location the files were installed to onto the Python search path so that the import command can find them. I imagine that is what the sys.path.insert(0, nb_path) in the answers by korakot and Tomek is for. - Hypogene
@Cortese my comment above might help. - Hypogene
@Hypogene I have figured out a way to install libraries permanently in Google Colab and written a blog post about it; see medium.com/@netraneupane/… - Cortese
May I ask why the symlink is necessary here? Can we set nb_path = '/content/drive/My Drive/Colab Notebooks' directly? - Imputation
C
3

You can install the libraries in Google Drive.

Install virtualenv:

!pip install virtualenv

Mount Google Drive:

from google.colab import drive
drive.mount("/content/drive")

Create a New Virtual Environment:

!virtualenv /content/drive/MyDrive/vir_env

Activate Virtual Environment and Install required libraries:

!source /content/drive/MyDrive/vir_env/bin/activate && pip install numpy

Adding the Virtual Environment to sys.path:

import sys
# Adjust python3.10 to the Python version the virtual environment was created with
sys.path.append("/content/drive/MyDrive/vir_env/lib/python3.10/site-packages")

(After installing the libraries to Google Drive, you only need to mount the Drive and run the sys.path code above to use them, without installing them again.)
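
The hard-coded python3.10 in the path breaks silently if the environment was created under a different Python version. One way around it, sketched below with a hypothetical helper name, is to look up the site-packages directory instead of spelling it out. Note the assumption that the environment has the usual lib/pythonX.Y/site-packages layout, and that packages with compiled extensions may still fail to import if the runtime's Python version differs from the one used to install them.

```python
import glob
import os
import sys


def add_venv_site_packages(venv_root):
    """Append every lib/python*/site-packages directory under venv_root to sys.path.

    Returns the list of directories found (empty if the layout is different).
    """
    pattern = os.path.join(venv_root, "lib", "python*", "site-packages")
    matches = sorted(glob.glob(pattern))
    for path in matches:
        if path not in sys.path:
            sys.path.append(path)
    return matches
```

For the environment above, the call would be add_venv_site_packages("/content/drive/MyDrive/vir_env"); an empty return value means nothing was added.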

Commissure answered 3/10, 2023 at 13:9 Comment(0)
C
2

I have installed libraries permanently in Google Colab using a virtual environment. See this blog post for reference: https://netraneupane.medium.com/how-to-install-libraries-permanently-in-google-colab-fb15a585d8a5

Cortese answered 10/6, 2023 at 13:24 Comment(0)
C
1

In case you need to install multiple libraries, here is a snippet:

import os, sys

# Placeholder: replace with a directory on your mounted Drive
drive_path_root = 'path/to/mounted/drive/directory/where/you/will/install/libraries'

def install_library_to_drive(libraries_list):
  """ Install each library into its own folder on gdrive. Run this only once. """
  for lib in libraries_list:
    # os.path.join adds the separator that plain string concatenation would miss
    drive_path_lib = os.path.join(drive_path_root, lib)
    !pip install -q $lib --target=$drive_path_lib
    sys.path.insert(0, drive_path_lib)

def load_library_from_drive(libraries_list):
  """ Technically, it just adds each install dir to sys.path """
  for lib in libraries_list:
    drive_path_lib = os.path.join(drive_path_root, lib)
    sys.path.insert(0, drive_path_lib)

libraries_list = ["torch", "jsonlines", "transformers"] # list your libraries
install_library_to_drive(libraries_list) # Run this just once
load_library_from_drive(libraries_list)
Crystacrystal answered 10/8, 2022 at 8:31 Comment(0)
