I want to install a library on my Azure Databricks cluster, but I cannot use the UI method because the cluster is recreated each time, and during that transition I cannot attach a library to it through the UI. Is there a Databricks utility command for doing this?
@CHEEKATLAPRADEEP-MSFT's answer is awesome! Just to complement it:
If you want all your notebooks and clusters to have the same libraries installed, you can take advantage of cluster-scoped or global (a newer feature) init scripts.
The example below installs packages from PyPI:
#!/bin/sh
# Install dependencies
pip install --upgrade boto3 psycopg2-binary requests simple-salesforce
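If you keep the package list in a notebook, one option is to generate the init-script text programmatically and write it out from there. The helper below is a minimal sketch of that idea; the DBFS path and the dbutils.fs.put upload call (available only inside a Databricks notebook) are assumptions, not part of the original answer.

```python
def build_init_script(packages):
    """Render a cluster init script that pip-installs the given packages."""
    lines = [
        "#!/bin/sh",
        "# Install dependencies",
        "pip install --upgrade " + " ".join(packages),
    ]
    return "\n".join(lines) + "\n"

script = build_init_script(["boto3", "psycopg2-binary", "requests", "simple-salesforce"])
# From a Databricks notebook you could then upload it to DBFS, e.g.
# (path is illustrative):
# dbutils.fs.put("dbfs:/databricks/init-scripts/install-libs.sh", script, True)
```

Point the cluster's init-script setting at the uploaded path so every node runs it on startup.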
You can even use a private package index, for example AWS CodeArtifact:
# Install the AWS CLI
pip install --upgrade awscli
# Configure pip
aws codeartifact login --region <REGION> --tool pip --domain <DOMAIN> --domain-owner <AWS_ACCOUNT_ID> --repository <REPO>
pip config set global.extra-index-url https://pypi.org/simple
Note: the cluster's instance profile must be allowed to fetch CodeArtifact credentials (arn:aws:iam::aws:policy/AWSCodeArtifactReadOnlyAccess).
Cheers
You can use the %pip install magic command to install the required libraries from within your notebook code. This documentation provides further detail on its usage: https://docs.databricks.com/libraries/notebooks-python-libraries.html. For example:
%pip install requests
For older runtimes there was the dbutils.library utility (https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-library), but it has been deprecated.
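A related pattern for notebooks that may re-run on a cluster where the library is already present: check whether the module is importable before installing. This is a generic Python sketch using only the standard library (importlib, subprocess), not a Databricks-specific API; the helper name is my own.

```python
import importlib.util
import subprocess
import sys

def ensure_installed(module_name: str) -> bool:
    """Return True if module_name is importable, installing it via pip if not."""
    if importlib.util.find_spec(module_name) is not None:
        return True  # already available, skip the pip call
    subprocess.check_call([sys.executable, "-m", "pip", "install", module_name])
    return importlib.util.find_spec(module_name) is not None
```

One caveat: the importable module name can differ from the PyPI distribution name (e.g. the psycopg2-binary package installs a module named psycopg2), so you may need to pass both separately.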
You can run a simple install command in a separate cell. The cell should contain nothing apart from the pip install line, like this:
pip install nltk