Databricks and Azure Files

I need to access Azure Files from Azure Databricks. According to the documentation, Azure Blob storage is supported, but I need this code to work with Azure Files:

dbutils.fs.mount(
  source = "wasbs://<your-container-name>@<your-storage-account-name>.file.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {"<conf-key>":dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})

Or is there another way to mount/access Azure Files to/from an Azure Databricks cluster? Thanks.

Fro answered 10/4, 2019 at 17:9 Comment(0)

On Azure, you can generally mount an Azure Files file share to Linux via the SMB protocol. I tried to follow the official tutorial Use Azure Files with Linux by running its commands from a Python notebook, but it failed.

(Screenshot: the SMB mount commands from the tutorial fail when run in a Databricks notebook.)

It seems that Azure Databricks does not allow this. I also searched the Databricks community for mounting NFS, SMB, Samba, etc., and found no discussion of it.

So the only way to access files in Azure Files is to install the azure-storage package and use the Azure Files SDK for Python directly on Azure Databricks.
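For example, here is a minimal sketch of listing the contents of a file share with the newer azure-storage-file-share package; the connection string and share name below are placeholders:

from azure.storage.fileshare import ShareClient

# Placeholder values; use your own storage account connection string and share name
share = ShareClient.from_connection_string(
    conn_str="AZURE_STORAGE_CONNECTION_STRING",
    share_name="AZURE_STORAGE_FILE_SHARE_NAME")

# List files and directories at the root of the share
for item in share.list_directories_and_files():
    print(item["name"])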

Moncear answered 11/4, 2019 at 10:33 Comment(3)
1. You need to be root in order to mount. 2. It's not practical to mount a share every time a cluster starts. 3. I guess I will need to switch to Azure Blob in order to use Databricks. Thanks for your help!! – Fro
@Fro Sure, you need to switch to Azure Blob. I tried to mount the file share via sudo as root and got the same error. As for #2, the official tutorial I linked shows how to mount it persistently via /etc/fstab. – Moncear
I would not attempt to mount any storage directly without using dbutils (which does not support Azure Files). Mounting via fstab puts the storage on the driver node only; assuming you want to run a Spark workload, the workers need access to the storage as well. Consider using Azure Data Factory to move the files to Blob storage or a data lake instead. – Contraction

Install the azure-storage-file-share library: https://pypi.org/project/azure-storage-file-share/
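In a Databricks notebook the package can be installed with the %pip magic (assuming a runtime that supports it), or attached to the cluster as a library:

%pip install azure-storage-file-share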

# Upload to Azure File Share

from azure.storage.fileshare import ShareFileClient

# "AZURE_STORAGE_CONNECTION_STRING" and "AZURE_STORAGE_FILE_SHARE_NAME" are
# placeholders for the actual connection string and share name.
file_client = ShareFileClient.from_connection_string(
    conn_str="AZURE_STORAGE_CONNECTION_STRING",
    share_name="AZURE_STORAGE_FILE_SHARE_NAME",
    file_path="summary_uploaded.csv")

with open("/dbfs/tmp/summary_to_upload.csv", "rb") as source_file:
    file_client.upload_file(source_file)

# Download from Azure File Share

file_client = ShareFileClient.from_connection_string(
    conn_str="AZURE_STORAGE_CONNECTION_STRING",
    share_name="AZURE_STORAGE_FILE_SHARE_NAME",
    file_path="summary_to_download.csv")

with open("/dbfs/tmp/summary_downloaded.csv", "wb") as file_handle:
    data = file_client.download_file()
    data.readinto(file_handle)

Next steps:

  1. Define a new secret in Azure Key Vault to hold the value for 'conn_str' (AZURE_STORAGE_CONNECTION_STRING). Key name can be: az-storage-conn-string
  2. Define a new secret in Azure Key Vault to hold the value for 'share_name' (AZURE_STORAGE_FILE_SHARE_NAME). Key name can be: az-storage-file-share
  3. Read both of these secrets from Key Vault instead of hard-coding them, as sketched below.
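A minimal sketch of step 3, assuming a Key Vault-backed Databricks secret scope named key-vault-scope (the scope name is an assumption) and the key names suggested above:

# "key-vault-scope" is an assumed scope name; use your own Key Vault-backed secret scope
conn_str = dbutils.secrets.get(scope="key-vault-scope", key="az-storage-conn-string")
share_name = dbutils.secrets.get(scope="key-vault-scope", key="az-storage-file-share")

file_client = ShareFileClient.from_connection_string(
    conn_str=conn_str,
    share_name=share_name,
    file_path="summary_uploaded.csv")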
Intercession answered 17/9, 2020 at 5:37 Comment(0)
