How do you write a CSV back to Azure Blob Storage using Databricks?

I'm struggling to write back to an Azure Blob Storage Container. I'm able to read from a container using the following:

storage_account_name = "expstorage"
storage_account_key = "1VP89J..."
container = "source"

spark.conf.set("fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name), storage_account_key)

dbutils.fs.ls("dbfs:/mnt/azurestorage")

I've searched around and tried several ways of writing back to my container, but I can't find a definitive one.

Here is an alternative that uses a SAS key, but I didn't want to mix and match key types: "Write dataframe to blob using azure databricks".

Disarrange answered 11/9, 2020 at 16:36 Comment(1)
What is {0} in your conf name? - Dvandva
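
For anyone with the same question as the comment above: {0} is a positional placeholder for Python's str.format, so the line in the question simply substitutes the storage account name into the config key. A minimal sketch using the account name from the question (the f-string variant is only an equivalent spelling, not something the question uses):

```python
storage_account_name = "expstorage"

# {0} is replaced by the first argument passed to .format()
conf_key = "fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name)
print(conf_key)

# Equivalent f-string form
conf_key_f = f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net"
```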

To write to your Blob Storage, you just need to specify a path that starts with dbfs:/mnt/azurestorage:

(df.write
 .mode("overwrite")
 .option("header", "true")
 .csv("dbfs:/mnt/azurestorage/filename.csv"))

This will create a folder named filename.csv containing the distributed data as part files. If you are looking for a single CSV file and the data fits in driver memory, convert to pandas and write through the local file API instead:

df.toPandas().to_csv("/dbfs/mnt/azurestorage/filename.csv")

pandas does not go through the DBFS API, so it cannot resolve dbfs:/ paths. It uses the local file API instead, which means the path has to start with /dbfs/ rather than dbfs:/. The same applies if your DataFrame is a pandas one to begin with:

df.to_csv(r'/dbfs/mnt/azurestorage/filename.csv', index = False)
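
The dbfs:/ to /dbfs/ mapping described above can be sketched as a small helper. The function name dbfs_uri_to_local_path is hypothetical (it is not part of Databricks or pandas), and the sketch assumes the cluster exposes DBFS under the /dbfs mount point:

```python
def dbfs_uri_to_local_path(uri: str) -> str:
    """Map a dbfs:/ URI to the /dbfs/ local-file path that pandas can open."""
    prefix = "dbfs:/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a dbfs:/ URI: {uri!r}")
    # Drop the scheme and any extra leading slashes, then re-root under /dbfs
    return "/dbfs/" + uri[len(prefix):].lstrip("/")


local_path = dbfs_uri_to_local_path("dbfs:/mnt/azurestorage/filename.csv")
print(local_path)
```

On a cluster, the resulting path could then be passed to a pandas DataFrame's to_csv, as in the line above.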
Interment answered 13/9, 2020 at 9:45 Comment(1)
Please don't use com.databricks.spark.csv. Spark has supported CSV natively since Spark 2.0... - Leticia

Since mounting external storage is now considered a deprecated pattern, here's an alternative that achieves similar results without mounting.

# This is for ADLS but works similarly for Blob.
# filesystem (container name) and storage_name (account name) must already be defined.
filesystem_url = f"abfss://{filesystem}@{storage_name}.dfs.core.windows.net"

# Files are written as distributed data (a folder of part files)
(df.write.format("csv")
    .mode("overwrite")
    .option("header", "true")
    .save(f"{filesystem_url}/yourdata"))
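
To tie this back to the account key from the question, here is a rough sketch of how the pieces might fit together without a mount. The helper names (abfss_url, account_key_conf, write_csv) are hypothetical, and the sketch assumes account-key authentication, a Databricks notebook where spark and a Spark DataFrame df already exist, and the container and account names from the question. Only the string helpers run outside a cluster; write_csv is defined but not called here.

```python
def abfss_url(filesystem: str, storage_name: str, path: str = "") -> str:
    """Build an abfss:// URL for a container (filesystem) and optional path."""
    base = f"abfss://{filesystem}@{storage_name}.dfs.core.windows.net"
    return f"{base}/{path.lstrip('/')}" if path else base


def account_key_conf(storage_name: str) -> str:
    """Spark config key that holds the account key for the dfs endpoint."""
    return f"fs.azure.account.key.{storage_name}.dfs.core.windows.net"


def write_csv(spark, df, filesystem, storage_name, account_key, path):
    """Write df as CSV part files straight to the container (needs a cluster)."""
    spark.conf.set(account_key_conf(storage_name), account_key)
    (df.write.format("csv")
        .mode("overwrite")
        .option("header", "true")
        .save(abfss_url(filesystem, storage_name, path)))


# Names taken from the question; "yourdata" is the folder from the answer above.
target = abfss_url("source", "expstorage", "yourdata")
print(target)
```

In a notebook this would be called as write_csv(spark, df, container, storage_account_name, storage_account_key, "yourdata"), ideally with the key read from a secret scope rather than hard-coded.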
Seise answered 4/11, 2023 at 2:28 Comment(0)
