Write Python DataFrame as CSV into Azure Blob
I have got two questions on reading and writing Python objects from/to Azure blob storage.

  1. Can someone tell me how to write a Python dataframe as a CSV file directly into Azure Blob Storage without storing it locally?

    I tried using the functions create_blob_from_text and create_blob_from_stream, but neither of them works.

    Converting the dataframe to a string and using the create_blob_from_text function writes the file into the blob, but as a plain string rather than as CSV.

    df_b = df.to_string()
    block_blob_service.create_blob_from_text('test', 'OutFilePy.csv', df_b)  
    
  2. How can I read a json file in Azure blob storage directly into Python?

Carnegie answered 25/4, 2018 at 5:47 Comment(1)
See this to send the data as csv to blob: #50923555 – Eastereasterday
  1. Can someone tell me how to write a Python dataframe as a CSV file directly into Azure Blob Storage without storing it locally?

You could use the pandas.DataFrame.to_csv method.

Sample code:

from azure.storage.blob import BlockBlobService
import pandas as pd

head = ["col1", "col2", "col3"]
rows = [[1, 2, 3], [4, 5, 6], [8, 7, 9]]
df = pd.DataFrame(rows, columns=head)
print(df)

# With no path argument, to_csv returns the CSV content as a string
output = df.to_csv(index_label="idx", encoding="utf-8")
print(output)

accountName = "***"
accountKey = "***"
containerName = "test1"
blobName = "OutFilePy.csv"

blobService = BlockBlobService(account_name=accountName, account_key=accountKey)

blobService.create_blob_from_text(containerName, blobName, output)


2. How can I read a json file in Azure blob storage directly into Python?

Sample code:

from azure.storage.blob import BlockBlobService

accountName = "***"
accountKey = "***"
containerName = "test1"
blobName = "test3.json"

blobService = BlockBlobService(account_name=accountName, account_key=accountKey)

result = blobService.get_blob_to_text(containerName, blobName)

print(result.content)

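Since get_blob_to_text returns the blob content as a string, a JSON blob can be parsed with the standard json module. A minimal sketch, using a hard-coded string as a stand-in for result.content:

```python
import json

# Stand-in for result.content as returned by get_blob_to_text
content = '{"id": 1, "tags": ["a", "b"]}'

data = json.loads(content)
print(data["id"])    # 1
print(data["tags"])  # ['a', 'b']
```
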

Hope it helps you.

Batsheva answered 25/4, 2018 at 8:47 Comment(2)
When I store the df.to_csv in a variable, it stores it in a local directory and the variable is of None type. Am I missing something? – Guaiacol
If you would like to save the output to a subfolder then make this change: blobService.create_blob_from_text('test1', 'folder1/folder2/OutFilePy.csv', output) – Effete
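The first comment above points at a common pitfall: to_csv(path) writes to local disk and returns None, while to_csv() with no path argument returns the CSV content as a string, which is what the upload call needs. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 4], "col2": [2, 5]})

# No path argument: nothing is written to disk, the CSV comes back as a str
csv_text = df.to_csv(index=False)
print(type(csv_text))  # <class 'str'>
print(csv_text)
```

Pass the returned string, not the result of to_csv("some/local/path.csv"), to create_blob_from_text.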

The approved answer did not work for me, as it depends on the azure-storage package (deprecated/legacy as of 2021). I changed it as follows:

from azure.storage.blob import ContainerClient
import dotenv
import os
import pandas as pd

dotenv.load_dotenv()
blob_block = ContainerClient.from_connection_string(
    conn_str=os.environ["CONNECTION_STRING"],
    container_name=os.environ["CONTAINER_NAME"]
)
# df is your DataFrame; with no path argument, to_csv returns a string
output = df.to_csv(encoding='utf-8')
# name is the target blob name within the container
blob_block.upload_blob(name, output, overwrite=True, encoding='utf-8')
Jeer answered 9/8, 2021 at 19:23 Comment(1)
More info here: github.com/Azure/azure-sdk-for-python/tree/main/sdk/storage/… – Matias

There was an update in BlobServiceClient: the create_blob_from_text method is no longer supported. Now you can use get_blob_client to get or create the blob file. The blob need not already exist:

from azure.storage.blob import BlobServiceClient, ContentSettings

output = dataframe.to_csv(index_label="idx", encoding="utf-8")

blob_service = BlobServiceClient.from_connection_string(
    f"DefaultEndpointsProtocol=https;AccountName={ACCOUNT_NAME};AccountKey={ACCOUNT_KEY};EndpointSuffix=core.windows.net"
)

blob_client = blob_service.get_blob_client(
    container=DEST_CONTAINER,
    blob="kcScenarioTest/" + str(current_time.microsecond) + ".csv"
)

blob_client.upload_blob(output, overwrite=True, content_settings=ContentSettings(content_type="text/csv"))
Hannis answered 25/7, 2022 at 20:18 Comment(0)

Here's an example of writing a Python DataFrame into Azure Blob Storage without storing it locally. It doesn't require StringIO and uses the ContainerClient instead of BlockBlobService.


from azure.storage.blob import ContainerClient
import pandas as pd

def write_csv(env, df_path, df):
    container_client = ContainerClient(
        env['container_url'],
        container_name=env['container_name'],
        credential=env['container_cred']
    )

    output = df.to_csv(index_label="idx", encoding="utf-8")
    print(output)
    blob_client = container_client.get_blob_client(df_path)
    blob_client.upload_blob(output, overwrite=True)

    return 'success'
Garlic answered 13/1, 2023 at 18:38 Comment(0)

So you need a BytesIO buffer to upload to the blob, using the upload_blob method from the azure.storage.blob module. You will also need to create a container_client from the same module:

from io import BytesIO

# df is your DataFrame; container_client is a ContainerClient (see earlier answers)
blob_report_name = 'OutFilePy.csv'
stream_file = BytesIO()
df.to_csv(stream_file)
file_to_blob = stream_file.getvalue()
blob_client = container_client.get_blob_client(blob_report_name)
blob_client.upload_blob(data=file_to_blob, overwrite=True)
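The BytesIO round-trip can be sanity-checked locally before involving Azure at all; a minimal sketch with a small example frame:

```python
import pandas as pd
from io import BytesIO

df = pd.DataFrame({"col1": [1, 4], "col2": [2, 5]})

# Write the frame into an in-memory binary buffer (requires pandas >= 1.2)
stream_file = BytesIO()
df.to_csv(stream_file, index=False)
stream_file.seek(0)

# Reading the buffer back should reproduce the original frame
round_trip = pd.read_csv(stream_file)
print(round_trip.equals(df))  # True
```

The bytes from stream_file.getvalue() are exactly what upload_blob sends.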
Dozier answered 20/2, 2023 at 14:18 Comment(0)
