I am working with Python and PySpark, and I want to upload a CSV file to Azure Blob Storage. I already have a DataFrame generated by my code: df. Here is what I am doing:
from azure.storage.blob import BlockBlobService
import pandas

# DataFrame generated earlier by my code
df

# Create the BlockBlobService that is used to call the Blob service for the storage account
block_blob_service = BlockBlobService(account_name='name', account_key='key')
container_name = 'results-csv'

d = {'one': pandas.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pandas.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pandas.DataFrame(d)

writer = pandas.ExcelWriter(df, engine='xlsxwriter')
a = df.to_excel(writer, sheet_name='Sheet1', index=False, engine='xlsxwriter')
block_blob_service.create_blob_from_stream(container_name, 'test', a)
I get the error:
ValueError: stream should not be None.
So I want to upload the content of the DataFrame as a blob to the storage location given above. Is there any way to do that without first generating a CSV file on my local machine?
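The ValueError comes from `df.to_excel` returning `None` (it writes to its target rather than returning data), so `a` is `None` when it is passed as the stream. A minimal sketch of writing the workbook into an in-memory `io.BytesIO` buffer instead and uploading that buffer; the function name, container, and blob name are illustrative placeholders:

```python
import io

import pandas as pd


def upload_df_as_excel(df, block_blob_service, container_name, blob_name):
    # to_excel returns None, so it cannot be passed as a stream directly;
    # write the workbook into an in-memory buffer instead.
    buffer = io.BytesIO()
    df.to_excel(buffer, sheet_name="Sheet1", index=False)
    buffer.seek(0)  # rewind so the SDK reads from the start of the buffer
    block_blob_service.create_blob_from_stream(container_name, blob_name, buffer)


# The serialization step on its own, without touching Azure:
df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [1.0, 2.0, 3.0]})
buffer = io.BytesIO()
df.to_excel(buffer, sheet_name="Sheet1", index=False)
excel_bytes = buffer.getvalue()
```

No temporary file is created: the entire workbook lives in `buffer` until the upload call hands it to the Blob service.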
Comments:

– Ingvar: Use BytesIO; it is almost the same as saving to a file, and then you can upload it as a stream or as bytes.

– Ingvar: a = df.to_csv() and block_blob_service.create_blob_from_text(container_name, "test.csv", a)
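Expanding the comments' suggestion into a sketch: when `to_csv()` is called with no path, it returns the CSV content as a string, which can be handed straight to `create_blob_from_text`, so nothing is written to local disk. The function name, container, and blob name below are illustrative placeholders:

```python
import pandas as pd


def upload_df_as_csv(df, block_blob_service, container_name, blob_name):
    # With no path argument, to_csv returns the CSV content as a str
    csv_data = df.to_csv(index=False)
    block_blob_service.create_blob_from_text(container_name, blob_name, csv_data)


# The serialization step on its own, without touching Azure:
df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [1.0, 2.0, 3.0]})
csv_data = df.to_csv(index=False)
```

Since `create_blob_from_text` accepts a plain string, this avoids both the temporary file and the BytesIO buffer needed for the Excel path.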