How to write to a Delta table/Delta format in Python without using PySpark?

I am looking for a way to write back to a Delta table in Python without using PySpark. I know there is a library called deltalake/delta-lake-reader that can be used to read Delta tables and convert them to pandas DataFrames.

The goal is to write back to the opened Delta table.

The code to read the table looks like this:

from deltalake import DeltaTable
dt = DeltaTable('path/file')
df = dt.to_pandas()

So, is there any way to write from a pandas DataFrame back to a Delta table with something like this:

df = pandadf.to_delta()
DeltaTable.write(df, 'path/file')

Thank you for your assistance!

Haldes answered 1/10, 2021 at 14:9

Comment: it's not yet possible if you look into the feature matrix: github.com/delta-io/delta-rs#features – Divers

This is now supported! See this example:

import duckdb
from deltalake.writer import write_deltalake

# Make sure DuckDB can read Parquet over HTTPS
duckdb.sql("INSTALL httpfs")
duckdb.sql("LOAD httpfs")

df = duckdb.sql('''
SELECT countries_and_territories, sum(deaths) AS total
FROM read_parquet('https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet')
GROUP BY 1
ORDER BY total DESC
LIMIT 5;
''').df()

# Append the query result to the Delta table at this path
write_deltalake('Pathto/covid', df, mode='append')
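
To verify the write, the same table can be opened back with delta-rs itself (a minimal sketch; 'Pathto/covid' is the path used above):

from deltalake import DeltaTable

dt = DeltaTable('Pathto/covid')
print(dt.version())    # table version, incremented by each append
print(dt.to_pandas())  # table contents as a pandas DataFrame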
Matchless answered 26/6, 2022 at 5:6

Comment: Mim is correct! To provide additional context, they are using the delta-rs library, which does not have a Spark dependency. You can install delta-rs with pip or conda: $ pip install deltalake or $ conda install -c conda-forge deltalake – Horick

@Mim is correct. This just provides more info.

Currently, you can use delta-rs to read and write to Delta Lake directly.

You can install it with pip install deltalake or conda install -c conda-forge deltalake.

import pandas as pd
from deltalake.writer import write_deltalake

df = pd.DataFrame({"x": [1, 2, 3]})
write_deltalake("path/to/delta-tables/table1", df)

Writing to S3

storage_options = {
    "AWS_DEFAULT_REGION": "us-west-2",
    "AWS_ACCESS_KEY_ID": "xxx",
    "AWS_SECRET_ACCESS_KEY": "xxx",
    "AWS_S3_ALLOW_UNSAFE_RENAME": "true",
}

write_deltalake(
    "s3a://my-bucket/delta-tables/table1",
    df,
    mode="append",
    storage_options=storage_options,
)
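
The same storage_options work for reading. A minimal sketch reusing the bucket path from above:

from deltalake import DeltaTable

dt = DeltaTable(
    "s3a://my-bucket/delta-tables/table1",
    storage_options=storage_options,
)
df = dt.to_pandas()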

To remove AWS_S3_ALLOW_UNSAFE_RENAME and write concurrently, you need to set up a DynamoDB lock.

Follow the corresponding GitHub ticket for updates on how to set this up correctly.
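
As a rough illustration only: the lock is configured through storage_options, but the exact option names have changed across delta-rs versions, so treat the keys below as assumptions and check the ticket:

# Hypothetical configuration; verify the exact keys for your delta-rs version
storage_options = {
    "AWS_DEFAULT_REGION": "us-west-2",
    "AWS_ACCESS_KEY_ID": "xxx",
    "AWS_SECRET_ACCESS_KEY": "xxx",
    "AWS_S3_LOCKING_PROVIDER": "dynamodb",   # use DynamoDB for commit locking
    "DYNAMO_LOCK_TABLE_NAME": "delta_lock",  # pre-created DynamoDB table
}

write_deltalake(
    "s3a://my-bucket/delta-tables/table1",
    df,
    mode="append",
    storage_options=storage_options,
)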

Leinster answered 12/5, 2023 at 17:37
