Delta Lake independent of Apache Spark?

4

8

I have been exploring the data lakehouse concept and Delta Lake. Some of its features seem really interesting. Right on the project home page, https://delta.io/, there is a diagram showing Delta Lake running on "your existing data lake" without any mention of Spark. Elsewhere, the documentation suggests that Delta Lake does indeed run on top of Spark. So my question is: can it be run independently of Spark? Can I, for example, set up Delta Lake with S3 buckets for storage in Parquet format, with schema validation and so on, without using Spark in my architecture?

Muirhead answered 20/4, 2021 at 15:25
8

You might keep an eye on this: https://github.com/delta-io/delta-rs

It's early and currently read-only, but worth watching as the project evolves.

Archlute answered 22/4, 2021 at 18:48
0

Currently, you can use delta-rs to read and write to Delta Lake directly.

It supports Rust and Python. You can install the Python package with pip install deltalake or conda install -c conda-forge deltalake (delta-spark is the separate PySpark-based package and requires Spark).

Here is an example using Python:

import pandas as pd
from deltalake import write_deltalake

# Build a small DataFrame and write it as a new Delta table on the local filesystem.
df = pd.DataFrame({"x": [1, 2, 3]})
write_deltalake("path/to/delta-tables/table1", df)

Writing to S3

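# Credentials are placeholders; prefer environment variables or an IAM role in practice.
storage_options = {
    "AWS_DEFAULT_REGION": "us-west-2",
    "AWS_ACCESS_KEY_ID": "xxx",
    "AWS_SECRET_ACCESS_KEY": "xxx",
    # Skips the commit lock on S3, so it is only safe with a single writer.
    "AWS_S3_ALLOW_UNSAFE_RENAME": "true",
}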

write_deltalake(
    "s3a://my-bucket/delta-tables/table1",
    df,
    mode="append",
    storage_options=storage_options,
)

To drop AWS_S3_ALLOW_UNSAFE_RENAME and write concurrently from multiple writers, you need a locking provider, which for delta-rs on S3 means a DynamoDB lock table.

Follow this GitHub ticket for more updates regarding how to set up correctly.
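As a rough sketch of what the DynamoDB variant of the storage options can look like: the two keys AWS_S3_LOCKING_PROVIDER and DELTA_DYNAMO_TABLE_NAME come from the delta-rs documentation, while the helper name with_dynamodb_lock and the table name my_lock_table are hypothetical. The DynamoDB table itself has to be created beforehand, as described in the delta-rs docs.

```python
from typing import Dict


def with_dynamodb_lock(
    storage_options: Dict[str, str], lock_table: str = "delta_log"
) -> Dict[str, str]:
    """Return a copy of the options that uses a DynamoDB lock instead of unsafe rename."""
    # Drop the single-writer escape hatch; the input dict is left untouched.
    options = {
        key: value
        for key, value in storage_options.items()
        if key != "AWS_S3_ALLOW_UNSAFE_RENAME"
    }
    # Ask delta-rs to coordinate commits through DynamoDB.
    options["AWS_S3_LOCKING_PROVIDER"] = "dynamodb"
    # Name of the pre-created lock table (assumption: adjust to your setup).
    options["DELTA_DYNAMO_TABLE_NAME"] = lock_table
    return options


if __name__ == "__main__":
    base = {"AWS_DEFAULT_REGION": "us-west-2", "AWS_S3_ALLOW_UNSAFE_RENAME": "true"}
    print(with_dynamodb_lock(base, "my_lock_table"))
```

The resulting dictionary can be passed as storage_options to write_deltalake in place of the one above.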

Fieldwork answered 12/5, 2023 at 17:41
0

Yes, this is absolutely possible. We built a scalable data backend using this approach, combining Delta Lake, the Glue Data Catalog, Amazon S3 and Amazon Athena. Amazon Athena can be used to query the data instead of Apache Spark.

Please refer to this blog that explains the same in detail.

Zootechnics answered 13/5, 2023 at 9:49
-3

tl;dr No


Delta Lake up to and including 0.8.0 is tightly integrated with Apache Spark so it's impossible to have Delta Lake without Spark.

Exon answered 20/4, 2021 at 15:25
