delta-lake Questions

8

Solved

What is the correct way to install the delta module in Python? In the example they import the module with from delta.tables import * but I did not find the correct way to install the module in my v...
Querist asked 17/12, 2019 at 11:37

2

Solved

Delta table delete operation is given here for Python and SQL, and truncate using SQL is given here. But I cannot find the documentation for Python truncate table. How to do it for a Delta table in D...
Muriel asked 13/5, 2021 at 10:58

1

Solved

I encountered an issue while trying to store JSON data as a Delta Lake table using PySpark and Delta Lake. Here's my code: from pyspark.sql import SparkSession from pyspark.sql.types import StructT...

6

I am currently employed as a Junior Data Developer and recently saw a post saying that Azure Synapse can now create SQL tables from Delta tables. I tried creating an SQL table from a Delta table wh...
Mckeever asked 26/2, 2021 at 13:12

4

Solved

I run a query on Databricks: DROP TABLE IF EXISTS dublicates_hotels; CREATE TABLE IF NOT EXISTS dublicates_hotels ... I'm trying to understand why I receive the following error: Error in SQL stat...
Nkrumah asked 13/10, 2021 at 7:51

3

Solved

I need to delete certain data from a delta-lake table before I load it. I am able to delete the data from the Delta table if it exists, but it fails when the table does not exist. Databricks Scala code ...
Hamo asked 6/10, 2020 at 16:39

5

I want to delete data from a Delta file in Databricks. I'm using these commands, e.g.: PR = spark.read.format('delta').options(header=True).load('/mnt/landing/Base_Tables/EventHistory/') PR.write.format(...
Dissatisfaction asked 7/12, 2020 at 10:3

2

Solved

I am new to Databricks. I am reading Microsoft documentation on the data lakehouse. In the documentation they make reference to Delta Lake without explaining what the difference is or even if the...
Paraclete asked 25/8, 2023 at 10:55

6

There is a function to delete data from a Delta table: deltaTable = DeltaTable.forPath(spark, "/data/events/") deltaTable.delete(col("date") < "2017-01-01") But is there also a way to drop du...
Cytochrome asked 8/5, 2020 at 7:48

2

I have a delta table with millions of rows and several columns of various types, including nested structs. And I want to create an empty DataFrame clone of the Delta table at runtime - i.e. same s...
Toe asked 9/6, 2022 at 8:58

5

I get an error when I execute the following line of code: deltaTarget.alias('target').merge(df.alias('source'), mergeStatement).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute() The error...
Odo asked 17/11, 2020 at 21:54

3

Solved

So I have a dataframe which has a column, file_date. For a given run, the dataframe has only data for one unique file_date. For instance, in a run, let us assume that there are, say, about 100 record...
Discontent asked 22/1, 2020 at 0:41

1

How exactly can one create an efficient and reusable Databricks workflow for dumping a raw SQL database into the Delta Lake? Some confusion here is about the best approach to the following: Handling dr...

3

Solved

I am new to Spark and Delta Lake and am trying to do a POC with PySpark, using MinIO as Delta Lake's storage backend. However, I am getting an error that Class org.apache.hadoop.fs.s3a.S3AFileSyste...
Remsen asked 16/2, 2023 at 12:24

5

Solved

I am new to Databricks and Delta Live Tables. I have a problem with creating a Delta Live Table in Python. How do I create a Delta Live Table from JSON files in FileStore?
Jokjakarta asked 15/5, 2022 at 15:22

3

Solved

I'd like to connect to Delta using JDBC and would like to run the Spark Thrift Server (STS) in local mode to kick the tyres. I start STS using the following command: $SPARK_HOME/sbin/start-thriftserve...

3

Solved

I have a pyspark dataframe currently from which I initially created a delta table using below code - df.write.format("delta").saveAsTable("events") Now, since the above datafra...
Greaves asked 23/2, 2021 at 20:38

1

Background: the "zstd" compression codec has 22 compression levels. I read this Uber blog. Regarding compression time and file size, I verified using df.to_parquet with our data and got the same e...
Cymophane asked 29/9, 2023 at 19:12

5

Solved

I need to read a dataset into a DataFrame, then write the data to Delta Lake. But I get the following exception: AnalysisException: 'Incompatible format detected.\n\nYou are trying to write to `d...
Reconnaissance asked 16/7, 2019 at 8:21

8

Solved

How can I drop a Delta Table in Databricks? I can't find any information in the docs... maybe the only solution is to delete the files inside the folder 'delta' with the magic command or dbutils: %...
Loreleilorelie asked 22/11, 2019 at 9:1

2

Solved

I know you can set spark.sql.shuffle.partitions and spark.sql.adaptive.advisoryPartitionSizeInBytes. The former will not work with adaptive query execution, and the latter only works for the first ...

2

Solved

Curious as to how indexing works in Databricks. Can you see the partitioning as indexing because it effectively organizes the data in grouped subcategories?
Mimicry asked 3/5, 2021 at 22:29

4

I have been exploring the data lakehouse concept and Delta Lake. Some of its features seem really interesting. Right there on the project home page https://delta.io/ there is a diagram showing Delt...
Muirhead asked 20/4, 2021 at 15:25

2

I am looking for a way to write back to a delta table in python without using pyspark. I know there is a library called deltalake/delta-lake-reader that can be used to read delta tables and convert...
Haldes asked 1/10, 2021 at 14:9

4

Can one use Delta Lake without being dependent on the Databricks Runtime? (I mean, is it possible to use Delta Lake with HDFS and Spark on-prem only?) If not, could you elaborate why that is so from tec...
Marko asked 23/3, 2020 at 16:5

© 2022 - 2024 — McMap. All rights reserved.