delta-lake Questions

2

Solved

I have two dataframes (from a Delta Lake table), sd1 and sd2, that are left joined on an id column. %sql select a.columnA, b.columnB, from sd1 a left outer join sd2 b on a.id = b.id The problem is ...

2

Is it possible to implement a Delta Lake on-premises? If yes, what software/tools need to be installed? I'm trying to implement a Delta Lake on-premises to analyze some log files and databas...
Salo asked 9/2, 2021 at 19:36

2

Solved

Where does Delta Lake store the table metadata info? I am using Spark 2.6 (not Databricks) on my standalone machine. My assumption was that if I restart Spark, the table created in Delta Lake Spark ...
Tight asked 21/7, 2020 at 10:22

5

Is there a SQL command that I can easily use to change the datatype of an existing column in a Delta table? I need to change the column datatype from BIGINT to STRING. Below is the SQL command I'm try...
Cantara asked 31/5, 2021 at 6:59

4

Solved

I'm trying to add a new column to data stored as a Delta Table in Azure Blob Storage. Most of the actions being done on the data are upserts, with many updates and few new inserts. My code to write...
Swayback asked 21/8, 2020 at 19:7

6

Solved

According to the article by Databricks, it is possible to integrate Delta Lake with AWS Glue. However, I am not sure if it is also possible to do it outside of the Databricks platform. Has someone done...
Availability asked 2/10, 2019 at 6:0

3

Solved

I have a table in Databricks delta which is partitioned by transaction_date. I want to change the partition column to view_date. I tried to drop the table and then create it with a new partition co...
Deepsea asked 4/3, 2019 at 18:12

1

I am trying to create a Delta table from a SELECT statement with a NULL value for a column. No error is raised when I create the table, but it throws an error when I try to run a select. %sql cr...
Flyover asked 24/11, 2022 at 12:32

2

I am working with Databricks Delta Live Tables, but have some problems with upserting some tables upstream. I know it is quite a long text below, but I tried to describe my problem as clear as poss...

1

Solved

Is there any way to know which column was used for z-ordering for a given table? I've tried multiple commands like describe and describe extended along with viewing the delta log. I found no inform...
Garrard asked 10/10, 2022 at 11:2

2

I am a little confused about the transaction log of Delta Lake. The documentation mentions that the default retention policy is 30 days and that it can be modified with the property delta.logRetentionDur...
Longsighted asked 29/12, 2020 at 3:14

2

Solved

I would like to know the best way to load a specific partition of a Delta table. Does option 2 load the whole table before filtering? option 1 : df = spark.read.format("delta").option...
Communication asked 12/7, 2021 at 8:37

1

Solved

I'm using Databricks. For my data I created a Delta Lake. Then I tried to modify the column using the pandas API, but for some reason the following error message pops up: ValueError: Cannot combine t...
Quagga asked 13/9, 2022 at 9:3

0

I have two pyspark streaming jobs: streaming_job_a reads from kafka, writes a dataframe containing the raw data in one column and a timestamp in another column to location A in s3, and creates unm...

2

Solved

What are the major differences between S3 Lake Formation governed tables and Databricks Delta tables? They look pretty similar.
Carreon asked 6/12, 2021 at 12:1

7

Solved

I have recently started discovering Databricks and faced a situation where I need to drop a certain column of a delta table. When I worked with PostgreSQL it was as easy as ALTER TABLE main.metri...
Realist asked 31/1, 2019 at 9:15

2

Solved

I am working in AWS Glue environment. I read the data from Glue catalog as a Dynamic dataframe and convert it to Pyspark dataframe for my custom transformations. To do an upsert of the new/updated ...
Bard asked 30/8, 2021 at 8:12

2

Solved

All the examples in the Databricks documentation are in Scala. I can't find how to use this trigger type from PySpark. Is there an equivalent API or workaround?
Consignee asked 10/2, 2022 at 8:20

3

Solved

How do you rename a column in Databricks? The following does not work: ALTER TABLE mySchema.myTable change COLUMN old_name new_name int It returns the error: ALTER TABLE CHANGE COLUMN is not...
Swop asked 26/12, 2019 at 17:6

0

I have a parquet file that I am trying to write to a delta table. My code is straightforward, I think. from pyspark.sql import SparkSession spark = SparkSession.builder \ .master("local[*]"...
Cicatrize asked 9/6, 2022 at 19:44

1

Solved

DELTA does not have CREATE TABLE LIKE. It does have CTAS. I want to copy the definition of a table only, but also specify the LOCATION. E.g. this does not work: CREATE TABLE IF NOT EXISTS NEW_CUSTO...
Fouts asked 29/3, 2022 at 14:10

3

I am importing fact and dimension tables from SQL Server to Azure Data Lake Gen 2. Should I save the data as "Parquet" or "Delta" if I am going to wrangle the tables to create a...
Fustanella asked 16/12, 2020 at 9:55

3

Solved

# Creating PySpark Object from pyspark.sql import SparkSession spark = SparkSession.builder.appName("XMLParser").getOrCreate() sc=spark.sparkContext hadoop_conf=sc._jsc.hadoopConfiguration() hadoop...
Noncompliance asked 13/5, 2020 at 8:16

1

I have a simple Spark job that streams data to a Delta table. The table is pretty small and is not partitioned. A lot of small parquet files are created. As recommended in the documentation (https:...
Coastguardsman asked 12/8, 2021 at 13:22

2

I have a problem merging CSV files using PySpark SQL with a Delta table. I managed to create an upsert function that updates if matched and inserts if not matched. I want to add column ID to th...
Cashmere asked 3/12, 2019 at 11:38
