Databricks: drop a Delta table?

How can I drop a Delta Table in Databricks? I can't find any information in the docs... maybe the only solution is to delete the files inside the folder 'delta' with the magic command or dbutils:

%fs rm -r delta/mytable

EDIT:

For clarification, here is a very basic example:

# create a DataFrame with a simple two-column schema
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

cSchema = StructType([StructField("items", StringType()),
                      StructField("number", IntegerType())])

test_list = [['furniture', 1], ['games', 3]]

df = spark.createDataFrame(test_list, schema=cSchema)

and save it as a Delta table:

df.write.format("delta").mode("overwrite").save("/delta/test_table")

Then, if I try to delete it, it's not possible with DROP TABLE or a similar action:

%sql
DROP TABLE 'delta.test_table'

nor with other options like DROP TABLE 'delta/test_table', etc.

Loreleilorelie answered 22/11, 2019 at 9:1 Comment(0)
34

If you want to completely remove the table, then a dbutils command is the way to go:

dbutils.fs.rm('/delta/test_table', recurse=True)

From my understanding, the Delta table you've saved is sitting in blob storage. Dropping the connected database table will drop it from the database, but not from storage.
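
A quick way to see this for yourself (a sketch that assumes the table was registered over the path from the question):

# dropping only removes the metastore entry for an external table...
spark.sql("DROP TABLE IF EXISTS test_table")

# ...the Delta files are still on storage and can be listed:
display(dbutils.fs.ls("/delta/test_table"))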

Autolycus answered 26/11, 2019 at 1:51 Comment(1)
While this is true, if you have a managed table you don't have read/write access to the Delta table. Any managed table that you drop will be wiped after 30 days. That is a standard setting. It looks to me that the example here is for an external table, one where you have defined the storage path.Lithium
12

You can do that using a SQL command:

%sql
DROP TABLE IF EXISTS <database>.<table>
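
The same drop can be issued from a Python cell, if you prefer (fill in the placeholders first):

# equivalent to the %sql cell above
spark.sql("DROP TABLE IF EXISTS <database>.<table>")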
Desiccated answered 22/11, 2019 at 10:22 Comment(4)
Thanks for your answer, but this doesn't work on a Delta table. I already tried it and it says that the table doesn't exist.Loreleilorelie
Then it might be the case that the table does not exist. The table is just an abstraction over the actual physical files on DBFS. Do you want to drop the table, or do you want to clean up the files? The DROP command does not really clean up the DBFS location.Desiccated
Hi! Thanks again... I think that the solution is only valid for "normal" tables in Databricks, not Delta tables. I changed my question and added an example.Loreleilorelie
Hi! Finally I understand what I was doing wrong... after creating the Delta files, it is required to create the SQL table: spark.sql("CREATE TABLE test_table USING DELTA LOCATION '/delta/test_table/'"). With this, your answer was correct! Thanks!Loreleilorelie
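
Putting that last comment together with this answer, a minimal end-to-end sketch (paths and table name are the ones from the question):

# register the Delta files at that location as a SQL table...
spark.sql("CREATE TABLE test_table USING DELTA LOCATION '/delta/test_table/'")

# ...after which the drop works as expected
spark.sql("DROP TABLE IF EXISTS test_table")
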
9

Databricks has unmanaged tables and managed tables, but your code snippet just writes out a Delta Lake to storage. It doesn't create a managed or unmanaged table, so the DROP TABLE syntax doesn't work: you haven't created a table.

Remove files

As @Papa_Helix mentioned, here's the syntax to remove files:

dbutils.fs.rm('/delta/test_table', recurse=True)

Drop managed table

Here's how you could have written your data as a managed table.

df.write.saveAsTable("your_managed_table")

Check to make sure the data table exists:

spark.sql("show tables").show()

+---------+------------------+-----------+
|namespace|         tableName|isTemporary|
+---------+------------------+-----------+
|  default|your_managed_table|      false|
+---------+------------------+-----------+

When the data is a managed table, then you can drop the data and it'll delete the table metadata & the underlying data files:

spark.sql("drop table if exists your_managed_table")

Drop unmanaged table

When the data is saved as an unmanaged table, then you can drop the table, but it'll only delete the table metadata and won't delete the underlying data files. Create the unmanaged table and then drop it.

df.write.option("path", "tmp/unmanaged_data").saveAsTable("your_unmanaged_table")

spark.sql("drop table if exists your_unmanaged_table")

The tmp/unmanaged_data folder will still contain the data files, even though the table has been dropped.
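
For example, listing the path after the drop should still show the data files and the _delta_log directory:

# the metastore entry is gone, but the files remain
display(dbutils.fs.ls("tmp/unmanaged_data"))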

Check to make sure the table has been dropped:

spark.sql("show tables").show()

+---------+---------+-----------+
|namespace|tableName|isTemporary|
+---------+---------+-----------+
+---------+---------+-----------+

So the table isn't there, but you'd still need to run a rm command to delete the underlying data files.
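
For example:

# remove the leftover data files from the dropped unmanaged table
dbutils.fs.rm("tmp/unmanaged_data", recurse=True)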

Pharmaceutical answered 30/7, 2022 at 21:54 Comment(0)
5

Basically, in Databricks, tables are of two types: managed and unmanaged.

  1. Managed: tables for which Spark manages both the data and the metadata. Databricks stores the metadata and data in DBFS in your account.

  2. Unmanaged: Databricks manages only the metadata; the data itself is not managed by Databricks.

So if you write a DROP query for a managed table, it will drop the table and also delete the data as well. In the case of unmanaged tables, a DROP query will simply delete the sym-link pointer (the table's meta-information) to the table location, but your data is not deleted, so you need to delete the data externally using rm commands.
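
If you're not sure which type a given table is, DESCRIBE EXTENDED reports it (the table name here is just an illustration); look for the Type row, which shows MANAGED or EXTERNAL:

# prints table metadata, including a "Type" row (MANAGED or EXTERNAL)
spark.sql("DESCRIBE EXTENDED my_table").show(truncate=False)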

For more info: https://docs.databricks.com/data/tables.html

Carburize answered 14/2, 2020 at 5:24 Comment(0)
0

Delete from the GUI: Data -> Database Tables -> pick your database -> select the drop-down next to your table and delete. I don't know the consequences of this type of delete, so caveat emptor.

Debility answered 10/11, 2021 at 20:13 Comment(0)
0

I found that to fully delete a Delta table and be able to create a new one under the same name with, say, a different schema, you also have to delete temp files (otherwise you get an error saying that an old file no longer exists).

dbutils.fs.rm('/delta/<my_schema>/<my_table>', recurse=True)
dbutils.fs.rm('/tmp/delta/<my_schema>/<my_table>', recurse=True)
Cuspid answered 1/9, 2022 at 15:1 Comment(0)
0
%fs
rm -r <path of the Delta files>

After dropping the Delta table, use the commands below before the CREATE OR REPLACE TABLE command:

SET spark.databricks.delta.commitValidation.enabled = false;
SET spark.databricks.delta.stateReconstructionValidation.enabled = false;
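
If you prefer to set these from a Python cell, the equivalent session-scoped configuration calls would be:

# equivalent to the SQL SET statements above (applies to the current session)
spark.conf.set("spark.databricks.delta.commitValidation.enabled", "false")
spark.conf.set("spark.databricks.delta.stateReconstructionValidation.enabled", "false")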
Dibru answered 12/3, 2023 at 7:48 Comment(0)
0

I've been researching this approach, and it seems Databricks updated the documentation on July 11, 2023, with something a bit clearer:

"When a managed table is dropped from Unity Catalog, its underlying data is deleted from your cloud tenant within 30 days."

source: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-drop-table.html

I hope it helps.

Sargeant answered 9/8, 2023 at 8:44 Comment(0)
