How to rename a column in Databricks

How do you rename a column in Databricks?

The following does not work:

ALTER TABLE mySchema.myTable change COLUMN old_name new_name int

It returns the error:

ALTER TABLE CHANGE COLUMN is not supported for changing column 'old_name' with type 'IntegerType (nullable = true)' to 'new_name' with type 'IntegerType (nullable = true)';

If it makes a difference, this table is using Delta Lake, and it is NOT partitioned or z-ordered by this "old_name" column.

Swop answered 26/12, 2019 at 17:6 Comment(1)
This is possible now; see Ispan's response! Mcvey
V
14

Changes were recently released that allow renaming columns on Delta tables in Databricks.

You first need to set these properties on the table:

ALTER TABLE <table_name> SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
)

Afterwards, you can rename the column as usual:

ALTER TABLE <table_name> RENAME COLUMN old_col_name TO new_col_name 
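The two statements above can also be generated and run from a notebook cell. The following is a minimal sketch in Python, not an official API: the helper name build_rename_statements is hypothetical, it does no identifier escaping, and it only builds the SQL strings. Running them assumes a SparkSession named spark, as is predefined in Databricks notebooks.

```python
from typing import List


def build_rename_statements(table_name: str, old_name: str, new_name: str) -> List[str]:
    """Return the two SQL statements from the answer, in execution order.

    Hypothetical helper: identifiers are inserted as-is, without escaping.
    """
    # Step 1: enable column mapping by name on the Delta table.
    enable_column_mapping = (
        f"ALTER TABLE {table_name} SET TBLPROPERTIES (\n"
        "  'delta.minReaderVersion' = '2',\n"
        "  'delta.minWriterVersion' = '5',\n"
        "  'delta.columnMapping.mode' = 'name'\n"
        ")"
    )
    # Step 2: rename the column.
    rename_column = f"ALTER TABLE {table_name} RENAME COLUMN {old_name} TO {new_name}"
    return [enable_column_mapping, rename_column]


statements = build_rename_statements("mySchema.myTable", "old_name", "new_name")
for statement in statements:
    print(statement)
    # In a Databricks notebook (assumption: `spark` is the active SparkSession),
    # each statement would be executed with: spark.sql(statement)
```

Keeping the statements in a list preserves the required order: the table properties must be set before the rename is attempted.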

Check this: https://docs.databricks.com/delta/delta-column-mapping.html

Other useful links:

https://docs.databricks.com/delta/delta-batch.html#rename-columns-1

https://docs.databricks.com/delta/delta-batch.html#change-column-type-or-name

Vey answered 28/6, 2022 at 14:35 Comment(2)
This does not work for me. When setting the TBLPROPERTIES, I get the following error message: Error in SQL statement: ParseException: no viable alternative at input 'ALTER TABLE '/my_dir/my_table''. Any idea why? Heirship
One of the requirements is to run on Databricks Runtime 10.2 or above. Check it. Thanks! Vey
A
18

You can't rename a column or change its data type in Databricks; you can only add new columns, reorder them, or add column comments. To rename a column, you must rewrite the table using the overwriteSchema option.

Take this example from the documentation:

spark.read.table(...)
  .withColumnRenamed("date", "date_created")
  .write
  .mode("overwrite")
  .option("overwriteSchema", "true")
  .saveAsTable(...)
Antho answered 26/12, 2019 at 18:59 Comment(5)
Does using "overwriteSchema" perform faster than dropping the table and re-creating it? Swop
I think the performance will be the same, but at least you can execute everything in one action, and if you are using Delta Lake, you can travel back in time. Antho
We don't need to do this anymore after the release of Delta Lake 0.7.0. We can now use SQL to alter or update a table, as I suggested in my next response. Monaural
The syntax has changed, so now you should look at Enayat's answer below. Underdrawers
If you're using a recent runtime in Databricks, this is no longer the case. See Ispan Cristi's answer that uses a simple ALTER TABLE statement. Swop
M
8

To be able to rename the column, overwriteSchema with saveAsTable should be used:

spark.read.table("Table_Name")
  .withColumnRenamed("currentName", "newName")
  .write
  .format("delta")
  .mode("overwrite")
  .option("overwriteSchema", "true")
  .saveAsTable("Table_Name")
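In Python, a method chain split across lines like the one above needs either a trailing backslash on each line or enclosing parentheses. The following is a minimal sketch of the same rewrite wrapped in a function. The function name rename_column_by_rewrite and its parameters are hypothetical, and spark is assumed to be an active SparkSession, as is predefined in Databricks notebooks.

```python
def rename_column_by_rewrite(spark, table_name, current_name, new_name):
    """Rewrite a Delta table in place with one column renamed.

    Assumptions: `spark` is an active SparkSession and `table_name`
    refers to an existing Delta table. The function name is hypothetical.
    """
    # Parentheses let the method chain span several lines in Python
    # without a trailing backslash on each line.
    (
        spark.read.table(table_name)
        .withColumnRenamed(current_name, new_name)
        .write
        .format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")  # allow the schema to change on overwrite
        .saveAsTable(table_name)
    )
```

It would be called as rename_column_by_rewrite(spark, "Table_Name", "currentName", "newName"). Note that this rewrites the whole table, so it is likely to be slower on large tables than a metadata-only rename.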
Monroe answered 23/4, 2021 at 19:58 Comment(5)
Just make sure you put Table_Name in quotes. I was new to Spark and for some reason this didn't click the first time. Deli
Is there a way to do that temporarily for a VIEW? Irfan
If you're using a recent runtime of Databricks, see Ispan Cristi's answer that uses a simple ALTER TABLE statement. Swop
@DavidMaddox You should not lower the answer's score. The answer was correct at the time the question was asked. Monroe
If you use a workspace in Databricks, then you will probably need a \ (explicit line break) at the end of each line. Caxton
