In Hive, how can I add a column only if that column does not exist?
Asked Answered
S

2

9

I would like to add a new column to a table, but only if that column does not already exist.

This works if the column does not exist:

ALTER TABLE MyTable ADD COLUMNS (mycolumn string);

But when I execute it a second time, I get an error.

Column 'mycolumn' exists

When I try to use the "IF NOT EXISTS" syntax that is supported for CREATE TABLE and ADD PARTITION, I get a syntax error:

ALTER TABLE MyTable ADD IF NOT EXISTS COLUMNS (mycolumn string);
FAILED: ParseException line 3:42 required (...)+ loop did not match anything at input 'COLUMNS' in add partition statement

What I need is something that can execute itempotently so I can run my query whether this column exists or not.

Summerlin answered 13/8, 2014 at 17:57 Comment(0)
H
6

You can partially work it around, by setting the hive.cli.errors.ignore flag. In this case hive CLI will force the execution of further queries even when queries on the way fail.

In this example:

SET hive.cli.errors.ignore=true;
ALTER TABLE MyTable ADD COLUMNS (mycolumn string);
ALTER TABLE MyTable ADD COLUMNS (mycolumn string);
ALTER TABLE MyTable ADD COLUMNS (mycolumn2 string);

hive will execute all queries, even though there'll be an error in the second query.

Hymnal answered 6/10, 2015 at 10:33 Comment(3)
SET hive.cli.errors.ignore=true; is this properoty applicable for beeline shell as well ?Unsettled
Looks like it should have been added in issues.apache.org/jira/browse/HIVE-11191 . I'm on an older version, so can't really test if it actually works.Hymnal
Is there another way if the flag cannot be set because of Error: Error while processing statement: Cannot modify hive.cli.errors.ignore at runtime. It is not in list of params that are allowed to be modified at runtime (state=42000,code=1) ?Blackamoor
A
1

Well there is no direct way to do that. I mean through a single query. There are two other ways:

1.) Using JDBC:

1.1) Do describe on the table name.
1.2) You will get a list of columns in result set.
1.3) Check if your columns exists or not by iterating through the result set.

2.) Using hive Metastore client:

2.1) Create a object of HiveMetastoreClient
2.2) HiveMetastoreClient.getFields(<>db_name, <table_name>).get(index).getName() will give you the column name.
2.3) Check if your column exists of not by comparing the list.

Hope it helps...!!!

Aeropause answered 14/8, 2014 at 20:44 Comment(2)
So, there is no way to do this in stand-alone HQL? It requires using some compiled code with conditionals in it?Summerlin
Well as far as I know, there is no other way. And not through Hive QL.Aeropause

© 2022 - 2024 — McMap. All rights reserved.