Select all except particular column in spark sql

I want to select all columns in a table except StudentAddress, and hence I wrote the following query:

select `(StudentAddress)?+.+` from student;

It gives the following error in the SQuirreL SQL client:

org.apache.spark.sql.AnalysisException: cannot resolve '(StudentAddress)?+.+' given input columns
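For what it's worth, Spark 2.3 and later can interpret this Hive-style quoted-regex syntax once the spark.sql.parser.quotedRegexColumnNames flag is enabled. A minimal PySpark sketch, assuming a SparkSession named spark and the student table from the question:

# assumes Spark 2.3+; the flag is off by default
spark.conf.set("spark.sql.parser.quotedRegexColumnNames", "true")
spark.sql("select `(StudentAddress)?+.+` from student").show()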

Flu answered 26/4, 2017 at 21:1 Comment(4)
This syntax does not seem to be supported by Spark. – Shortcake
Any workarounds? – Flu
What is the actual need? – Shortcake
I am joining multiple very wide tables, so after performing one join I need to drop one of the joined columns to remove the ambiguity for the next join. Currently I list every column I want in the select, but an "except these columns" feature would be far more flexible. Thanks. – Flu
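A minimal PySpark sketch of that drop-after-join pattern; the grades table and the student_id key are hypothetical stand-ins, not names from the question:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

students = spark.table("student")  # hypothetical wide tables
grades = spark.table("grades")     # sharing the key "student_id"

# join, then drop the right-hand copy of the key column so that
# the next join on "student_id" is unambiguous
joined = students.join(grades, students["student_id"] == grades["student_id"])
joined = joined.drop(grades["student_id"])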

You can use the drop() method in the DataFrame API to drop a particular column while keeping all the others.

For example:

val df = hiveContext.read.table("student")
val dfWithoutStudentAddress = df.drop("StudentAddress")
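
The same thing in PySpark, assuming a SparkSession named spark:

df = spark.read.table("student")
df_without_address = df.drop("StudentAddress")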
Dragonfly answered 17/11, 2017 at 4:9 Comment(1)
The select("*") isn't necessary after the drop. – Acoustic

Using Spark SQL, try:

select * except(<columns to be excluded>) from tablename

Example:

select * from tmp
#+----+----+----+----+
#|col1|col2|col3|col4|
#+----+----+----+----+
#|a   |b   |c   |d   |
#+----+----+----+----+


-- exclude col1, col2
select * except(col1, col2) from tmp
#+----+----+
#|col3|col4|
#+----+----+
#|c   |d   |
#+----+----+
Cookout answered 13/9, 2022 at 21:35 Comment(3)
Any idea why when I try to do this I get: Undefined function: 'except'. This function is neither a registered temporary function nor a permanent function registered in the database – Again
Which version of Spark SQL supports this except function? As of this writing, it doesn't exist in the latest version of Apache Spark (v3.5.0): spark.apache.org/docs/3.5.0/api/sql/index.html – Scala
except (to be more accurate, the except clause) is supported only in Databricks Runtime 11.0 and above. See docs.databricks.com/en/sql/language-manual/…. Open-source Apache Spark doesn't support it as of 3.5.0 – Subconscious
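
On open-source Spark, a portable alternative to except() is to drop the unwanted columns through the DataFrame API instead. A minimal PySpark sketch, reusing the tmp table from the example above:

df = spark.table("tmp")
# drop() accepts multiple column names and returns the remaining ones
df.drop("col1", "col2").show()
#+----+----+
#|col3|col4|
#+----+----+
#|c   |d   |
#+----+----+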

I know this is an old question, but I recently ran into a similar issue where I needed to swap a column.

My solution is for PySpark, but it can easily be implemented in other languages.

df = df.select([df[col] for col in df.columns if col != 'StudentAddress'])

This technique comes in very handy when you have to swap a column with a computed one.

For instance:

import pyspark.sql.functions as f

address = f.when(df.StudentAddress.isNotNull(), df.StudentAddress).otherwise(f.lit('N/A'))
# alias() keeps the computed column's name as StudentAddress
df = df.select([df[col] if col != 'StudentAddress' else address.alias('StudentAddress') for col in df.columns])

Notice that in the list comprehension I return df[col] instead of just the column name, to avoid ambiguities in case a previous join created multiple columns with the same name.

Transpicuous answered 22/2 at 9:0 Comment(0)
