Select all except particular column in spark sql

I want to select all columns in a table except StudentAddress, and hence I wrote the following query:

select `(StudentAddress)?+.+` from student;

It gives the following error in the SQuirreL SQL client:

org.apache.spark.sql.AnalysisException: cannot resolve '(StudentAddress)?+.+' given input columns
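For what it's worth, Spark 2.3 and later can interpret this Hive-style quoted-regex syntax once the spark.sql.parser.quotedRegexColumnNames flag is enabled. A minimal PySpark sketch, assuming a SparkSession named spark and the student table from the question:

# assumes Spark 2.3+; the flag is off by default
spark.conf.set("spark.sql.parser.quotedRegexColumnNames", "true")
spark.sql("select `(StudentAddress)?+.+` from student").show()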

Flu answered 26/4, 2017 at 21:1 Comment(4)
This syntax does not seem to be supported by Spark. – Shortcake
Any workarounds? – Flu
What is the actual need? – Shortcake
I am joining multiple very wide tables, so after performing one join I need to drop one of the joined columns to remove the ambiguity for the next join. Currently I list every column I want in the select, but an "except these columns" feature would be far more flexible. Thanks. – Flu
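A minimal PySpark sketch of that drop-after-join pattern; the grades table and the student_id key are hypothetical stand-ins, not names from the question:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

students = spark.table("student")  # hypothetical wide tables
grades = spark.table("grades")     # sharing the key "student_id"

# join, then drop the right-hand copy of the key column so that
# the next join on "student_id" is unambiguous
joined = students.join(grades, students["student_id"] == grades["student_id"])
joined = joined.drop(grades["student_id"])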

You can use the drop() method in the DataFrame API to drop a particular column while keeping all the others.

For example:

val df = hiveContext.read.table("student")
val dfWithoutStudentAddress = df.drop("StudentAddress")
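
The same thing in PySpark, assuming a SparkSession named spark:

df = spark.read.table("student")
df_without_address = df.drop("StudentAddress")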
Dragonfly answered 17/11, 2017 at 4:9 Comment(1)
The select("*") isn't necessary after the drop. – Acoustic

Using Spark SQL, try:

select * except(<columns to be excluded>) from tablename

Example:

select * from tmp
#+----+----+----+----+
#|col1|col2|col3|col4|
#+----+----+----+----+
#|a   |b   |c   |d   |
#+----+----+----+----+


-- exclude col1, col2
select * except(col1, col2) from tmp
#+----+----+
#|col3|col4|
#+----+----+
#|c   |d   |
#+----+----+
Cookout answered 13/9, 2022 at 21:35 Comment(3)
Any idea why when I try to do this I get: Undefined function: 'except'. This function is neither a registered temporary function nor a permanent function registered in the database – Again
Which version of Spark SQL supports this except function? As of this writing, it doesn't exist in the latest version of Apache Spark (v3.5.0): spark.apache.org/docs/3.5.0/api/sql/index.html – Scala
except (to be more accurate, the except clause) is supported only in Databricks Runtime 11.0 and above. See docs.databricks.com/en/sql/language-manual/…. Open-source Apache Spark doesn't support it as of 3.5.0 – Subconscious
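
On open-source Spark, a portable alternative to except() is to drop the unwanted columns through the DataFrame API instead. A minimal PySpark sketch, reusing the tmp table from the example above:

df = spark.table("tmp")
# drop() accepts multiple column names and returns the remaining ones
df.drop("col1", "col2").show()
#+----+----+
#|col3|col4|
#+----+----+
#|c   |d   |
#+----+----+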

I know this is an old question, but I recently ran into a similar issue where I needed to swap a column.

My solution is for PySpark, but it can easily be implemented in other languages.

df = df.select([df[col] for col in df.columns if col != 'StudentAddress'])

This technique comes in very handy when you have to swap a column with a computed one.

For instance:

import pyspark.sql.functions as f

address = f.when(df.StudentAddress.isNotNull(), df.StudentAddress).otherwise(f.lit('N/A'))
# alias() keeps the computed column's name as StudentAddress
df = df.select([df[col] if col != 'StudentAddress' else address.alias('StudentAddress') for col in df.columns])

Notice that in the list comprehension I return df[col] instead of just the column name, to avoid ambiguities in case a previous join created multiple columns with the same name.

Transpicuous answered 22/2 at 9:0 Comment(0)
