I have a DataFrame with an "id" column and a column containing an array of structs. The schema is:
root
|-- id: string (nullable = true)
|-- desc: array (nullable = false)
| |-- element: struct (containsNull = true)
| | |-- name: string (nullable = true)
| | |-- age: long (nullable = false)
The array "desc" can have any number of null values. I would like to create the final dataframe with the array having none of the null values using Spark 1.6:
An example would be:
id . desc
1010 . [[George,21],null,[MARIE,13],null]
1023 . [null,[Watson,11],[John,35],null,[Kyle,33]]
I want the final DataFrame to look like:
id . desc
1010 . [[George,21],[MARIE,13]]
1023 . [[Watson,11],[John,35],[Kyle,33]]
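For reproducibility, the example input can be built like this (just an illustrative sketch, not how the data is actually loaded):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types._

val sc = new SparkContext(new SparkConf().setAppName("example").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)

// Schema matching the printSchema output above
val schema = StructType(Seq(
  StructField("id", StringType, nullable = true),
  StructField("desc", ArrayType(StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("age", LongType, nullable = false)
  )), containsNull = true), nullable = false)
))

// The two example rows, with nulls mixed into the arrays
val rows = sc.parallelize(Seq(
  Row("1010", Seq(Row("George", 21L), null, Row("MARIE", 13L), null)),
  Row("1023", Seq(null, Row("Watson", 11L), Row("John", 35L), null, Row("Kyle", 33L)))
))

val df = sqlContext.createDataFrame(rows, schema)
df.show(false)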
I tried doing this with a UDF and a case class, but got:

java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to....
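Roughly, the attempt looked like the sketch below (the Person case class and the dropNulls name are only placeholders); presumably the cast fails because Spark passes the struct elements to the UDF as Rows rather than Person instances:

import org.apache.spark.sql.functions.udf

// Illustrative case class mirroring the struct elements of "desc"
case class Person(name: String, age: Long)

// Declared over Seq[Person], but at runtime Spark 1.6 hands each element
// to the UDF as a GenericRowWithSchema, so accessing p.age throws the error above.
val dropNulls = udf((xs: Seq[Person]) => xs.filter(p => p != null && p.age >= 0))

// df.withColumn("desc", dropNulls(df("desc"))).show(false)   // <- ClassCastException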
Any help is greatly appreciated. I would prefer to do this without converting to RDDs, if possible.