I have a dataframe df with a VectorUDT column named features. How do I get an element of the column, say the first element?
I've tried doing the following:
from pyspark.sql.functions import udf
first_elem_udf = udf(lambda row: row.values[0])
df.select(first_elem_udf(df.features)).show()
but I get a net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict(for numpy.dtype) error. I get the same error if I use first_elem_udf = udf(lambda row: row.toArray()[0]) instead.
I also tried explode(), but I get an error because it requires an array or map type.
This should be a common operation, I think.
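For reference, here is a minimal sketch of one likely workaround. It assumes the PickleException comes from the UDF returning a numpy.float64 (which is what indexing into values or toArray() yields) rather than a built-in Python float. The names element_at and first_element_column are hypothetical helpers introduced only for illustration; they are not part of PySpark.

```python
def element_at(vector, index=0):
    """Return one element of a vector as a built-in Python float.

    Indexing a pyspark.ml.linalg vector gives a numpy.float64, which the
    JVM-side unpickler may fail to convert; float() makes it a plain float.
    """
    return float(vector[index])


def first_element_column(df, column="features"):
    """Select the first element of a vector column (hypothetical helper).

    pyspark is imported lazily so element_at can be used without Spark.
    """
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    # Declare the return type explicitly; udf() defaults to StringType.
    first_elem_udf = udf(lambda v: element_at(v, 0), DoubleType())
    return df.select(first_elem_udf(df[column]).alias("first_elem"))
```

With the dataframe from the question, first_element_column(df).show() should then display a double column. On Spark 3.0 or later, pyspark.ml.functions.vector_to_array(df.features) may be a simpler route, since it yields an ordinary array column that can be indexed or passed to explode().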
Caused by: org.apache.spark.SparkException: Python worker exited unexpectedly (crashed). Any clues? – Showpiece