Multiply PySpark array column by a scalar
I am trying to multiply an array-typed column by a scalar, where the scalar is also a value in the same PySpark DataFrame.

For example, I have this DataFrame:

df = sc.parallelize([([1, 2], 3)]).toDF(["l", "factor"])
df.show()
+------+------+
|     l|factor|
+------+------+
|[1, 2]|     3|
+------+------+

What I want to achieve is this:

+------+------+
|     l|factor|
+------+------+
|[3, 6]|     3|
+------+------+

This is what I have tried:

from pyspark.sql.functions import lit

df.withColumn("l", lit("factor") * df.l)

It returns a type mismatch error. How can I multiply an array-typed column by a number?

Figwort asked 19/6, 2020 at 20:27
From Spark 2.4 onwards, use the transform higher-order function:

df.createOrReplaceTempView("tmp")
spark.sql("""select l, factor, transform(l, x -> x * factor) as result from tmp""").show(10, False)
#+------+------+------+
#|l     |factor|result|
#+------+------+------+
#|[1, 2]|3     |[3, 6]|
#+------+------+------+
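
For Spark versions before 2.4, which lack transform, a Python UDF is a possible fallback. A minimal sketch, assuming integer array elements as in the sample data (the multiply_udf name is illustrative):

from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, LongType

# Multiply each element by that row's factor in plain Python (slower than transform)
multiply_udf = udf(lambda arr, factor: [x * factor for x in arr], ArrayType(LongType()))
df.withColumn("result", multiply_udf("l", "factor")).show()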

Using the DataFrame API:

from pyspark.sql.functions import expr

df.withColumn("res", expr("transform(l, x -> x * factor)")).show()
#+------+------+------+
#|     l|factor|   res|
#+------+------+------+
#|[1, 2]|     3|[3, 6]|
#+------+------+------+
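
Since Spark 3.1, the same thing can be written with the native transform function in pyspark.sql.functions, avoiding the SQL expression string. A minimal sketch, assuming Spark >= 3.1:

from pyspark.sql import functions as F

# The lambda receives each array element as a Column, so Column arithmetic applies
df.withColumn("res", F.transform("l", lambda x: x * F.col("factor"))).show()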
Sharpset answered 19/6, 2020 at 20:43
