Generators are not supported when it's nested in expressions

I need to remove "." from the strings before doing a word count. It works fine when written as two statements, but gives me the error below when written as a single statement. Am I doing something wrong, or is there scope for improvement?

Error: org.apache.spark.sql.AnalysisException: Generators are not supported when it's nested in expressions, but got: regexp_replace(explode(split(CAST(value AS STRING), \s+)), [.]*, );

Code:

import org.apache.spark.sql.functions._
val testString = " I am X. X Works for Y."
val testDF = Seq (testString).toDF
val testDF1 = testDF.select(regexp_replace (explode (split($"value".cast("String"), "\\s+")), "[.]*", ""))
testDF1.show
Guillerminaguillermo asked 2/5, 2018 at 1:28

Comments:
explode is a generator function: it generates new rows. So you can't wrap other functions around explode. - Kreiner
Got your solution. Thank you. - Guillerminaguillermo
But the documentation says: def explode(e: Column): Column, "Creates a new row for each element in the given array or map column." - Guillerminaguillermo
Read the first line of my previous comment. - Kreiner
Thank you. Now it's clear. All the best. - Guillerminaguillermo
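As the comments say, explode must be a top-level expression of its select; the cleanup just has to happen in a later projection over the generated rows. A minimal sketch of that two-step approach written as one chained expression, assuming a local SparkSession. The names TwoStepClean and cleanWords are hypothetical, and "[.]" is used instead of the question's "[.]*" since only the literal dots need removing (both give the same result here).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object TwoStepClean {
  // Hypothetical helper: the generator and the cleanup live in separate projections.
  def cleanWords(df: DataFrame): DataFrame =
    df.select(explode(split(col("value").cast("string"), "\\s+")).as("word")) // explode is top-level here
      .select(regexp_replace(col("word"), "[.]", "").as("word"))            // plain function on generated rows

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("two-step-clean").getOrCreate()
    import spark.implicits._

    val testDF = Seq(" I am X. X Works for Y.").toDF() // single string column named "value"
    cleanWords(testDF).show()
    spark.stop()
  }
}
```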
import org.apache.spark.sql.functions._

val testString = " I am X. X Works for Y."
val testDF = Seq(testString).toDF
// explode gets its own column ("new"); regexp_replace is then applied to that column instead of being wrapped around explode
val testDF1 = testDF.withColumn("new", explode(split($"value".cast("String"), "\\s+"))).withColumn("value", regexp_replace(col("new"), "[.]*", "")).drop("new")
testDF1.show
Undergird answered 15/10 at 16:5
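Another option, if a single select is preferred, is to move regexp_replace inside the generator so that explode is the outermost call: strip the dots from the whole string first, then split and explode. A minimal sketch, assuming a local SparkSession; SingleSelectClean and removeDotsThenExplode are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object SingleSelectClean {
  // explode is the outermost expression, so the analyzer accepts it.
  def removeDotsThenExplode(df: DataFrame): DataFrame =
    df.select(
      explode(
        // regexp_replace and split run on the whole string before any rows are generated
        split(regexp_replace(col("value").cast("string"), "[.]", ""), "\\s+")
      ).as("word")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("single-select-clean").getOrCreate()
    import spark.implicits._

    val testDF = Seq(" I am X. X Works for Y.").toDF()
    removeDotsThenExplode(testDF).show()
    spark.stop()
  }
}
```

With this input, both this version and the two-step one should also produce one empty-string word, because splitting on "\\s+" keeps the empty token before the leading space; wrapping the column in trim(...) before split is one way to avoid it.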
