I need to remove "." from the strings before doing a word count. It works fine when written as two statements, but gives me the error below when I write it as a single statement. Am I doing something stupid, or is there scope for improvement here?
Error: org.apache.spark.sql.AnalysisException: Generators are not supported when it's nested in expressions, but got: regexp_replace(explode(split(CAST(value AS STRING), \s+)), [.]*, );
Code:
import org.apache.spark.sql.functions._
val testString = " I am X. X Works for Y."
val testDF = Seq(testString).toDF
val testDF1 = testDF.select(regexp_replace(explode(split($"value".cast("String"), "\\s+")), "[.]*", ""))
testDF1.show
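For comparison, the two-statement version that works looks roughly like the sketch below. This is a reconstruction, not my exact code: the names `words` and `cleaned`, the `word` alias, and the local `SparkSession` setup are assumptions added for illustration (in `spark-shell` the session and implicits already exist). The only real difference is that `explode` sits at the top level of its own `select`, and `regexp_replace` is applied to the already-exploded column afterwards.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumed local session for a standalone run; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explode-two-statements")
  .getOrCreate()
import spark.implicits._

val testString = " I am X. X Works for Y."
val testDF = Seq(testString).toDF // single column named "value"

// Statement 1: the generator (explode) is the outermost expression in the select.
val words = testDF.select(explode(split($"value", "\\s+")).as("word"))

// Statement 2: strip the dots from the already-exploded column.
val cleaned = words.select(regexp_replace($"word", "[.]", "").as("word"))

cleaned.show()
```

Here the pattern is `[.]` rather than `[.]*`; both remove the dots, but the starred form can also match the empty string, so the plain form seemed the safer choice for the sketch.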