Running PySpark from Scala/Java Spark

I am processing my data with Scala Spark and want to use PySpark/Python for further processing.

Below is an example for PySpark -> Scala, but I am looking for Scala -> PySpark:

https://www.crowdstrike.com/blog/spark-hot-potato-passing-dataframes-between-scala-spark-and-pyspark/

Below are a few approaches I found for Scala -> PySpark:

  1. Jython is one way, but it does not have all the APIs/libraries that CPython has.
  2. The pipe method, e.g. val pipedData = data.rdd.pipe("hdfs://namenode/hdfs/path/to/script.py")

But with pipe I lose the benefits of the DataFrame API, and on the Python side I may need to reconvert the data into a DataFrame/Dataset (a minimal sketch of this round trip is below).
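
For reference, a minimal sketch of the pipe approach, assuming a script.py present on every executor node that reads CSV lines from stdin and writes transformed CSV lines to stdout (the script path, sample data, and column names are illustrative):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("scala-to-pyspark-pipe").getOrCreate()
    import spark.implicits._

    val data = Seq((1, "a"), (2, "b")).toDF("id", "value")

    // Each Row is flattened to a plain CSV string before being handed to the
    // external Python process, so the DataFrame structure is lost in transit.
    val piped = data
      .map(_.mkString(","))                  // Dataset[String]
      .rdd
      .pipe("python /opt/jobs/script.py")    // script must exist on every executor

    // The script's stdout comes back as bare strings and has to be parsed
    // into a DataFrame again -- the reconversion cost mentioned above.
    val result = piped
      .map(_.split(","))
      .map(a => (a(0).toInt, a(1)))
      .toDF("id", "value")

    result.show()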

Is there any better way for Scala Spark to talk to PySpark with the same SparkContext/session?

Slippy answered 14/5, 2019 at 14:34 Comment(3)
Did you find a way to do that?Flock
I have faced the same issue but ended up storing the PySpark DataFrame in a temp location, and Scala picks it up from there for further processing. Please let me know if you found a solution (a rough sketch of this pattern is below).Palaeontology
Yes, now there is a solution. Answered in another question: #68763664Semanteme
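
For anyone landing here, a rough sketch of the temp-location hand-off mentioned in the comments, in the Scala -> PySpark direction the question asks about (the HDFS path is illustrative, and the two jobs run as separate Spark applications rather than sharing one SparkContext):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("scala-to-pyspark-handoff").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

    // Scala side: persist the DataFrame, schema included, to a shared location.
    df.write.mode("overwrite").parquet("hdfs://namenode/tmp/handoff/stage1")

    // The PySpark job runs as its own application and picks the data up with
    //   df = spark.read.parquet("hdfs://namenode/tmp/handoff/stage1")
    // so no structure is lost, but the two jobs do not share a SparkContext.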
