I am processing my data with Spark in Scala and want to use PySpark/Python for further processing.
I have found samples going from PySpark to Scala, but I am looking for the reverse direction: Scala -> PySpark. Below are a few approaches I found for Scala -> PySpark:
- Jython is one way, but it doesn't support all the APIs/libraries that CPython does.
- Pipe method (see the sketch below):
val pipedData = data.rdd.pipe("hdfs://namenode/hdfs/path/to/script.py")
But with pipe I lose the benefits of the DataFrame API, and on the Python side I may need to reconvert the result to a DataFrame/Dataset.
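To make that trade-off concrete, here is a minimal sketch of the round trip I have in mind. The column layout, the CSV line format, and the contract of script.py (read CSV lines on stdin, write CSV lines to stdout) are all assumptions for illustration:

import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("pipe-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the real data: two columns, id and value.
val data = Seq((1, "a"), (2, "b")).toDF("id", "value")

// pipe() executes a local command on each executor, so the script has to be
// shipped to the executors first; pipe() cannot run an hdfs:// path directly.
// (In cluster mode you may need spark-submit --files instead, since
// SparkFiles.get below resolves the path on the driver.)
spark.sparkContext.addFile("hdfs://namenode/hdfs/path/to/script.py")
val scriptPath = SparkFiles.get("script.py")

// Serialize each Row to one CSV line, stream the lines through the script's
// stdin/stdout, and collect the script's output as an RDD[String].
val piped = data
  .map(row => s"${row.getInt(0)},${row.getString(1)}")
  .rdd
  .pipe(s"python $scriptPath")

// The output is plain strings, so the DataFrame has to be rebuilt by
// re-parsing whatever format the script emitted.
val result = piped
  .map(_.split(","))
  .map(fields => (fields(0).toInt, fields(1)))
  .toDF("id", "value")

Even when this works, every row goes through text serialization and re-parsing on both sides, which is exactly the DataFrame/Dataset loss described above.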
Is there a better way for Scala Spark to talk to PySpark with the same SparkContext/session?