Using SparkR and sparklyr simultaneously
As far as I understand, these two packages provide similar but largely non-overlapping wrapper functions for Apache Spark. sparklyr is newer and its functionality is still growing. I therefore think that one currently needs to use both packages to get the full scope of functionality.

As both packages essentially wrap references to JVM instances of Scala classes, I guess it should be possible to use them in parallel. But is it actually possible? What are your best practices?

Quietude answered 13/11/2016 at 19:02

These two packages use different mechanisms and are not designed for interoperability. Their internals are structured in different ways, and they don't expose the JVM backend in the same manner.

While one could devise a solution that allows partial data sharing (global temporary views, or persisted tables backed by a shared metastore, come to mind), it would have rather limited applications.
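To illustrate the metastore route: assuming both sessions are configured against the same Hive metastore and warehouse directory (an assumption; the warehouse path and table name below are hypothetical), a table persisted from SparkR can be picked up from sparklyr. This is a sketch of a workaround, not a supported interoperability mechanism:

```r
# Session 1: SparkR -- persist a table through a shared Hive metastore.
# The warehouse path is hypothetical; both sessions must point at the same one.
library(SparkR)
sparkR.session(enableHiveSupport = TRUE,
               sparkConfig = list(spark.sql.warehouse.dir = "/shared/warehouse"))
df <- createDataFrame(faithful)
saveAsTable(df, "faithful_tbl", mode = "overwrite")

# Session 2: sparklyr -- read the same table back via the shared metastore.
library(sparklyr)
config <- spark_config()
config$spark.sql.warehouse.dir <- "/shared/warehouse"
sc <- spark_connect(master = "local", config = config)
faithful_tbl <- spark_read_table(sc, "faithful_tbl")
```

Even then, the two sessions are separate Spark applications, so anything session-scoped (temp views, cached data, UDF registrations) is not shared; only what lands in the metastore is visible to both.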

If you need both, I'd recommend separating your pipeline into multiple steps and passing data between them using persistent storage.
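A minimal sketch of that pattern (the Parquet path is hypothetical): one step writes its result with sparklyr, and the next step reads it with SparkR from persistent storage.

```r
# Step 1: sparklyr -- compute a result and persist it as Parquet.
library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)
result <- mtcars_tbl %>% filter(mpg > 20)
spark_write_parquet(result, "/tmp/pipeline/step1", mode = "overwrite")
spark_disconnect(sc)

# Step 2: SparkR -- pick up the persisted output and continue.
library(SparkR)
sparkR.session()
step1 <- read.df("/tmp/pipeline/step1", source = "parquet")
head(summarize(groupBy(step1, step1$cyl), avg_mpg = avg(step1$mpg)))
```

In practice each step would run as its own R process, so the SparkR and sparklyr/dplyr namespaces (which mask each other's generics such as `filter` and `summarize`) never coexist in one session.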

Benighted answered 26/1/2019 at 18:44

© 2022 - 2024 — McMap. All rights reserved.