I am confused as to when to use the Cascading framework and when to use Apache Spark. What are suitable use cases for each one?
Any help is appreciated.
At heart, Cascading is a higher-level API on top of execution engines like MapReduce. It is analogous to Apache Crunch in this sense. Cascading has a few other related projects, like a Scala version (Scalding), and PMML scoring (Pattern).
Apache Spark is similar in that it also exposes a high-level API for data pipelines, available in Java and Scala.
The key difference is that Spark is an execution engine in its own right, rather than a layer on top of one. It also has a number of associated projects: MLlib for machine learning, Spark Streaming for stream processing, and GraphX for graph computation.
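To make the "high-level API for data pipelines" concrete, here is a minimal sketch of the classic word count in Spark's Python API (PySpark), using the RDD interface. The app name and HDFS paths are illustrative placeholders, not anything from the question; Cascading would express the same flow with its Pipe/Tap abstractions.

```python
# Minimal PySpark word-count sketch (RDD API).
# App name and input/output paths below are hypothetical placeholders.
from pyspark import SparkContext

sc = SparkContext(appName="WordCountSketch")  # placeholder app name

counts = (
    sc.textFile("hdfs:///data/input.txt")     # placeholder input path
      .flatMap(lambda line: line.split())     # split each line into words
      .map(lambda word: (word, 1))            # pair each word with a count of 1
      .reduceByKey(lambda a, b: a + b)        # sum the counts per word
)

counts.saveAsTextFile("hdfs:///data/output")  # placeholder output path
sc.stop()
```

The whole pipeline is a chain of transformations on a single distributed collection, which is what people usually mean when they call Spark's API "higher level" than raw MapReduce.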
Overall I find Spark a lot more interesting today, but the two aren't aimed at exactly the same thing.