Apache Spark or Cascading framework? [closed]
I am confused as to when to use the Cascading framework and when to use Apache Spark. What are suitable use cases for each one?

Any help is appreciated.

Dimmick asked 11/8, 2014 at 10:04 Comment(1)
Please re-open following the edit. The question at its core is valid: do the two frameworks have different use cases? – Oyez
At heart, Cascading is a higher-level API on top of execution engines like MapReduce. It is analogous to Apache Crunch in this sense. Cascading has a few related projects, like a Scala API (Scalding) and PMML scoring (Pattern).

Apache Spark is similar in the sense that it also exposes a high-level API for data pipelines, one available in Java and Scala.

It is more of an execution engine itself than a layer on top of one. It has a number of associated projects, like MLlib, Spark Streaming, and GraphX, for machine learning, stream processing, and graph computation, respectively.

Overall, I find Spark a lot more interesting today, but the two are not aimed at exactly the same thing.

Reductive answered 11/8, 2014 at 10:22 Comment(3)
Cascading aims to support Spark as an "execution fabric". See cascading.org/new-fabric-support for more details. – Immoderate
Spark would more properly be compared to MapReduce, which contrasts in-memory processing (Spark) with disk-based processing (MapReduce). Cascading is currently just an interface for writing MapReduce jobs. – Bouley
Any learnings you can share if you moved code from Scalding/Cascading to Spark? – Terris
