Apache Spark or Cascading framework? [closed]
I am confused as to when to use the Cascading framework and when to use Apache Spark. What are suitable use cases for each one?

Any help is appreciated.

Dimmick asked 11/8, 2014 at 10:04 Comment(1)
Please re-open following the edit. The question at its core is valid: do the two frameworks have different use cases? – Oyez
At heart, Cascading is a higher-level API on top of execution engines like MapReduce. It is analogous to Apache Crunch in this sense. Cascading has a few related projects, like a Scala API (Scalding) and PMML scoring (Pattern).

Apache Spark is similar in the sense that it also exposes a high-level API for data pipelines, one available in Java and Scala.

It is more of an execution engine itself than a layer on top of one. It has a number of associated projects, like MLlib, Spark Streaming, and GraphX, for machine learning, stream processing, and graph computation, respectively.

Overall, I find Spark a lot more interesting today, but the two are not aimed at exactly the same thing.

Reductive answered 11/8, 2014 at 10:22 Comment(3)
Cascading aims to support Spark as an "execution fabric". See cascading.org/new-fabric-support for more details. – Immoderate
Spark would more properly be compared to MapReduce, which contrasts in-memory processing (Spark) with disk-based processing (MapReduce). Cascading is currently just an interface for writing MapReduce jobs. – Bouley
Any learnings you can share if you moved code from Scalding/Cascading to Spark? – Terris
