Haskell, Scala, Clojure, what to choose for high performance pattern matching and concurrency [closed]

I started working with FP recently, after reading a lot of blogs and posts about the advantages of FP for concurrent execution and performance. My interest in FP has been largely driven by the application I am developing: a state-based data injector into another subsystem, where timing is crucial (close to 2 million transactions per second). I have a couple of such subsystems that need to be tested. I am seriously considering FP for its parallelism and want to take the correct approach. Many posts on SO discuss the advantages and disadvantages of Scala, Haskell and Clojure with respect to language constructs, libraries and JVM support. From a language point of view, I am happy to learn any language as long as it helps me achieve the result.

Certain posts favor Haskell for its pattern matching and the simplicity of the language, while JVM-based FP languages have a big advantage in being able to use existing Java libraries. Jane Street is a big OCaml supporter, but I am really not sure about developer support and help forums for OCaml.

If anybody has handled such large volumes of data, please share your experience.

Ornithine answered 23/7, 2012 at 5:42 Comment(1)
You may also be interested in this Scala vs Clojure speed comparison: #11641598 - Germanism

Do you want fast or do you want easy?

If you want fast, you should use C++, even if you use FP principles to aid correctness. Since timing is crucial, support for soft (and, if need be, hard) real-time programming will be important. In C++ you can decide exactly how and when memory is reclaimed, and spend only as much time on that task as you can afford.

The three languages you've named are all roughly 2-3x slower than near-optimally hand-tuned C++, and even that only when used in a fairly traditional imperative style. They all use garbage collection, which will introduce uncontrolled, unpredictable delays into your transactions.
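As a rough illustration of why such delays matter at this scale, assume the question's 2 million transactions per second pass through a single sequential pipeline (spreading work over several cores relaxes this), and take a 10 ms collector pause as a purely hypothetical figure:

```latex
t_{\text{tx}} = \frac{1\ \text{s}}{2 \times 10^{6}} = 500\ \text{ns}
\qquad
n_{\text{delayed}} = 10\ \text{ms} \times \left(2 \times 10^{6}\ \text{s}^{-1}\right) = 2 \times 10^{4}
```

Under those assumptions each transaction has a budget of about 500 ns, and a single pause of that length would back up roughly 20,000 transactions.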

Now, that said, it's a lot of work to get this running in bulletproof fashion with C++. Applying FP principles requires considerably more boilerplate (even in C++11), and most libraries are mutable by default. (Edit: Rust is becoming a good alternative, but it is beyond the scope of this answer to describe Rust in sufficient detail.)
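To make the pattern-matching point concrete for the Rust alternative mentioned in the edit above, here is a minimal sketch of one step of a state-based injector written as a pure function over an enum. The `State` and `Event` types, their variants, and `step` are hypothetical, invented for illustration; they are not taken from the question.

```rust
/// Hypothetical states of a data injector (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Idle,
    Sending { sent: u64 },
    Done { total: u64 },
}

/// Hypothetical inputs that drive the injector (illustrative only).
#[derive(Debug, Clone, Copy)]
enum Event {
    Start,
    Ack,
    Stop,
}

/// Pure transition function: returns the next state instead of mutating shared state.
/// The compiler rejects the match if any (state, event) pair is left unhandled;
/// here the last arm covers the remaining pairs explicitly.
fn step(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle, Event::Start) => State::Sending { sent: 0 },
        (State::Sending { sent }, Event::Ack) => State::Sending { sent: sent + 1 },
        (State::Sending { sent }, Event::Stop) => State::Done { total: sent },
        // Any other combination leaves the state unchanged.
        (s, _) => s,
    }
}

fn main() {
    let events = [Event::Start, Event::Ack, Event::Ack, Event::Stop];
    // Fold the event stream through the transition function.
    let final_state = events.iter().fold(State::Idle, |s, &e| step(s, e));
    println!("{:?}", final_state);
}
```

Because `step` has no side effects, it can be tested in isolation and run independently per subsystem, which is the kind of FP discipline the answer says takes more boilerplate in C++.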

Maybe you don't have the time and can afford to scale back on other specifications. If throughput rather than timing is crucial, for example, then you probably want Scala over Clojure: see the Computer Language Benchmarks Game, where Scala wins every benchmark as of this writing and has smaller code size in almost every case. (Edit: the CLBG is no longer helpful in this regard, though you may find archives supporting these statements on the Web Archive.) OCaml and Haskell should be chosen for other reasons: their benchmark scores are similar, but they differ in syntax, interoperability and so on.

As for which system has the best concurrency support: Haskell, Clojure and Scala are all just fine, while OCaml is a bit lacking.

This pretty much narrows it down to Haskell and Scala. Do you need to use Java libraries? Scala. Do you need to use C libraries? Probably Haskell. Do you need neither? Then you can choose either on the basis of which one you prefer stylistically without having to worry overly much that you've made your life vastly harder by choosing the wrong one.

Moten answered 23/7, 2012 at 6:23 Comment(10)
Could you please add a reference for the "2-3x slower than C++" claim? Thanks - Earmark
+1 for the reference. I think you should add the article to your answer where you make the claim. - Earmark
@Edmondo1984 - Again, look at the Computer Language Benchmarks Game. The Google benchmark is so bad it's not even a good game; it wouldn't get any serious attention if it weren't from Google. Don't mistake the word "Game" in CLBG for something frivolous or poorly done! It merely reflects the near-impossibility of creating useful, objective benchmarks without an insane amount of effort. CLBG is imperfect, but a lot less imperfect than almost anything else you'll find. - Moten
@Ankur - C adds no additional speed and a huge additional syntactic burden compared to C++ when programming in a functional style. The only reason to choose it would be if one were using a platform where a C++ compiler was not available (some embedded platforms, some FPGA code generators, etc.). I seriously doubt that is the case here. - Moten
I said that the reference to the Google doc is great, and maybe you should put it in your answer so other readers don't have to look for it in the comments if they don't trust your statement... - Earmark
@Edmondo1984 - I think you misunderstand me. The Google doc is, I think, not great at all, and I do not wish to include a link because I think it is misleading, or will waste people's time as they read it carefully and eventually realize that it is not great. (I don't think they intended it to be great, just kind of interesting; then people wanted to read far more into it than the authors meant.) - Moten
@RexKerr I've deleted the benchmark link. - Germanism
@Germanism - Well, I didn't really mean it should be deleted, just that a comment is an appropriate place for it (preferably with a warning: Google gave this a quick try, and this is what they came up with; you may have a similar experience if you try equally hard to optimize your code in all cases). - Moten
Nowadays, Rust could be a good alternative. It has pattern matching like Haskell and much better support for FP than C++, but performs about as well as C++ in benchmarks. - Newspaper
@Newspaper - Agreed. Rust is an alternative worth exploring these days. - Moten

I've done this with Clojure, which proved pretty effective for the following reasons:

  • Being on the JVM is a huge advantage in terms of libraries. This effectively ruled out Haskell and OCaml for my purposes, as we needed easy access to the Java ecosystem and integration with JVM-based tools (Maven builds, etc.).
  • You can drop into pure Java if you need to tightly optimise inner loops. We did this for some custom code processing large double[] arrays, but 99% of the time Clojure can get you the performance you need. See http://www.infoq.com/presentations/Why-Prismatic-Goes-Faster-With-Clojure for some examples of how to make Clojure go really fast (quite technical video, assumes some prior knowledge!). Once you start counting the ease of exploiting multiple cores, Clojure is very competitive on performance.
  • Clojure has very nice multi-core concurrency support. This proved extremely useful for managing concurrent tasks. See http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey
  • The REPL makes a very good environment for testing and exploratory work on data.
  • Clojure is lazy, which makes it suitable for handling larger-than-memory data sets (as long as you are careful not to force the whole data set into memory at once). There are also some nice libraries available in this environment, most notably Storm and Aleph. Storm may be particularly interesting for you, as it's designed for distributed real-time processing of large numbers of events.
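The "process it without forcing everything into memory" idea is not specific to Clojure's lazy sequences. For consistency with the other sketches here, this is a minimal Rust version built on lazy iterators over any `BufRead` source. The one-number-per-line input format, the threshold filter, and the `sum_large_values` name are assumptions made for illustration.

```rust
use std::io::{self, BufRead, Cursor};

/// Sums every value above `threshold`, pulling one line at a time from `reader`.
/// Nothing is collected, so memory use is bounded by the longest line,
/// not by the size of the input. (Assumed format: one f64 per line.)
fn sum_large_values<R: BufRead>(reader: R, threshold: f64) -> io::Result<f64> {
    reader
        .lines() // lazy: each line is read only when the fold asks for it
        .try_fold(0.0, |acc: f64, line: io::Result<String>| -> io::Result<f64> {
            let line = line?; // stop early on an I/O error
            // Lines that do not parse as numbers are skipped.
            Ok(match line.trim().parse::<f64>() {
                Ok(v) if v > threshold => acc + v,
                _ => acc,
            })
        })
}

fn main() -> io::Result<()> {
    // In practice this would be a File wrapped in a BufReader;
    // an in-memory Cursor stands in for it here.
    let data = "1.5\n10\nnot-a-number\n20.25\n";
    let total = sum_large_values(Cursor::new(data), 5.0)?;
    println!("{total}");
    Ok(())
}
```

The same function works unchanged on a multi-gigabyte file, since only the current line is held in memory at any moment.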

I can't speak with quite so much experience of the other languages, but my impression from some practical experience of Haskell and Scala is:

  • Haskell is great if you care about purity and strict functional programming with static types. The static typing can provide strong guarantees of correctness, so it might be suitable for highly algorithmic work. Personally, I find pure FP a little too rigid: there are many times when mutable state is useful, and I think Clojure strikes a slightly better balance here (by allowing controlled mutability through managed references).
  • Scala is a great language and shares with Clojure the advantages of being on the JVM. To me, Scala is more like a "better Java" with functional features and a very impressive type system; it's less of a paradigm shift than Clojure. The downside is that the type system can get quite complex and confusing.
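The "controlled mutability through managed references" mentioned in the Haskell point above refers to constructs such as Clojure's atoms, where `swap!` applies a pure function to the current value and retries if another thread got there first. Rust has no direct equivalent; as a loose analogue, here is a sketch built on the standard `AtomicU64::fetch_update`, which runs the same compare-and-swap retry loop. The `swap` helper is a hypothetical name chosen for illustration, and it covers only integer state.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Rough analogue of Clojure's `swap!` on an atom (hypothetical helper):
/// applies `f` to the current value, retrying via compare-and-swap if another
/// thread changed it in the meantime. `f` may run more than once, so it must be pure.
fn swap(cell: &AtomicU64, f: impl Fn(u64) -> u64) -> u64 {
    // fetch_update returns the previous value; Err is only possible if the
    // closure returns None, which this one never does.
    let prev = cell
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |v| Some(f(v)))
        .expect("closure always returns Some");
    // Recomputing is safe because `f` is pure; this is the value that was stored.
    f(prev)
}

fn main() {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    swap(&counter, |n| n + 1);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("{}", counter.load(Ordering::SeqCst)); // prints 4000
}
```

As with an atom, the mutation is confined to one well-defined point, and the update logic itself stays a pure function.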

Overall, I think you could be happy with any of these. It will probably come down to how much you care about the JVM and your view on type systems.

Adila answered 23/7, 2012 at 6:22 Comment(6)
Four of your five arguments also apply to Scala (all except the last one: Clojure is lazy). - Germanism
Yeah, agreed. I really like Scala, I just haven't used it as much. Both are great languages. Arguably, you can of course produce lazy behaviour in Scala too with a little effort; it's just not "built in" to the same extent as in Clojure. - Adila
Many (most) will consider Scala's type system an upside instead of a downside. - Jabberwocky
While Clojure makes it easy to work with lazy sequences, calling a language lazy usually refers to its evaluation strategy. Haskell is far lazier than Clojure in this respect, but that may be off-topic for this particular question. - Petulancy
I agree with @GordonGustafson. Clojure is strict. I think you mean to say that it has built-in types for laziness, such as lazy sequences/cons and delays. For it to be lazy it must be non-strict and have sharing (still upvoted, though). - Phi
There is no evidence that types add correctness. - Arrowhead
