When does Erlang's parallelism overcome its weaknesses in numeric computing?

Asked 20/8, 2009 at 20:11 Answered 24/8, 2009 at 21:16

With all the hype around parallel computing lately, I've been thinking a lot about parallelism, number crunching, clusters, etc...

I started reading Learn You Some Erlang. As more people are learning (myself included), Erlang handles concurrency in a very impressive, elegant way.

Then the author asserts that Erlang is not ideal for number crunching. I can understand that a language like Erlang would be slower than C, but the model for concurrency seems ideally suited to things like image handling or matrix multiplication, even though the author specifically says its not.

Is it really that bad? Is there a tipping point where Erlang's strength overcomes its local speed weakness? Are/what measures are being taken to deal with speed?

To be clear: I'm not trying to start a debate; I just want to know.

Gassman answered 20/8, 2009 at 20:11 Comment(5)

Erlang was created to be a concurrent language for creating massive and reliable distributed telecom services - it's quite natural they were focusing on e.g. protocol parsing more than typical number crunching. – Static 20/8, 2009 at 20:17

I see a close vote on the grounds of "subjective and argumentative", and I agree that the title leaves something to be desired, but the questions is well and neutrally phrased, and to date we have only helpful responses. I'd say leave it unless things go downhill. @Jon: Consider re-wording the title. – Visual 20/8, 2009 at 20:25

How about that? I really tried to word things carefully and objectively. I just haven't seen things to answer my questions, so I just had to ask. – Gassman 20/8, 2009 at 20:39

I took a shot at it too. I think it is getting better. – Visual 20/8, 2009 at 20:42

there is one counterexample, which i haven't had time to look at source code of yet: wings3d wings3d.com – Insobriety 29/8, 2009 at 5:29

It's a mistake to think of parallelism as only about raw number crunching power. Erlang is closer to the way a cluster computer works than, say, a GPU or classic supercomputer.

In modern GPUs and old-style supercomputers, performance is all about vectorized arithmetic, special-purpose calculation hardware, and low-latency communication between processing units. Because communication latency is low and each individual computing unit is very fast, the ideal usage pattern is to load the machine's RAM up with data and have it crunch it all at once. This processing might involve lots of data passing among the nodes, as happens in image processing or 3D, where there are lots of CPU-bound tasks to do to transform the data from input form to output form. This type of machine is a poor choice when you frequently have to go to a disk, network, or some other slow I/O channel for data. This idles at least one expensive, specialized processor, and probably also chokes the data processing pipeline so nothing else gets done, either.

If your program requires heavy use of slow I/O channels, a better type of machine is one with many cheap independent processors, like a cluster. You can run Erlang on a single machine, in which case you get something like a cluster within that machine, or you can easily run it on an actual hardware cluster, in which case you have a cluster of clusters. Here, communication overhead still idles processing units, but because you have many processing units running on each bit of computing hardware, Erlang can switch to one of the other processes instantaneously. If it happens that an entire machine is sitting there waiting on I/O, you still have the other nodes in the hardware cluster that can operate independently. This model only breaks down when the communication overhead is so high that every node is waiting on some other node, or for general I/O, in which case you either need faster I/O or more nodes, both of which Erlang naturally takes advantage of.

Communication and control systems are ideal applications of Erlang because each individual processing task takes little CPU and only occasionally needs to communicate with other processing nodes. Most of the time, each process is operating independently, each taking a tiny fraction of the CPU power. The most important thing here is the ability to handle many thousands of these efficiently.

The classic case where you absolutely need a classic supercomputer is weather prediction. Here, you divide the atmosphere up into cubes and do physics simulations to find out what happens in each cube, but you can't use a cluster because air moves between each cube, so each cube is constantly communicating with its 6 adjacent neighbors. (Air doesn't go through the edges or corners of a cube, being infinitely fine, so it doesn't talk to the other 20 neighboring cubes.) Run this on a cluster, whether running Erlang on it or some other system, and it instantly becomes I/O bound.

Chem answered 24/8, 2009 at 21:16 Comment(0)

Is there a tipping point where Erlang's strength overcomes its local speed weakness?

Well, of course there is. For example, when trying to find the median of a trillion numbers :) :

http://matpalm.com/median/question.html

Just before you posted, I happened to notice this was the number 1 post on erlang.reddit.com.

Gebelein answered 20/8, 2009 at 20:18 Comment(3)

Well, you could do that in any language that offers parallelism. It just wouldn't be as clean and elegant as in Erlang. So it's more of a correctness/effort tradeoff than performance/parallelism – Glace 20/8, 2009 at 21:41

Looking at the source code of the worker module (number_less_than/2), they could have potentially spawned a separate process for checking each individual list element, and then sum them up in o(log n) time. Instead they chose to do it linearly in o(n) time. Would this be a case where data representation limits parallelism? – Orphism 20/8, 2009 at 22:18

hey zed, i'll have to have a look at the changes you suggest. to be honest the bulk of the project was about the erlangness of it all and was a bit light on the actual algorithm. – Profitsharing 19/2, 2010 at 0:23

Almost any language can be parallelized. In some languages it's simple, in others it's a pain in the butt, but it can be done. If you want to run a C++ program across 8000 CPU's in a grid, go ahead! You can do that. It's been done before.

Erlang doesn't do anything that's impossible in other languages. If a single CPU running an Erlang program is less efficient than the same CPU running a C++ program, then two hundred CPU's running Erlang will also be slower than two hundred CPU's running C++.

What Erlang does do is making this kind of parallelism easy to work with. It saves developer time and reduces the chance of bugs.

So I'm going to say no, there is no tipping point at which Erlang's parallelism allows it to outperform another language's numerical number-crunching strength.

Where Erlang scores is in making it easier to scale out and do so correctly. But it can still be done in other languages which are better at number-crunching, if you're willing to spend the extra development time.

And of course, let's not forget the good old point that languages don't have a speed. A sufficiently good Erlang compiler would yield perfectly optimal code. A sufficiently bad C compiler would yield code that runs slower than anything else.

Glace answered 20/8, 2009 at 20:11 Comment(6)

"Development time being equal" should be a given. Studies have shown conclusively that Erlang outperforms C++ in optimizing developer time for some problems. The question is, do any sorts of numerical computation problems fall into this category. – Gebelein 20/8, 2009 at 23:18

The OP said nothing about development time being equal. – Glace 20/8, 2009 at 23:25

Doesn't need to be stated. The question is nonsense otherwise. If development time doesn't need to be equal, we can categorically say that a distributed assembly cluster will be faster than Erlang. – Gebelein 20/8, 2009 at 23:44

Just because a question can be definitively answered doesn't mean it is nonsense. If development time has to be equal, then you could say that Python is faster than C++ too. Despite being an interpreted language, and despite having no support for concurrency. I think most people in the real world would accept that C++ is generally faster than Python despite this. – Glace 20/8, 2009 at 23:49

He means, are there numerical computation problems for which the developer/computing efficiency trade-off is better for Erlang than it is for other languages? (Actually, he probably has other assumptions, too, since human language is ambiguous.) To say, "no, because machine language is faster" is to reduce the question to a zen koan. – Gebelein 21/8, 2009 at 0:7

The 1 CPU => 200 CPU argument really makes sense to me, and that's I guess what motivated me to finally ask. I think I'm arriving at the same conclusion as far as it being a tool to make this type of parallelism easy to work with. – Gassman 21/8, 2009 at 12:33

There is pressure to make Erlang execute numeric code faster. The HiPe compiler compiles to native code instead of the BEAM bytecode for example, and it probably has its most effective optimization on code on floating points where it can avoid boxing. This is very beneficial for floating point code, since it can store values directly in FPU registers.

For the majority of Erlang usage, Erlang is plenty fast as it is. They use Erlang to write always-up control systems where the most important speed measurement that matters is low latency responses. Performance under load tends to be IO-bound. These users tend to stay away from HiPe since it is not as flexible/malleable in debugging live systems.

Now that servers with 128Gb of RAM are not that uncommon, and there's no reason they'll get even more memory, some IO-bound problems might shift over to be somewhat CPU bound. That could be a driver.

You should follow HiPe for the development.

Your examples of image manipulations and matrix multiplications seem to me as very bad matches for Erlang though. Those are examples that benefit from vector/SIMD operations. Erlang is not good at parallellism (where one does the same thing to multiple values at once).

Erlang processes are MIMD, multiple instructions multiple data. Erlang does lots of branching behind pattern matching and recursive loops. That kills CPU instruction pipelining.

The best architecture for heavily parallellised problems are the GPUs. For programming GPUs in a functional language I see the best potential in using Haskell for creating programs targeting them. A GPU is basically a pure function from input data to output data. See the Lava project in Haskell for creating FPGA circuits, if it is possible to create circuits so cleanly in Haskell, it can't be harder to create program data for GPUs.

The Cell architecture is very nice for vectorizable problems as well.

Kalil answered 20/8, 2009 at 20:11 Comment(1)

It's quite interesting that you say "Erlang is not good at parallellism (where one does the same thing to multiple values at once)", which I also think is correct - yet the link in the previous answer tells how good Erlang is because of being able to do the same thing to multiple values in parallel (checking if the value is less than a constant). – Orphism 20/8, 2009 at 22:2

I think the broader need is to point out that parallelism is not necessarily or even typically about speed.

It is about how to express algorithms or programs in which the sequence of activities is partial-ordered.

Fenn answered 20/8, 2009 at 20:11 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags