Under the traditional, sequential reduction strategy, the following expression graph is reduced as:
(+ (+ 1 2) (+ 3 4)) ->
(+ 3 (+ 3 4)) ->
(+ 3 7) ->
10
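To make the strategy concrete, here is a minimal sketch of such a sequential reducer in Haskell; the `Expr` type and `step` function are hypothetical names of my own, not taken from any particular implementation:

```haskell
data Expr = Lit Int | Add Expr Expr deriving Show

-- One sequential step: contract the leftmost innermost redex.
step :: Expr -> Expr
step (Add (Lit a) (Lit b)) = Lit (a + b)          -- both arguments are values: contract
step (Add (Lit a) r)       = Add (Lit a) (step r) -- left side finished: move right
step (Add l r)             = Add (step l) r       -- otherwise keep reducing the left
step e                     = e                    -- already a value

-- Prints the four states of the trace above, ending in Lit 10.
main :: IO ()
main = mapM_ print
     $ take 4
     $ iterate step (Add (Add (Lit 1) (Lit 2)) (Add (Lit 3) (Lit 4)))
```

Each call to `step` performs exactly one contraction, which forces the three-step sequence shown above.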
Graph reduction, though, is inherently parallel: the two inner redexes are independent of each other, so one could instead reduce the graph as:
(+ (+ 1 2) (+ 3 4)) ->
(+ 3 7) ->
10
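On the CPU you can already request this schedule by hand: GHC's `par` and `pseq` combinators (from the `parallel` package) spark the two redexes so they may be contracted simultaneously. A minimal sketch, assuming GHC with `-threaded` and run with `+RTS -N`:

```haskell
import Control.Parallel (par, pseq)

main :: IO ()
main = print (x `par` (y `pseq` (x + y)))  -- spark x, force y, then combine
  where
    x = 1 + 2 :: Int  -- first independent redex
    y = 3 + 4 :: Int  -- second independent redex
```

Of course, at this scale the cost of creating the sparks dwarfs the additions themselves, which is exactly the scheduling-overhead problem below.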
As far as I know, every functional programming language uses the first approach. I believe this is mostly because, on the CPU, the cost of scheduling threads outweighs the benefit of reducing redexes in parallel. Recently, though, we have been turning to the GPU rather than the CPU for parallel applications. If a language's runtime ran entirely on the GPU, those scheduling and communication costs would mostly vanish.
Are there functional languages making use of that idea?