Is the accumulator of reduce in Java 8 allowed to modify its arguments?
Asked Answered
B

2

17

In Java 8, Stream has a method reduce:

T reduce(T identity, BinaryOperator<T> accumulator);

Is the accumulator operator allowed to modify either of its arguments? I presume not since the JavaDoc says the accumulator should be NonInterfering, though all examples talk of modifying the collection, rather than modifying the elements of the collection.

So, for a concrete example, if we have

 integers.reduce(0, Integer::sum);

and suppose for a moment that Integer was mutable, would sum be allowed to modify its first parameter by adding to it (in place) the value of its second parameter?

I presume not, but I would also like an example of where this Interfering causes a problem.

Begrime answered 26/5, 2014 at 12:13 Comment(0)
M
13

No. The accumulator should not modify its arguments; it takes two values and produces a new value. If you want to use mutation in the course of accumulation (e.g., accumulating strings into a StringBuffer instead of concatenating), use Stream.collect(), which is designed for this.

Here's an example of code that produces the wrong answer if you try this. Let's say you want to do addition with a hypothetical MutableInteger class:

// Don't do this
MutableInteger result = stream.reduce(new MutableInteger(0), (a,b) -> a.add(b.get()));

One reason this gets the wrong answer is that if we break the computation up in parallel, now two computations are sharing the same mutable starting value. Note that:

a + b + c + d
= 0 + a + b + 0 + c + d  // 0 denotes identity
= (0 + a + b) + (0 + c + d) // associativity

so we are free to split the stream, compute the partial sums 0 + a + b and 0 + c + d, and then add the results. But if they are sharing the same identity value, and that value is mutated as a result of one of the computations, the other may start with the wrong value.

(Note further that the implementation would be allowed to do this even for sequential computations, if it thought that was worthwhile.)

Midwinter answered 26/5, 2014 at 12:58 Comment(6)
As long as no identity is provided to the reduce method, it seems that a mutating accumulator produces the good result, even when run in parallel. I understand that it is not good practice but it seems to "work".Napkin
@Napkin Even if "seems to work" actually meant "work", why would you? There's an alternative that's been designed for the purpose. Given that, why would you even consider using the tool not designed for it?Midwinter
collect seems more complicated to me, requiring a supplier, consumer, and combiner. So that's one reason to use reduce. Further, collect also seems to require a non-interfering accumulator. Does this non-interference only refer to the source collection then?Begrime
Still, that's no excuse to use the wrong thing, just because the right thing is more complicated... Non-interference means: it can't interfere with any other computation in the same stream pipeline. This includes the source, and also includes stupid things like one lambda modifies some state and another in the same pipeline depends on that state for its answer. It does not mean you can't update the result container.Midwinter
Maybe collect is not that much more complicated. I guess you could do something like stream.collect(MutableInteger::new, MutableInteger::add, MutableInteger::add).Begrime
collect has to invoke the Supplier for a Stream with one element. Because of that, I don't really see it as a replacement for what could be done with the one-argument reduce method if mutation was allowed.Portal
R
0

This is allowed syntactically, but I think it runs against the design pattern and is a bad idea.

  static void accumulatorTest() {
     ArrayList<Point> points = new ArrayList<>();
     points.add(new Point(5, 6));
     points.add(new Point(0, 6));
     points.add(new Point(1, 9));
     points.add(new Point(4, 16));
     BinaryOperator<Point> sumPoints = new BinaryOperator<Point>() {
        public Point apply(Point p1, Point p2) {
           p2.x += p1.x;
           p2.y += p1.y;
           return new Point(p2); //return p2 and the list is transformed into running total
        }
     };
     Point sum = points.stream().reduce(new Point(0, 0), sumPoints); 
     System.out.println(sum);
     System.out.println(points);
  }

The answer is correct; we get the sum of all of the x and y coordinates. The original list is modified, confirmed by the output:

java.awt.Point[x=10,y=37] [java.awt.Point[x=5,y=6], java.awt.Point[x=5,y=12], java.awt.Point[x=6,y=21], java.awt.Point[x=10,y=37]]

Rhoea answered 26/5, 2014 at 13:1 Comment(2)
I tried this and it worked fine. Tested it by increasing the list size to 10,000 points (on Intel i5 with 2 cores) should cause the JVM to actually run the command in parallel. Still returned correct final answer while modifying the list.Rhoea
This example is very narrowly crafted to 'work", mostly by accident. If, for example, you modified the structure of the list (additions or insertions), or you modified the state from an intermediate operation rather than from the last terminal operation, or you modified any state that was ever going to be read again during the computation, you'd see anomalies. So I would be very careful what you conclude from "I tried it and I didn't see a difference."Midwinter

© 2022 - 2024 — McMap. All rights reserved.