Is There Some Stream-Only Way To Determine The Index Of The Max Stream Element?
Asked Answered
M

5

2

I have a Stream<Set<Integer>> intSetStream.

I can do this on it...

Set<Integer> theSetWithTheMax = intSetStream.max( (x,y)->{ return Integer.compare( x.size(), y.size() ); } ).get( );

...and I get a hold of the Set<Integer> that has the highest number of Integer elements in it.

That's great. But what I really need to know is, is it the 1st Set in that Stream that's the max? Or is it the 10th Set in the Stream? Or the ith Set? Which one of them has the most elements in it?

So my question is: Is there some way — using the Stream API — that I can determine "It was the ith Set in the Stream of Sets that returned the largest value of them all, for the Set.size( ) call"?

The best solution I can think of, is to iterate over the Stream<Set<Integer>> (using intSetStream.iterator()) and do a hand-rolled max( ) calculation. But I'm hoping to learn a more Stream-y way to go about it; if there is such a thing.

Mommy answered 11/4, 2018 at 19:6 Comment(4)
Streams are generally not designed for operations where the order matters. Do you need both the index and the set or just the index?Placida
@Placida "Streams are generally not designed for operations where the order matters." what do you mean? as far as I know, if there is a defined encounter order then the stream will preserve that.Antiproton
@Aominè Yes, that's true. What I meant was that very few of the methods actually care about the order. The API was originally designed in such way that it wouldn't matter whether the stream is running in parallel or not. In Java 9 they added takeWhile and dropWhile which only work for ordered streams.Placida
@Placida I only need the index.Mommy
M
2

One way to do it is to firstly map Stream<Set<Integer>> to a Collection<Integer> where each element is the size of each Set<Integer> and then you can extract what is the largest number of elements given Stream<Set<Integer>> and then get the "index" of this set by finding an index of the largest number in the collection of sizes.

Consider following example:

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class IntSetStreamExample {

    public static void main(String[] args) {

        final Stream<Set<Integer>> stream = Stream.of(
                new HashSet<>(Arrays.asList(1,2,3)),
                new HashSet<>(Arrays.asList(1,2)),
                new HashSet<>(Arrays.asList(1,2,3,4,5)),
                new HashSet<>(Arrays.asList(0)),
                new HashSet<>(Arrays.asList(0,1,2,3,4,5)),
                new HashSet<>()
        );

        final List<Integer> result = stream.map(Set::size).collect(Collectors.toList());

        System.out.println("List of number of elements in Stream<Set<Integer>>: " + result);

        final int max = Collections.max(result);

        System.out.println("Largest set contains " + max + " elements");

        final int index = result.indexOf(max);

        System.out.println("Index of the largest set: " + index);
    }
}

The exemplary output may look like this:

List of number of elements in Stream<Set<Integer>>: [3, 2, 5, 1, 6, 0]
Largest set contains 6 elements
Index of the largest set: 4
Magalymagan answered 11/4, 2018 at 19:28 Comment(1)
Thanks Syzmon Stepniak. Cool answer! I do agree with @davidxxx that the Stream.iterator( ) approach I mentioned in my original question, is probably the most straightforward way to go about something like this. However, the only reason I asked my question is to solidify my grasp of Streams. I'm not asking "what is the most straighforward way to go about this?" Sometimes, getting my head around a solution that's "overkill", is a good way for me to make some things more concrete for me. So I will accept your answer. Thanks.Mommy
I
6

You can do this with a custom collector:

int posOfMax = stream.mapToInt(Set::size)
    .collect(() -> new int[] { 0, -1, -1 },
            (a,i) -> { int pos = a[0]++; if(i>a[2]) { a[1] = pos; a[2] = i; } },
            (a1,a2) -> {
                if(a2[2] > a1[2]) { a1[1] = a1[0]+a2[1]; a1[2] = a2[2]; }
                a1[0] += a2[0];
            })[1];

This is the most lightweight solution. Its logic becomes clearer when we use a dedicated class instead of an array:

int posOfMax = stream.mapToInt(Set::size)
    .collect(() -> new Object() { int size = 0, pos = -1, max = -1; },
            (o,i) -> { int pos = o.size++; if(i>o.max) { o.pos = pos; o.max = i; } },
            (a,b) -> {
                if(b.max > a.max) { a.pos = a.size+b.pos; a.max = b.max; }
                a.size += b.size;
            }).pos;

The state object holds the size, which is simply the number of elements encountered so far, the last encountered max value and its position which we update to the previous value of the size if the current element is bigger than the max value. That’s what the accumulator function (the second argument to collect) does.

In order to support arbitrary evaluation orders, i.e. parallel stream, we have to provide a combiner function (the last argument to collect). It merges the state of two partial evaluation into the first state. If the second state’s max value is bigger, we update the first’s max value and the position, whereas we have to add the first state’s size to the second’s position to reflect the fact that both are partial results. Further, we have to update the size to the sum of both sizes.

Immensurable answered 12/4, 2018 at 17:22 Comment(6)
Even though @davidxxx's and Syzmon Stepniak's answers are both excellant answers, this is easily the most elegant and most lightweight solution offered so far. I would be surprised if there were another solution that outdid this one on elegance and lowest space cost. This is exactly the kind of "stream-only" solution I had in mind in asking the question. If you'd posted it two days earlier, this would have been the accepted answer. A hearty Thanks for such a "beautiful" answer :)Mommy
I'm dying to know what inspired this solution to occur to you, Holger. Had you seen a similar solution somewhere else? In the Stream API source code itself maybe? Is it simply something that came from advance experience working with Java's Stream API or maybe previous experience with functional programming idioms in general? Or something as simple as 2 days worth of trial and error and/or reading the best book in the world on the Stream API? Speaking of which. What in your opinion is the best book in the world on the Stream API by the way? If one doesn't exist, you should write it ;)Mommy
@Mommy you should stay on SO more often... :) Holger's answers are among most appreciated around here, where I work we have an entire package called xxx.HolgerUtil - mainly because 1) why not? 2) the things we have there are only possible because of his answers 3) the package documentation starts with "If you think this is a weird class Name, you haven't worked here long enough..."Arbitrator
There's an interesting coincidence related to Holger's array-based explanation. The Set<Integer>s of my OQ started out life as int[] representations. The Set<Integer>s represent slices of a table as prescribed by a dynamic programming algorithm. The original algo uses an int[][] to calculate a maximum number of visitations. As an exercise, I implemented the algorithm using Stream<Set<Integer>>s in place of the int[] rows. How close is my guess, Holger, that your solution was inspired by some sort or memoization idiom?Mommy
There is an existing builtin collector, summarizingInt, which provides the count and maximum (as well as sum and minimum) in one go. From there, it’s only a small step to get the index of the maximum. You may also consider collect to be the operation closest to a loop, allowing mutable state, having the supplier for the initialization and the accumulator performing the loop’s body. The new thing and biggest obstacle is the combiner for partial results.Immensurable
Awesome! I see what you mean. Thanks for the insight :)Mommy
M
2

One way to do it is to firstly map Stream<Set<Integer>> to a Collection<Integer> where each element is the size of each Set<Integer> and then you can extract what is the largest number of elements given Stream<Set<Integer>> and then get the "index" of this set by finding an index of the largest number in the collection of sizes.

Consider following example:

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class IntSetStreamExample {

    public static void main(String[] args) {

        final Stream<Set<Integer>> stream = Stream.of(
                new HashSet<>(Arrays.asList(1,2,3)),
                new HashSet<>(Arrays.asList(1,2)),
                new HashSet<>(Arrays.asList(1,2,3,4,5)),
                new HashSet<>(Arrays.asList(0)),
                new HashSet<>(Arrays.asList(0,1,2,3,4,5)),
                new HashSet<>()
        );

        final List<Integer> result = stream.map(Set::size).collect(Collectors.toList());

        System.out.println("List of number of elements in Stream<Set<Integer>>: " + result);

        final int max = Collections.max(result);

        System.out.println("Largest set contains " + max + " elements");

        final int index = result.indexOf(max);

        System.out.println("Index of the largest set: " + index);
    }
}

The exemplary output may look like this:

List of number of elements in Stream<Set<Integer>>: [3, 2, 5, 1, 6, 0]
Largest set contains 6 elements
Index of the largest set: 4
Magalymagan answered 11/4, 2018 at 19:28 Comment(1)
Thanks Syzmon Stepniak. Cool answer! I do agree with @davidxxx that the Stream.iterator( ) approach I mentioned in my original question, is probably the most straightforward way to go about something like this. However, the only reason I asked my question is to solidify my grasp of Streams. I'm not asking "what is the most straighforward way to go about this?" Sometimes, getting my head around a solution that's "overkill", is a good way for me to make some things more concrete for me. So I will accept your answer. Thanks.Mommy
M
1

Streams methods are not designed to be aware of the current element iterated.
So I think that you actual way : find the Set with the max of elements and then iterate on the Sets to find this Set is not bad.

As alternative you could first collect the Stream<Set<Integer>> into a List (to have a way to retrieve the index) and use a SimpleImmutableEntry but it seems really overkill :

Stream<Set<Integer>> intSetStream = ...;
List<Set<Integer>> list = intSetStream.collect(Collectors.toList());

SimpleImmutableEntry<Integer, Set<Integer>> entry = 
        IntStream.range(0, list.size())
                 .mapToObj(i -> new SimpleImmutableEntry<>(i, list.get(i)))
                 .max((x, y) -> {
                     return Integer.compare(x.getValue()
                                             .size(),
                                            y.getValue()
                                             .size());
                 })
                 .get();

Integer index = entry.getKey();
Set<Integer> setWithMaxNbElements = entry.getValue();
Mastaba answered 11/4, 2018 at 19:12 Comment(0)
M
1

Insight provided in @Holzer's custom Collector-based solution (on top of my downright shameless plagiarizing of the source code of IntSummaryStatistics.java), inspired a custom Collector-based solution of my own; that might, in turn, inspire others...

public class IndexOfMaxCollector implements IntConsumer {

    private int max = Integer.MIN_VALUE;
    private int maxIdx = -1;
    private int currIdx = 0;

    public void accept( int value ){

        if( value > max ) 
            maxIdx = currIdx;

        max = Math.max( max, value );

        currIdx++;
    }

    public void combine( IndexOfMaxCollector other ){

        if( other.max > max ){

            maxIdx = other.maxIdx + currIdx; 

            max = other.max;  
        }

        currIdx += other.currIdx;
    }

    public int getMax( ){ return this.max; }

    public int getIndexOfMax( ){ return this.maxIdx; }
}

...Using that custom Collector, I could take the intSetStream of my OQ and determine the index of the Set<Integer> that contains the highest number of elements, like this...

int indexOfMax = intSetStream.map( Set::size )
    .collect( IndexOfMaxCollector::new,
              IndexOfMaxCollector::accept,  
              IndexOfMaxCollector::combine )
    .getIndexOfMax( );

This solution — admittedly not the most "beautiful" — possibly has a teensie bit of an edge over others in both the reusability and understandability stakes.

Mommy answered 17/4, 2018 at 10:4 Comment(8)
Your combine method needs to combine the indices as well (like the combiner in my answer), e.g. maxIdx = other.max > max? other.maxIdx + currIdx: maxIdx; currIdx += other.currIdx;.Immensurable
Cheers @Immensurable :) I'll change that in a moment. At the risk of sounding like a numbskull, it's not obvious to me what's being "combined" with those expressions you shared. As long as I'm being humble, the "why" flew over my head too :) Please elaborate? Thanks :)Mommy
In a multi-threaded context, the supplier will be called once per worker thread, to use the returned object with the accumulator locally. This is what makes it thread safe; each worker modifies a different object. Once two worker threads have processed their elements, their partial results must be merged to a result covering both. In this specific case, each result will start counting by zero, so when using the right result’s index, the left result’s count must be added to it to get an index in the merged range. Further, the counts must be summed up, as the result might get merged again…Immensurable
See also hereImmensurable
Yeah. I did read that javadoc as part of my research before implementing my solution. Thanks just the same, though :) But it has to be said, I was obviously still confused even having read the the javadoc :) So your explanation really brought it home for me. Thanks again.Mommy
@Holger: "...the left result’s count must be added to it...Further, the counts must be summed up..." — Actually, I'm not tracking a "count" per se. But I understand what you mean ("currIdx").Mommy
There is no technical difference between counting and your currIdx. When doing it right, the final IndexOfMaxCollector instance’s currIdx field will contain the total count at the end of the operation. It serves the role of a current index in the accumulator function, but at the same time, it serves as a count of all encountered elements in the combiner function. You need the count to adapt the index when merging (and there wouldn’t be much sense to maintain two variables with different names but he same contents, just to name the two roles it serves).Immensurable
Let us continue this discussion in chat.Mommy
R
0

Here's a solution using the StreamEx library:

int index = StreamEx.of(intSetStream)
        .zipWith(IntStream.iterate(0, i -> i + 1))
        .maxBy(e -> e.getKey().size())
        .orElseThrow()
        .getValue();

This uses StreamEx.zipWith to zip the stream with the infinite stream [0, 1, 2, 3, …], producing a Stream<Map.Entry<Set<Integer>, Integer>> with Map.Entry consisting of the stream element and its index. It then uses the maxBy method to find the Map.Entry with the greatest value as an Optional<Map.Entry>. Finally, it returns the value of the entry within the optional, i.e. the index of the maximum entry.

Respiratory answered 3/1 at 23:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.