Take every nth element from a Java 8 stream
H

9

61

Suppose I have a list like this :

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Is it possible to use a Java 8 stream to take every second element from this list to obtain the following?

[1, 3, 5, 7, 9]

Or maybe even every third element?

[1, 4, 7, 10]

Basically, I'm looking for a function to take every nth element of a stream:

List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
List<Integer> list2 = list.stream().takenth(3).collect(Collectors.toList());
System.out.println(list2);
// => [1, 4, 7, 10]
Heida answered 24/7, 2015 at 4:50 Comment(2)
If this is a simplified scenario, then the actual scenario might help in coming up with a solution. In the unlikely case that it isn't, you can just filter by modulo 2 or 3.Stillman
Why a stream? Is the source a stream? Is the result always ending up in a list? Then transform the stream to an iterator and use an external int to keep track of the item number. If the source isn't a stream, use a for-loop. The only reason to use a stream here would be if the result should be a stream or the source is a stream...Capful
H
58

One of the prime motivations for the introduction of Java streams was to allow parallel operations. This led to a requirement that operations on Java streams such as map and filter be independent of the position of the item in the stream or the items around it. This has the advantage of making it easy to split streams for parallel processing. It has the disadvantage of making certain operations more complex.

So the simple answer is that there is no easy way to do things such as take every nth item or map each item to the sum of all previous items.

The most straightforward way to implement your requirement is to use the index of the list you are streaming from:

List<String> list = ...;
return IntStream.range(0, list.size())
    .filter(n -> n % 3 == 0)
    .mapToObj(list::get)
    .collect(Collectors.toList()); // Stream.toList() only exists from Java 16 onwards
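
For reference, the index-based approach can be wrapped into a small self-contained program. This is only a sketch: the class name EveryNthByIndex and the helper everyNth are made up for illustration, and it assumes the source is a List with cheap random access (such as an ArrayList or the result of Arrays.asList).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class EveryNthByIndex {
    // Hypothetical helper: returns the elements at indices 0, n, 2n, ...
    static <T> List<T> everyNth(List<T> list, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return IntStream.range(0, list.size())
            .filter(i -> i % n == 0)   // keep only every nth index
            .mapToObj(list::get)       // look the element up by index
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(everyNth(list, 2)); // [1, 3, 5, 7, 9]
        System.out.println(everyNth(list, 3)); // [1, 4, 7, 10]
    }
}
```

Because each index is handled independently of the others, this version should also be safe to run with .parallel() on the IntStream.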

A more complicated solution would be to create a custom collector that collects every nth item into a list.

class EveryNth<C> {
    private final int nth;
    private final List<List<C>> lists = new ArrayList<>();
    private int next = 0;

    private EveryNth(int nth) {
        this.nth = nth;
        IntStream.range(0, nth).forEach(i -> lists.add(new ArrayList<>()));
    }

    private void accept(C item) {
        lists.get(next++ % nth).add(item);
    }

    private EveryNth<C> combine(EveryNth<C> other) {
        other.lists.forEach(l -> lists.get(next++ % nth).addAll(l));
        next += other.next;
        return this;
    }

    private List<C> getResult() {
        return lists.get(0);
    }

    // Generic in C so the collector works for any element type, not just Integer.
    public static <C> Collector<C, ?, List<C>> collector(int nth) {
        return Collector.of(() -> new EveryNth<C>(nth),
            EveryNth::accept, EveryNth::combine, EveryNth::getResult);
    }
}

This could be used as follows:

Stream.of("Anne", "Bill", "Chris", "Dean", "Eve", "Fred", "George")
    .parallel().collect(EveryNth.collector(3));

Which returns the result ["Anne", "Dean", "George"] as you would expect.

This is a very inefficient algorithm even with parallel processing. It splits all items it accepts into n lists and then just returns the first. Unfortunately it has to keep all items through the accumulation process because it's not until they are combined that it knows which list is the nth one.

Given the complexity and inefficiency of the collector solution, I would definitely recommend sticking with the index-based solution above if you can. If you aren't using a collection that supports get (e.g. you are passed a Stream rather than a List), then you will need to either collect the stream using Collectors.toList() or use the EveryNth solution above.
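
As a rough sketch of the "collect first, then use indices" route for the case where only a Stream is available: the class name EveryNthFromStream and its helper are hypothetical, and the approach assumes the whole stream fits in memory.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class EveryNthFromStream {
    // Hypothetical helper: buffers the stream, then selects indices 0, n, 2n, ...
    static <T> List<T> everyNth(Stream<T> stream, int n) {
        // Collectors.toList() keeps encounter order for ordered streams, even parallel ones.
        List<T> buffered = stream.collect(Collectors.toList());
        return IntStream.range(0, buffered.size())
            .filter(i -> i % n == 0)
            .mapToObj(buffered::get)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<String> names =
            Stream.of("Anne", "Bill", "Chris", "Dean", "Eve", "Fred", "George");
        System.out.println(everyNth(names.parallel(), 3)); // [Anne, Dean, George]
    }
}
```

The cost is one intermediate list holding every element, which is the same order of memory the collector keeps during accumulation.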

Hartfield answered 24/7, 2015 at 5:9 Comment(5)
It's not very difficult to create such collector for sequential streams, but correct parallel implementation would be very ineffective. So imho better to forget about collector-based solutions and use indices.Regent
I think an efficient parallel implementation would be conditional on ORDERED, SIZED and SUBSIZED spliterator characteristics.Stillman
@TagirValeev I'll add a collector just for possible interest of readers. I agree that it's pretty ineffective - using indices is much more straightforward.Hartfield
The indices solution solves my problem. Thanks! I'll accept this answer because you put the additional effort into creating the collector-based solution as well.Saari
for (int i = 0; i < list.size(); i += 3) ...?Capful
E
14

EDIT - Nov 28, 2017

As user @Emiel suggests in the comments, the best way to do this would be to use Stream.iterate to drive the list through a sequence of indices:

List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

int skip = 3;
int size = list.size();
// Limit to carefully avoid IndexOutOfBoundsException
int limit = size / skip + Math.min(size % skip, 1);

List<Integer> result = Stream.iterate(0, i -> i + skip)
    .limit(limit)
    .map(list::get)
    .collect(Collectors.toList());

System.out.println(result); // [1, 4, 7, 10]

This approach doesn't have the drawbacks of my previous answer, which comes below (I've decided to keep it for historical reasons).
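
If Java 9 or later is available (an assumption, since the question is about Java 8), the three-argument overload Stream.iterate(seed, hasNext, next) can replace the manual limit calculation. A minimal sketch, with a made-up class name EveryNthIterate:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class EveryNthIterate {
    // Hypothetical helper; requires Java 9+ for the three-argument Stream.iterate.
    static <T> List<T> everyNth(List<T> list, int skip) {
        return Stream.iterate(0, i -> i < list.size(), i -> i + skip)
            .map(list::get) // indices are always in range thanks to the hasNext predicate
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(everyNth(list, 3)); // [1, 4, 7, 10]
    }
}
```

The predicate stops the sequence as soon as the index leaves the list, so there is no limit to get wrong and an empty list simply yields an empty result.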


Another approach would be to use Stream.iterate() the following way:

List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

int skip = 3;
int size = list.size();
// Limit to carefully avoid IndexOutOfBoundsException
int limit = size / skip + Math.min(size % skip, 1);

List<Integer> result = Stream.iterate(list, l -> l.subList(skip, l.size()))
    .limit(limit)
    .map(l -> l.get(0))
    .collect(Collectors.toList());

System.out.println(result); // [1, 4, 7, 10]

The idea is to create a stream of sublists, each one skipping the first N elements of the previous one (N=3 in the example).

We have to limit the number of iterations so that we don't try to get a sublist whose bounds are out of range.

Then, we map our sublists to their first element and collect our results. Keeping the first element of every sublist works as expected because every sublist's begin index is shifted N elements to the right, according to the source list.

This is also efficient, because the List.subList() method returns a view of the original list, meaning that it doesn't create a new List for each iteration.


EDIT: After a while, I've learnt that it's much better to take either one of @sprinter's approaches, since subList() creates a wrapper around the original list. This means that the second list of the stream would be a wrapper of the first list, the third list of the stream would be a wrapper of the second list (which is already a wrapper!), and so on...

While this might work for small to medium-sized lists, it should be noted that for a very large source list, many wrappers would be created. And this might end up being expensive, or even generating a StackOverflowError.

Esdras answered 24/7, 2015 at 19:27 Comment(4)
Really interesting solution. Thanks!Saari
Wouldn't Stream.iterate(0, i -> i + skip).limit(limit).map(list::get).collect(Collectors.toList()); bypass your wrapping problem? This would also be an optimization of @sprinter's answer since it wouldn't initialize and filter all skipped values.Alvertaalves
value iterate is not a member of scala.collection.immutable.Stream[String]Signorelli
@Signorelli this answer is about Java, I don't know Scala. I'm using java.util.stream.Stream.iterate, which is a static methodEsdras
I
11

If you're willing to use a third party library, then jOOλ offers useful features like zipWithIndex():

Every second element

System.out.println(
Seq.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
   .zipWithIndex()             // This produces a Tuple2(yourvalue, index)
   .filter(t -> t.v2 % 2 == 0) // Filter by the index
   .map(t -> t.v1)             // Remove the index again
   .toList()
);
[1, 3, 5, 7, 9]

Every third element

System.out.println(
Seq.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
   .zipWithIndex()
   .filter(t -> t.v2 % 3 == 0)
   .map(t -> t.v1)
   .toList()
);
[1, 4, 7, 10]

Disclaimer: I work for the company behind jOOλ

Insignificance answered 6/1, 2016 at 23:19 Comment(0)
B
4

Use Guava:

Streams
    .mapWithIndex(stream, SimpleImmutableEntry::new)
    .filter(entry -> entry.getValue() % 3 == 0)
    .map(Entry::getKey)
    .collect(Collectors.toList());
Bareheaded answered 5/4, 2018 at 4:57 Comment(0)
N
2

You could also use flatMap with a custom function that skips items:

private <T> Function<T, Stream<T>> everyNth(int n) {
  return new Function<T, Stream<T>>() {
    int i = 0;

    @Override
    public Stream<T> apply(T t) {
      if (i++ % n == 0) {
        return Stream.of(t);
      }
      return Stream.empty();
    }
  };
}

@Test
public void everyNth() {
  assertEquals(
    Arrays.asList(1, 4, 7, 10),
    IntStream.rangeClosed(1, 10).boxed()
      .flatMap(everyNth(3))
      .collect(Collectors.toList())
  );
}

It has the advantage of working with non-indexed streams. But it's not a good idea to use it with parallel streams (maybe switch to an atomic integer for i).
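
A sketch of the AtomicInteger variant mentioned above, written as a stateful Predicate for filter instead of a Function for flatMap. The class name EveryNthPredicate is made up. Note the assumption: the atomic counter only makes the increment thread-safe. It does not make the counting order match the encounter order, so the result is only expected to be right on a sequential, ordered stream.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class EveryNthPredicate {
    // Hypothetical helper: a stateful predicate that accepts the 1st, (n+1)th, (2n+1)th, ... call.
    // Create a fresh predicate for every stream, since it carries a counter.
    static <T> Predicate<T> everyNth(int n) {
        AtomicInteger counter = new AtomicInteger();
        return item -> counter.getAndIncrement() % n == 0;
    }

    public static void main(String[] args) {
        List<Integer> result = IntStream.rangeClosed(1, 10).boxed()
            .sequential()            // make sure elements are tested in encounter order
            .filter(everyNth(3))
            .collect(Collectors.toList());
        System.out.println(result); // [1, 4, 7, 10]
    }
}
```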

Nahum answered 6/3, 2016 at 10:43 Comment(0)
C
2

Try this.

    List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    int[] n = {0};
    List<Integer> result = list.stream()
        .filter(x -> n[0]++ % 3 == 0)
        .collect(Collectors.toList());
    System.out.println(result);
    // -> [1, 4, 7, 10]
Cook answered 6/3, 2016 at 11:14 Comment(1)
This looks like it will break on parallel implementations. Others have solved this by using an AtomicInteger and getAndIncrement().Breadwinner
L
1

Here is the code using abacus-common:

Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
        .filter(MutableInt.of(0), (e, idx) -> idx.getAndDecrement() % 2 == 0)
        .println();
// output: 1, 3, 5, 7, 9

Or, if the index is required:

Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
      .indexed().filter(i -> i.index() % 2 == 0).println();
// output: [0]=1, [2]=3, [4]=5, [6]=7, [8]=9

Declaration: I'm the developer of abacus-common.

Leonoraleonore answered 29/11, 2016 at 19:30 Comment(0)
J
0

Can you try this?

employees.stream()
.filter(e -> e.getName().charAt(0) == 's')
.skip(n-1)
.findFirst()
Jeffersonjeffery answered 4/4, 2019 at 11:0 Comment(0)
T
0

I'm coming here from How to avoid memory overflow using high throughput JAVA I/O Stream from JDBC connectors? which suggests you are concerned about memory footprint.

I therefore suggest the following solution, which should produce very little garbage:

int[] counter = new int[]{0};

list.stream()
    .filter(l -> counter[0]++ % n == 0)
    .collect(Collectors.toList());

Of course you need to ensure that your stream isn't parallel.
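
One possible way to sidestep the parallelism concern while keeping the footprint small is to pull the elements through the stream's iterator, which walks an ordered stream in encounter order, and to expose the result as a new lazy stream. The sketch below is an illustration only; the class name EveryNthLazy and its helper are invented for this example.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class EveryNthLazy {
    // Hypothetical helper: lazily yields the 1st, (n+1)th, (2n+1)th, ... element of the source.
    static <T> Stream<T> everyNth(Stream<T> source, int n) {
        Iterator<T> it = source.iterator();
        Iterator<T> strided = new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public T next() {
                T value = it.next(); // throws NoSuchElementException when exhausted
                // Discard the following n - 1 elements, if the source still has them.
                for (int i = 1; i < n && it.hasNext(); i++) {
                    it.next();
                }
                return value;
            }
        };
        // The resulting stream is sequential (second argument false) and ordered.
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(strided, Spliterator.ORDERED), false)
            .onClose(source::close);
    }

    public static void main(String[] args) {
        // Works on an infinite source because nothing is buffered.
        List<Integer> result = everyNth(Stream.iterate(1, i -> i + 1), 3)
            .limit(4)
            .collect(Collectors.toList());
        System.out.println(result); // [1, 4, 7, 10]
    }
}
```

Only one element is held at a time, so memory use does not grow with the size of the source.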

Travertine answered 10/7, 2019 at 7:52 Comment(0)
