Java 8 Stream, getting head and tail
Asked Answered
B

10

24

Java 8 introduced a Stream class that resembles Scala's Stream, a powerful lazy construct using which it is possible to do something like this very concisely:

def from(n: Int): Stream[Int] = n #:: from(n+1)

def sieve(s: Stream[Int]): Stream[Int] = {
  s.head #:: sieve(s.tail filter (_ % s.head != 0))
}

val primes = sieve(from(2))

primes takeWhile(_ < 1000) print  // prints all primes less than 1000

I wondered if it is possible to do this in Java 8, so I wrote something like this:

IntStream from(int n) {
    return IntStream.iterate(n, m -> m + 1);
}

IntStream sieve(IntStream s) {
    int head = s.findFirst().getAsInt();
    return IntStream.concat(IntStream.of(head), sieve(s.skip(1).filter(n -> n % head != 0)));
}

IntStream primes = sieve(from(2));

Fairly simple, but it produces java.lang.IllegalStateException: stream has already been operated upon or closed because both findFirst() and skip() are terminal operations on Stream which can be done only once.

I don't really have to use up the stream twice since all I need is the first number in the stream and the rest as another stream, i.e. equivalent of Scala's Stream.head and Stream.tail. Is there a method in Java 8 Stream that I can use to achieve this?

Thanks.

Brotherly answered 6/11, 2013 at 2:37 Comment(5)
If you want to operate over the stream that way, the best you'll be able to do is probably to wrap its iterator(). (To say nothing of the fact that your implementation isn't actually a proper prime sieve; see e.g. this paper.)Fanfaron
I have tried extracting the first number and reconstructing another stream like IntStream.generate(() -> it.next()), but the iterator's hasNext() works eagerly and leads to an infinite recursion.Brotherly
Yeah, that's not going to work. Really, this isn't going to work cleanly with Streams at all.Fanfaron
See this blog post for an explanation on how to build a lazy sequence by yourself in Java.Atalante
I programmed this recently with JDK8, but I built my own stream, since I do not think JDK 8 streams are exactly the same thing as the scala's lazy streams.Tahsildar
P
11

Even if you hadn’t the problem that you can’t split an IntStream, you code didn’t work because you are invoking your sieve method recursively instead of lazily. So you had an infinity recursion before you could query your resulting stream for the first value.

Splitting an IntStream s into a head and a tail IntStream (which has not yet consumed) is possible:

PrimitiveIterator.OfInt it = s.iterator();
int head = it.nextInt();
IntStream tail = IntStream.generate(it::next).filter(i -> i % head != 0);

At this place you need a construct of invoking sieve on the tail lazily. Stream does not provide that; concat expects existing stream instances as arguments and you can’t construct a stream invoking sieve lazily with a lambda expression as lazy creation works with mutable state only which lambda expressions do not support. If you don’t have a library implementation hiding the mutable state you have to use a mutable object. But once you accept the requirement of mutable state, the solution can be even easier than your first approach:

IntStream primes = from(2).filter(i -> p.test(i)).peek(i -> p = p.and(v -> v % i != 0));

IntPredicate p = x -> true;

IntStream from(int n)
{
  return IntStream.iterate(n, m -> m + 1);
}

This will recursively create a filter but in the end it doesn’t matter whether you create a tree of IntPredicates or a tree of IntStreams (like with your IntStream.concat approach if it did work). If you don’t like the mutable instance field for the filter you can hide it in an inner class (but not in a lambda expression…).

Pastelki answered 15/11, 2013 at 17:40 Comment(3)
I don't think this works, you can't refer to non-final variables inside lambda expressions.Millicent
@djjeck: if you refer to the second solution, these are fields, not local variables, and can be modified. The variables of the first example are effectively final. And both examples are tested and proven.Pastelki
You're right, that works (and you actually mention it in the answer). Thanks for clarifying.Millicent
T
4

My StreamEx library has now headTail() operation which solves the problem:

public static StreamEx<Integer> sieve(StreamEx<Integer> input) {
    return input.headTail((head, tail) -> 
        sieve(tail.filter(n -> n % head != 0)).prepend(head));
}

The headTail method takes a BiFunction which will be executed at most once during the stream terminal operation execution. So this implementation is lazy: it does not compute anything until traversal starts and computes only as much prime numbers as requested. The BiFunction receives a first stream element head and the stream of the rest elements tail and can modify the tail in any way it wants. You may use it with predefined input:

sieve(IntStreamEx.range(2, 1000).boxed()).forEach(System.out::println);

But infinite stream work as well

sieve(StreamEx.iterate(2, x -> x+1)).takeWhile(x -> x < 1000)
     .forEach(System.out::println);
// Not the primes till 1000, but 1000 first primes
sieve(StreamEx.iterate(2, x -> x+1)).limit(1000).forEach(System.out::println);

There's also alternative solution using headTail and predicate concatenation:

public static StreamEx<Integer> sieve(StreamEx<Integer> input, IntPredicate isPrime) {
    return input.headTail((head, tail) -> isPrime.test(head) 
            ? sieve(tail, isPrime.and(n -> n % head != 0)).prepend(head)
            : sieve(tail, isPrime));
}

sieve(StreamEx.iterate(2, x -> x+1), i -> true).limit(1000).forEach(System.out::println);

It interesting to compare recursive solutions: how many primes they capable to generate.

@John McClean solution (StreamUtils)

John McClean solutions are not lazy: you cannot feed them with infinite stream. So I just found by trial-and-error the maximal allowed upper bound (17793) (after that StackOverflowError occurs):

public void sieveTest(){
    sieve(IntStream.range(2, 17793).boxed()).forEach(System.out::println);
}

@John McClean solution (Streamable)

public void sieveTest2(){
    sieve(Streamable.range(2, 39990)).forEach(System.out::println);
}

Increasing upper limit above 39990 results in StackOverflowError.

@frhack solution (LazySeq)

LazySeq<Integer> ints = integers(2);
LazySeq primes = sieve(ints); // sieve method from @frhack answer
primes.forEach(p -> System.out.println(p));

Result: stuck after prime number = 53327 with enormous heap allocation and garbage collection taking more than 90%. It took several minutes to advance from 53323 to 53327, so waiting more seems impractical.

@vidi solution

Prime.stream().forEach(System.out::println);

Result: StackOverflowError after prime number = 134417.

My solution (StreamEx)

sieve(StreamEx.iterate(2, x -> x+1)).forEach(System.out::println);

Result: StackOverflowError after prime number = 236167.

@frhack solution (rxjava)

Observable<Integer> primes = Observable.from(()->primesStream.iterator());
primes.forEach((x) -> System.out.println(x.toString()));            

Result: StackOverflowError after prime number = 367663.

@Holger solution

IntStream primes=from(2).filter(i->p.test(i)).peek(i->p=p.and(v->v%i!=0));
primes.forEach(System.out::println);

Result: StackOverflowError after prime number = 368089.

My solution (StreamEx with predicate concatenation)

sieve(StreamEx.iterate(2, x -> x+1), i -> true).forEach(System.out::println);

Result: StackOverflowError after prime number = 368287.


So three solutions involving predicate concatenation win, because each new condition adds only 2 more stack frames. I think, the difference between them is marginal and should not be considered to define a winner. However I like my first StreamEx solution more as it more similar to Scala code.

Trimmer answered 19/1, 2016 at 10:33 Comment(2)
Could you please add timing comparison of respective solutions? Since they all involve predicate contatenation (just more or less explicit) and the modulo operation, they might score equally, though.Sleek
Hey man, great work with this library. We use it fairly extensively. The MoreCollectors.flatMapping was really useful, now it is in Java 9. A lot of things you did here could actually be candidates to the main JDK baseline 10+Parlour
J
3

The solution below does not do state mutations, except for the head/tail deconstruction of the stream.

The lazyness is obtained using IntStream.iterate. The class Prime is used to keep the generator state

    import java.util.PrimitiveIterator;
    import java.util.stream.IntStream;
    import java.util.stream.Stream;

    public class Prime {
        private final IntStream candidates;
        private final int current;

        private Prime(int current, IntStream candidates)
        {
            this.current = current;
            this.candidates = candidates;
        }

        private Prime next()
        {
            PrimitiveIterator.OfInt it = candidates.filter(n -> n % current != 0).iterator();

            int head = it.next();
            IntStream tail = IntStream.generate(it::next);

            return new Prime(head, tail);
        }

        public static Stream<Integer> stream() {
            IntStream possiblePrimes = IntStream.iterate(3, i -> i + 1);

            return Stream.iterate(new Prime(2, possiblePrimes), Prime::next)
                         .map(p -> p.current);
        }
    }

The usage would be this:

Stream<Integer> first10Primes = Prime.stream().limit(10)
Jagir answered 17/9, 2015 at 15:20 Comment(1)
It would be great to avoid the modulo operation and take advantage of the additive nature of the sieve. Something along the lines of IntStream impossiblePrimes = IntStream.iterate(head * head, i -> i + 2 * head);. But sure, it's more primes-related than head/tail-related.Sleek
O
2

You can essentially implement it like this:

static <T> Tuple2<Optional<T>, Seq<T>> splitAtHead(Stream<T> stream) {
    Iterator<T> it = stream.iterator();
    return tuple(it.hasNext() ? Optional.of(it.next()) : Optional.empty(), seq(it));
}

In the above example, Tuple2 and Seq are types borrowed from jOOλ, a library that we developed for jOOQ integration tests. If you don't want any additional dependencies, you might as well implement them yourself:

class Tuple2<T1, T2> {
    final T1 v1;
    final T2 v2;

    Tuple2(T1 v1, T2 v2) {
        this.v1 = v1;
        this.v2 = v2;
    }

    static <T1, T2> Tuple2<T1, T2> tuple(T1 v1, T2 v2) {
        return new Tuple<>(v1, v2);
    }
}

static <T> Tuple2<Optional<T>, Stream<T>> splitAtHead(Stream<T> stream) {
    Iterator<T> it = stream.iterator();
    return tuple(
        it.hasNext() ? Optional.of(it.next()) : Optional.empty,
        StreamSupport.stream(Spliterators.spliteratorUnknownSize(
            it, Spliterator.ORDERED
        ), false)
    );
}
Oxonian answered 22/9, 2014 at 14:28 Comment(0)
H
1

If you don't mind using 3rd party libraries cyclops-streams, I library I wrote has a number of potential solutions.

The StreamUtils class has large number of static methods for working directly with java.util.stream.Streams including headAndTail.

HeadAndTail<Integer> headAndTail = StreamUtils.headAndTail(Stream.of(1,2,3,4));
int head = headAndTail.head(); //1
Stream<Integer> tail = headAndTail.tail(); //Stream[2,3,4]

The Streamable class represents a replayable Stream and works by building a lazy, caching intermediate data-structure. Because it is caching and repayable - head and tail can be implemented directly and separately.

Streamable<Integer> replayable=  Streamable.fromStream(Stream.of(1,2,3,4));
int head = repayable.head(); //1
Stream<Integer> tail = replayable.tail(); //Stream[2,3,4]

cyclops-streams also provides a sequential Stream extension that in turn extends jOOλ and has both Tuple based (from jOOλ) and domain object (HeadAndTail) solutions for head and tail extraction.

SequenceM.of(1,2,3,4)
         .splitAtHead(); //Tuple[1,SequenceM[2,3,4]

SequenceM.of(1,2,3,4)
         .headAndTail();

Update per Tagir's request -> A Java version of the Scala sieve using SequenceM

public void sieveTest(){
    sieve(SequenceM.range(2, 1_000)).forEach(System.out::println);
}

SequenceM<Integer> sieve(SequenceM<Integer> s){

    return s.headAndTailOptional().map(ht ->SequenceM.of(ht.head())
                            .appendStream(sieve(ht.tail().filter(n -> n % ht.head() != 0))))
                    .orElse(SequenceM.of());
}

And another version via Streamable

public void sieveTest2(){
    sieve(Streamable.range(2, 1_000)).forEach(System.out::println);
}

Streamable<Integer> sieve(Streamable<Integer> s){

    return s.size()==0? Streamable.of() : Streamable.of(s.head())
                                                    .appendStreamable(sieve(s.tail()
                                                                    .filter(n -> n % s.head() != 0)));
}

Note - neither Streamable of SequenceM have an Empty implementation - hence the size check for Streamable and the use of headAndTailOptional.

Finally a version using plain java.util.stream.Stream

import static com.aol.cyclops.streams.StreamUtils.headAndTailOptional;

public void sieveTest(){
    sieve(IntStream.range(2, 1_000).boxed()).forEach(System.out::println);
}

Stream<Integer> sieve(Stream<Integer> s){

    return headAndTailOptional(s).map(ht ->Stream.concat(Stream.of(ht.head())
                            ,sieve(ht.tail().filter(n -> n % ht.head() != 0))))
                    .orElse(Stream.of());
}

Another update - a lazy iterative based on @Holger's version using objects rather than primitives (note a primitive version is also possible)

  final Mutable<Predicate<Integer>> predicate = Mutable.of(x->true);
  SequenceM.iterate(2, n->n+1)
           .filter(i->predicate.get().test(i))
           .peek(i->predicate.mutate(p-> p.and(v -> v%i!=0)))
           .limit(100000)
           .forEach(System.out::println);
Hornsby answered 11/1, 2016 at 6:0 Comment(6)
It would be nice to see the full solution of primes problem using your libraries.Trimmer
Teammate going to play around with implementing a sieve of Eratosthenes with Streamable / SequenceM - will keep you posted.Hornsby
final List<Integer> prime = new ArrayList<>(); SequenceM<Integer> primeNumbers = SequenceM.generate(() -> { synchronized (prime) { if (prime.isEmpty()) { prime.add(2); } else { int last = prime.get(prime.size() - 1); do { int candidate = last; if (!prime.stream().parallel().anyMatch(c -> (candidate) % c == 0)) { prime.add(candidate); break; } last++; } while (true); } return prime.get(prime.size() - 1); } });Enrapture
Looks that both solutions are not lazy though. Try sieve(Stream.iterate(2, x -> x+1)).limit(1000) as the source, for example.Trimmer
Yes, the head is evaluated immediately each time, the tail is processed lazily. It's a pretty trivial modification to make the head() method in HeadAndTail evaluate lazily however and one that we will add soon.Hornsby
You probably mean the solution provided isn't Lazy? Yes that's true, I'll update with a Lazy version (the question did specify a pretty low limit originally though).Hornsby
C
1

There are many interesting suggestions provided here, but if someone needs a solution without dependencies to third party libraries I came up with this:

    import java.util.AbstractMap;
    import java.util.Optional;
    import java.util.Spliterators;
    import java.util.stream.StreamSupport;

    /**
     * Splits a stream in the head element and a tail stream.
     * Parallel streams are not supported.
     * 
     * @param stream Stream to split.
     * @param <T> Type of the input stream.
     * @return A map entry where {@link Map.Entry#getKey()} contains an
     *    optional with the first element (head) of the original stream
     *    and {@link Map.Entry#getValue()} the tail of the original stream.
     * @throws IllegalArgumentException for parallel streams.
     */
    public static <T> Map.Entry<Optional<T>, Stream<T>> headAndTail(final Stream<T> stream) {
        if (stream.isParallel()) {
            throw new IllegalArgumentException("parallel streams are not supported");
        }
        final Iterator<T> iterator = stream.iterator();
        return new AbstractMap.SimpleImmutableEntry<>(
                iterator.hasNext() ? Optional.of(iterator.next()) : Optional.empty(),
                StreamSupport.stream(Spliterators.spliteratorUnknownSize(iterator, 0), false)
        );
    }
Cymar answered 11/11, 2018 at 12:22 Comment(0)
S
0

To get head and tail you need a Lazy Stream implementation. Java 8 stream or RxJava are not suitable.

You can use for example LazySeq as follows.

Lazy sequence is always traversed from the beginning using very cheap first/rest decomposition (head() and tail())

LazySeq implements java.util.List interface, thus can be used in variety of places. Moreover it also implements Java 8 enhancements to collections, namely streams and collectors


package com.company;

import com.nurkiewicz.lazyseq.LazySeq;

public class Main {

    public static void main(String[] args) {

        LazySeq<Integer> ints = integers(2);
        LazySeq primes = sieve(ints);
        primes.take(10).forEach(p -> System.out.println(p));

    }

    private static LazySeq<Integer> sieve(LazySeq<Integer> s) {
        return LazySeq.cons(s.head(), () -> sieve(s.filter(x -> x % s.head() != 0)));
    }

    private static LazySeq<Integer> integers(int from) {
        return LazySeq.cons(from, () -> integers(from + 1));
    }

}
Semanteme answered 6/6, 2015 at 10:23 Comment(0)
S
0

Here is another recipe using the way suggested by Holger. It use RxJava just to add the possibility to use the take(int) method and many others.

package com.company;

import rx.Observable;

import java.util.function.IntPredicate;
import java.util.stream.IntStream;

public class Main {

    public static void main(String[] args) {

        final IntPredicate[] p={(x)->true};
        IntStream primesStream=IntStream.iterate(2,n->n+1).filter(i -> p[0].test(i)).peek(i->p[0]=p[0].and(v->v%i!=0)   );

        Observable primes = Observable.from(()->primesStream.iterator());

        primes.take(10).forEach((x) -> System.out.println(x.toString()));


    }

}
Semanteme answered 6/6, 2015 at 11:11 Comment(0)
E
0

This should work with parallel streams as well:

public static <T> Map.Entry<Optional<T>, Stream<T>> headAndTail(final Stream<T> stream) {
    final AtomicReference<Optional<T>> head = new AtomicReference<>(Optional.empty());
    final var spliterator = stream.spliterator();
    spliterator.tryAdvance(x -> head.set(Optional.of(x)));
    return Map.entry(head.get(), StreamSupport.stream(spliterator, stream.isParallel()));
}
Evanston answered 2/7, 2021 at 9:18 Comment(0)
C
-3

If you want to get head of a stream, just:

IntStream.range(1, 5).first();

If you want to get tail of a stream, just:

IntStream.range(1, 5).skip(1);

If you want to get both head and tail of a stream, just:

IntStream s = IntStream.range(1, 5);
int head = s.head();
IntStream tail = s.tail();

If you want to find the prime, just:

LongStream.range(2, n)
   .filter(i -> LongStream.range(2, (long) Math.sqrt(i) + 1).noneMatch(j -> i % j == 0))
   .forEach(N::println);

If you want to know more, go to get abacus-common

Declaration: I'm the developer of abacus-common.

Copland answered 1/12, 2016 at 19:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.