Should I return a Collection or a Stream?

Asked 10/7, 2014 at 12:42 Answered 4/1 at 12:17

Solved java collections java-8 encapsulation java-stream

189

Suppose I have a method that returns a read-only view into a member list:

class Team {
    private List<Player> players = new ArrayList<>();

    // ...

    public List<Player> getPlayers() {
        return Collections.unmodifiableList(players);
    }
}

Further suppose that all the client does is iterate over the list once, immediately. Maybe to put the players into a JList or something. The client does not store a reference to the list for later inspection!

Given this common scenario, should I return a stream instead?

public Stream<Player> getPlayers() {
    return players.stream();
}

Or is returning a stream non-idiomatic in Java? Were streams designed to always be "terminated" inside the same expression they were created in?

Eberhardt answered 10/7, 2014 at 12:42 Comment(2)

There is definitely nothing wrong with this as an idiom. After all, players.stream() is just such a method which returns a stream to the caller. The real question is, do you really want to constrain the caller to single traversal, and also deny him the access to your collection over the Collection API? Maybe the caller just wants to addAll it to another collection? – Merozoite 10/7, 2014 at 13:52

It all depends. You can always do collection.stream() as well as Stream.collect(). So it's up to you and the caller who uses that function. – Van 4/7, 2017 at 7:23

262

The answer is, as always, "it depends". It depends on how big the returned collection will be. It depends on whether the result changes over time, and how important consistency of the returned result is. And it depends very much on how the user is likely to use the answer.

First, note that you can always get a Collection from a Stream, and vice versa:

// If API returns Collection, convert with stream()
getFoo().stream()...

// If API returns Stream, use collect()
Collection<T> c = getFooStream().collect(toList());

So the question is, which is more useful to your callers.

If your result might be infinite, there's only one choice: Stream.

If your result might be very large, you probably prefer Stream, since there may not be any value in materializing it all at once, and doing so could create significant heap pressure.

If all the caller is going to do is iterate through it (search, filter, aggregate), you should prefer Stream, since Stream has these built-in already and there's no need to materialize a collection (especially if the user might not process the whole result.) This is a very common case.
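
A minimal sketch of the "might not process the whole result" point, assuming a hypothetical Roster class with a players() method (neither is from the question, and the counter exists only to make the laziness visible). A short-circuiting search pulls elements from the source only until it finds a match:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical example class; the names are illustrative only.
class Roster {
    private final List<String> players = Arrays.asList("Ann", "Bob", "Cid", "Dee", "Eve");

    // Counts how many elements a pipeline actually pulled from the source.
    final AtomicInteger visited = new AtomicInteger();

    // Returns a fresh sequential stream on every call.
    Stream<String> players() {
        return players.stream().peek(p -> visited.incrementAndGet());
    }

    // The caller searches; findFirst() stops the traversal at the first match.
    static Optional<String> firstStartingWith(Roster roster, String prefix) {
        return roster.players()
                .filter(p -> p.startsWith(prefix))
                .findFirst();
    }
}
```

Searching for the prefix "B" visits only "Ann" and "Bob"; the remaining three elements are never touched, and no intermediate collection is built.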

Even if you know that the user will iterate it multiple times or otherwise keep it around, you still may want to return a Stream instead, for the simple fact that whatever Collection you choose to put it in (e.g., ArrayList) may not be the form they want, and then the caller has to copy it anyway. If you return a Stream, they can do collect(toCollection(factory)) and get it in exactly the form they want.
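
To illustrate, here is a small sketch in which two callers of the same Stream-returning method each collect into the form they want. NameSource and CallerChoosesTheShape are hypothetical names used only for this example:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical API; only the Stream-returning shape matters here.
class NameSource {
    private final List<String> names = Arrays.asList("Cid", "Ann", "Bob", "Ann");

    Stream<String> names() {
        return names.stream();
    }
}

class CallerChoosesTheShape {
    // One caller wants the names sorted and de-duplicated.
    static TreeSet<String> sortedUnique(NameSource source) {
        return source.names().collect(Collectors.toCollection(TreeSet::new));
    }

    // Another caller wants a queue that preserves encounter order.
    static ArrayDeque<String> asQueue(NameSource source) {
        return source.names().collect(Collectors.toCollection(ArrayDeque::new));
    }
}
```

Had names() returned an ArrayList, both callers would have had to copy it into their preferred collection anyway.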

The above "prefer Stream" cases mostly derive from the fact that Stream is more flexible; you can late-bind to how you use it without incurring the costs and constraints of materializing it to a Collection.

The one case where you must return a Collection is when there are strong consistency requirements, and you have to produce a consistent snapshot of a moving target. Then, you will want to put the elements into a collection that will not change.
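
One possible sketch of such a snapshot, assuming (this is not stated in the question) that all writes to the list go through the same lock. SnapshotTeam is a hypothetical variant of the question's Team that uses String for players to stay self-contained:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical variant of the question's Team.
class SnapshotTeam {
    private final List<String> players = new ArrayList<>();

    synchronized void add(String player) {
        players.add(player);
    }

    // Copies under the same lock that guards writes, so the caller receives a
    // fixed, read-only list that later add() calls cannot change.
    synchronized List<String> snapshotOfPlayers() {
        return Collections.unmodifiableList(new ArrayList<>(players));
    }
}
```

By contrast, Collections.unmodifiableList(players) without the copy is only a read-only view: later changes to the underlying list remain visible through it.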

So I would say that most of the time, Stream is the right answer — it is more flexible, it doesn't impose usually-unnecessary materialization costs, and can be easily turned into the Collection of your choice if needed. But sometimes, you may have to return a Collection (say, due to strong consistency requirements), or you may want to return Collection because you know how the user will be using it and know this is the most convenient thing for them.

If you already have a suitable Collection "lying around", and it seems likely that your users would rather interact with it as a Collection, then it is a reasonable choice (though not the only one, and more brittle) to just return what you have.

Enrollment answered 10/7, 2014 at 14:51 Comment(25)

Okay, so there is nothing wrong per se with returning a Stream? By the way, I love your lambda-related work, especially the talk about their implementation under the hood :) – Eberhardt 10/7, 2014 at 16:11

Like I said, there are a few cases where it won't fly, such as those when you want to return a snapshot in time of a moving target, especially when you have strong consistency requirements. But most of the time, Stream seems the more general choice, unless you know something specific about how it will be used. – Enrollment 10/7, 2014 at 16:24

If, as the question states, the goal is to return a read-only view of a collection already present on the heap, then none of the advantages of Stream apply, and if the caller needs Stream-provided services, it's just one call away. So for this particular question, there isn't much going for a Stream-typed getter. – Merozoite 10/7, 2014 at 17:27

@Marko Even if you confine your question so narrowly, I still disagree with your conclusion. Perhaps you are assuming that creating a Stream is somehow much more expensive than wrapping the collection with an immutable wrapper? (And, even if you don't, the stream view you get on the wrapper is worse than what you get off the original; because UnmodifiableList doesn't override spliterator(), you effectively will lose all parallelism.) Bottom line: beware of familiarity bias; you've known Collection for years, and that might make you distrust the newcomer. – Enrollment 10/7, 2014 at 17:54

You raise a strong point with losing the original spliterator: I would have intuitively assumed that the immutable wrapper exposes the underlying collection's spliterator, but I bet there's a subtle catch which precludes that. Regarding expensiveness, I am assuming that going from collection to stream is cheap, but the reverse isn't (it's O(n) in both time and space). Thirdly, you are obviously giving a very educational and widely useful answer here, but I still think some space must be devoted to addressing OP's specifics. – Merozoite 10/7, 2014 at 18:10

@MarkoTopolnik Sure. My goal was to address the general API design question, which is becoming a FAQ. Regarding cost, note that, if you don't already have a materialized collection you can return or wrap (OP does, but often there is not one), materializing a collection in the getter method is not any cheaper than returning a stream and letting the caller materialize one (and of course early materialization might be much more expensive, if the caller doesn't need it or if you return ArrayList but caller wants TreeSet.) But Stream is new, and people often assume it's more $$$ than it is. – Enrollment 10/7, 2014 at 18:14

That's interesting: maybe this is yet another case of my bias because I have for years lived in the world of laziness (Clojure) and have often taken great pains to transplant it to Java. Not only am I aware of the advantages of laziness, I take them for granted and refuse to live without them; but I also have no difficulty discerning the cases where laziness doesn't buy anything. So it could be that I am underestimating the strong need of preaching laziness to Java folks before getting down to the details of when it's not so useful. – Merozoite 10/7, 2014 at 18:29

Regarding non-materialized streams, I was actually complaining a few months ago on lambda-dev about the current API's weak support for their parallelization and got the impression from you that Streams API's focus was on processing in-memory, random-access structures---such which lend themselves to easy top-down splitting. But obviously, there are many more facets to your team's treatment of this issue. – Merozoite 10/7, 2014 at 18:34

@MarkoTopolnik While in-memory is a very important use-case, there are also some other cases that have good parallelization support, such as non-ordered generated streams (e.g., Stream.generate). However, where Streams is a poor fit is the reactive use case, where the data arrives with random latency. For that, I would suggest RxJava. – Enrollment 10/7, 2014 at 18:38

Maybe we disagree here, but this limitation is a major point of consideration. In my experience with business software, the leading example of lazily materialized streams are those backed by I/O, which means they fall into the "random latency" category. Streams backed by I/O are also usually the most critical target for parallelization because they are often huge and require heavyweight processing. All this means that the most critical area of application of parallel lazy streams is exactly the area which does not enjoy support from the Streams API. – Merozoite 10/7, 2014 at 19:57

@MarkoTopolnik I don't think we disagree, except perhaps that you might have liked us to focus our efforts slightly differently. (We're used to this; can't make all the people happy.) The design center for Streams focused on in-memory data structures; the design center for RxJava focuses on externally-generated events. Both are good libraries; also both do not fare very well when you try to apply them to cases well out of their design center. But just because a hammer is a terrible tool for needlepoint, that doesn't suggest there is anything wrong with the hammer. – Enrollment 10/7, 2014 at 20:7

Actually, this is not necessarily about my wishes; it is about the incongruence between promoting laziness and parallelization on one side and not supporting its most important use case on the other. I must add, however, that I already have positive experience with the parallelization of I/O-backed streams using Streams API. I had no trouble saturating all the CPU cores. The major issues I faced concern the unusual pattern of processing, which happens at the very end of the request-processing pipeline. This caused issues with transactions and error handling, but Streams API was not to blame. – Merozoite 10/7, 2014 at 20:10

So I wonder: did I just get lucky, and there are major issues lurking in my code, or is this goal not so far out of reach for Streams API? All I had to add was a fixed, but configurable, batch-size splitting policy instead of the default, nonconfigurable arithmetic-progression one. – Merozoite 10/7, 2014 at 20:13

@MarkoTopolnik I think you probably got a little lucky and found something that was "close enough" to work. There are definitely things we could do to improve it, but eventually you run into mismatches that are too painful to try and resolve. Rx is great; use it for what it's good at. Streams is great; use it for what it's good at. – Enrollment 10/7, 2014 at 20:34

What bothers me with Stream<> approach, is that in general it is unknown if the stream getter call should be wrapped into try-with-resources or not. – Utilitarianism 27/1, 2015 at 9:50

@AskarKalykov That's only true if the stream() method doesn't have a proper specification! Fortunately we have tools for methods to declare things like this, it's just that a) people don't write docs and b) people don't read them. But that's not a problem with the approach... – Enrollment 27/1, 2015 at 17:2

@BrianGoetz So in situation when you combine different streams (origins of which are unknown in design-time) into a resulting one, you should always propagate (transitively) try-with-resources approach? And if origins are known in design time (all of them are effectively not closeable), but situation will possibly change in a month or two, when you will incorporate closeable stream, then you should refactor (also transitively) all the usages to try-with-resources? – Utilitarianism 27/1, 2015 at 19:3

@BrianGoetz: Should the above code getFooStream().collect(toList()) read getFooStream().collect(Collectors.toList())? – Superdreadnought 29/5, 2015 at 7:19

@Superdreadnought It could, but I usually static-import Collectors.*. – Enrollment 29/5, 2015 at 12:23

Agree with all the reasons given why 'streams are probably better'. However, there's one big reason why I still prefer collections most of the time. Streams make my code very hard to debug. Try stepping it with a debugger... you really can't. So that alone, to me is usually a more than good enough reason to use collections by default. And then... only deliberately choose to use streams if I have a good enough and compelling reason to do so, in that specific case. – Urger 16/4, 2018 at 22:43

I am a big fan of the stream interface, and I use it whenever possible. However, a reason that makes me prefer immutable collections as return values is that Stream<T> does not implement Iterable<T> so using it in a foreach loop is cumbersome, as you have to: (Iterable<T>)stream::iterator. Since it is easier to convert a collection to a stream whenever needed, I usually end up returning immutable views. – Training 17/12, 2018 at 11:12

toList() is not recognized, and if I do Collectors.toList() I get a type error because it seems toList() works only for List<String> (?) What is a generic way to achieve toList()? – Datnow 8/11, 2019 at 12:56

Since streams only allow one terminal operation, isn't it a little dangerous to return them? The caller could use the returned stream to create two new ones by e.g. calling theStream.map(...) twice. It doesn't fail until runtime. Depending upon test coverage and how the stream references are passed around, that could be nasty. – Scab 14/10, 2020 at 14:45

@AntKutschera In reality, no. Methods that return Stream always return a fresh, unshared stream; this is easy to ensure as it makes very little sense for streams to be shared. It is the responsibility of whoever acquires the stream (whether they created it themselves, or got it from another method) to use it once. In general, this is also not very hard to ensure, as long as you understand the most basic facts about how streams work. – Enrollment 14/10, 2020 at 15:26

"when there are strong consistency requirements, and you have to produce a consistent snapshot of a moving target.", could someone give an example? – Gipsy 20/11, 2020 at 21:14

I have a few points to add to Brian Goetz' excellent answer.

It's quite common to return a Stream from a "getter" style method call. See the Stream usage page in the Java 8 javadoc and look for "methods... that return Stream" for the packages other than java.util.stream. These methods are usually on classes that represent or can contain multiple values or aggregations of something. In such cases, APIs typically have returned collections or arrays of them. For all the reasons that Brian noted in his answer, it's very flexible to add Stream-returning methods here. Many of these classes have collections- or array-returning methods already, because the classes predate the Streams API. If you're designing a new API, and it makes sense to provide Stream-returning methods, it might not be necessary to add collection-returning methods as well.

Brian mentioned the cost of "materializing" the values into a collection. To amplify this point, there are actually two costs here: the cost of storing values in the collection (memory allocation and copying) and also the cost of creating the values in the first place. The latter cost can often be reduced or avoided by taking advantage of a Stream's laziness-seeking behavior. Good examples of this are the APIs in java.nio.file.Files:

static Stream<String>  lines(Path path)
static List<String>    readAllLines(Path path)

Not only does readAllLines have to hold the entire file contents in memory in order to store it into the result list, it also has to read the file to the very end before it returns the list. The lines method can return almost immediately after it has performed some setup, leaving file reading and line breaking until later when it's necessary -- or not at all. This is a huge benefit, if for example, the caller is interested only in the first ten lines:

try (Stream<String> lines = Files.lines(path)) {
    List<String> firstTen = lines.limit(10).collect(toList());
}

Of course considerable memory space can be saved if the caller filters the stream to return only lines matching a pattern, etc.
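
A sketch of that filtering case, using only Files.lines and standard stream operations. The class name, method name, and parameters are made up for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MatchingLines {
    // Returns at most `max` lines that contain `needle`. Only the matching
    // lines are kept in memory, and reading stops once `max` are found.
    // try-with-resources closes the stream and, with it, the underlying file.
    static List<String> firstMatches(Path path, String needle, int max) throws IOException {
        try (Stream<String> lines = Files.lines(path)) {
            return lines.filter(line -> line.contains(needle))
                    .limit(max)
                    .collect(Collectors.toList());
        }
    }
}
```

The equivalent code built on readAllLines would hold every line of the file in a list before the first comparison is made.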

An idiom that seems to be emerging is to name stream-returning methods after the plural of the name of the things that it represents or contains, without a get prefix. Also, while stream() is a reasonable name for a stream-returning method when there is only one possible set of values to be returned, sometimes there are classes that have aggregations of multiple types of values. For example, suppose you have some object that contains both attributes and elements. You might provide two stream-returning APIs:

Stream<Attribute>  attributes();
Stream<Element>    elements();

Adiaphorous answered 10/7, 2014 at 16:37 Comment(3)

Great points. Can you say more about where you're seeing that naming idiom arising, and how much traction (steam?) it's picking up? I like the idea of a naming convention making it obvious that you're getting a stream vs a collection — though I also often expect IDE completion on "get" to tell me what I can get. – Astolat 1/2, 2016 at 15:59

I am also very interested about that naming idiom – Afterimage 13/7, 2016 at 16:47

@JoshuaGoldberg The JDK seems to have adopted this naming idiom, though not exclusively. Consider: CharSequence.chars() and .codePoints(), BufferedReader.lines(), and Files.lines() existed in Java 8. In Java 9, the following have been added: Process.children(), NetworkInterface.addresses(), Scanner.tokens(), Matcher.results(), java.xml.catalog.Catalog.catalogs(). Other stream-returning methods have been added that don't use this idiom -- Scanner.findAll() comes to mind -- but the plural noun idiom seems to have come into fair use in the JDK. – Adiaphorous 14/7, 2016 at 1:9

While some of the more high-profile respondents gave great general advice, I'm surprised no one has quite stated:

If you already have a "materialized" Collection in-hand (i.e. it was already created before the call - as is the case in the given example, where it is a member field), there is no point converting it to a Stream. The caller can easily do that themselves. Whereas, if the caller wants to consume the data in its original form, you converting it to a Stream forces them to do redundant work to re-materialize a copy of the original structure.

Angry answered 17/5, 2020 at 0:16 Comment(2)

Nearly everything about this answer belies questionable assumptions. Returning the collection, unless it is already read-only or you wrap it with a read-only view, means that the caller can mutate the collection out from under you, whereas a stream is a read-only view. You seem to think "converting" it to a stream is expensive; it is not; it is no more expensive than wrapping in a read-only view. You also seem to assume that the caller always needs to rematerialize it; this is rarely the case. (And when they do, you have no guarantee they want it in the same form you have it.) – Enrollment 24/8, 2021 at 18:25

Thanks for the comment. You're totally right that I generally assume we'll wrap in unmodifiable, and I didn't state that. I don't think procuring a stream is expensive; I just think dropping the capabilities of the original collection in favor of a stream may not be the best default choice. Returning streams (when there is already a materialized collection) retains more implementation flexibility, at the cost of requiring redundant work + space from the caller IF they wanted the original collection. And I do assume that is not-quite-rarely the case, which could be wrong of me. Readers, ymmv. – Angry 25/8, 2021 at 21:48

Were streams designed to always be "terminated" inside the same expression they were created in?

That is how they are used in most examples.

Note: returning a Stream is not that different from returning an Iterator (admittedly with much more expressive power)

IMHO the best solution is to encapsulate why you are doing this, and not return the collection.

e.g.

public int playerCount();
public Player player(int n);

or if you intend to count them

public int countPlayersWho(Predicate<? super Player> test);
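
As a rough sketch, these signatures could be implemented like this. RatedPlayer and its rating field are hypothetical stand-ins for the question's Player, and EncapsulatedTeam stands in for Team:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the question's Player.
class RatedPlayer {
    final String name;
    final int rating;

    RatedPlayer(String name, int rating) {
        this.name = name;
        this.rating = rating;
    }
}

class EncapsulatedTeam {
    // The list itself never leaves this class.
    private final List<RatedPlayer> players = new ArrayList<>();

    public void add(RatedPlayer player) {
        players.add(player);
    }

    public int playerCount() {
        return players.size();
    }

    public RatedPlayer player(int n) {
        return players.get(n);
    }

    // Counts the players matching the caller's predicate.
    public int countPlayersWho(Predicate<? super RatedPlayer> test) {
        return (int) players.stream().filter(test).count();
    }
}
```

A caller would then write something like team.countPlayersWho(p -> p.rating > 1500) without ever seeing the underlying list.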

Amagasaki answered 10/7, 2014 at 13:13 Comment(2)

The problem with this answer is it would require the author to anticipate every action the client wants to do, and it would greatly increase the number of methods on the class. – Ursa 10/7, 2014 at 14:37

@Ursa It depends on whether the end user is the author or someone they work with. If the end users are unknowable, then you need a more general solution. You might still want to limit access to the underlying collection. – Amagasaki 10/7, 2014 at 18:41

If the stream is finite, and there is an expected/normal operation on the returned objects which will throw a checked exception, I always return a Collection. Because if you are going to be doing something on each of the objects that can throw a checked exception, you will hate the stream. One real lack with streams is their inability to deal with checked exceptions elegantly.

Now, perhaps that is a sign that you don't need the checked exceptions, which is fair, but sometimes they are unavoidable.
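
To make the friction concrete, here is a sketch comparing the two styles. parsePort is a hypothetical operation invented for this example, and wrapping in UncheckedIOException is one common workaround, not the only one:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class CheckedInStreams {
    // Hypothetical operation that can fail with a checked exception.
    static int parsePort(String text) throws IOException {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            throw new IOException("not a port: " + text, e);
        }
    }

    // With a List, a plain loop lets the checked exception propagate as usual.
    static List<Integer> withLoop(List<String> inputs) throws IOException {
        List<Integer> ports = new ArrayList<>();
        for (String input : inputs) {
            ports.add(parsePort(input));
        }
        return ports;
    }

    // With a Stream, the lambda passed to map() may not throw IOException,
    // so it is wrapped in an unchecked exception and unwrapped at the boundary.
    static List<Integer> withStream(List<String> inputs) throws IOException {
        try {
            return inputs.stream()
                    .map(input -> {
                        try {
                            return parsePort(input);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .collect(Collectors.toList());
        } catch (UncheckedIOException e) {
            throw e.getCause();
        }
    }
}
```

Both methods behave the same from the caller's side; the stream version simply needs the extra wrap-and-unwrap ceremony.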

Messick answered 17/4, 2018 at 16:53 Comment(0)

In contrast to collections, streams have additional characteristics. A stream returned by any method might be:

finite or infinite
parallel or sequential (with a default globally shared threadpool that can impact any other part of an application)
ordered or non-ordered
holding references to be closed or not

These differences also exist in collections, but there they are part of the obvious contract:

All Collections have a size; an Iterator/Iterable can be infinite.
Collections are explicitly ordered or non-ordered
Parallelism is thankfully not something collections care about beyond thread-safety
Collections are typically not closeable either, so there is no need to worry about using try-with-resources as a guard.

As a consumer of a stream (either from a method return or as a method parameter) this is a dangerous and confusing situation. To make sure their algorithm behaves correctly, consumers of streams need to make sure the algorithm makes no wrong assumption about the stream characteristics. And that is a very hard thing to do. In unit testing, that would mean that you have to multiply all your tests to be repeated with the same stream contents, but with streams that are

(finite, ordered, sequential, requiring-close)
(finite, ordered, parallel, requiring-close)
(finite, non-ordered, sequential, requiring-close)...

Writing method guards for streams that throw an IllegalArgumentException if the input stream has a characteristic that breaks your algorithm is difficult, because the properties are hidden.

Documentation mitigates the problem, but it is flawed and often overlooked, and does not help when a stream provider is modified. As an example, see these javadocs of Java8 Files:

/**
 * [...] The returned stream encapsulates a Reader. If timely disposal of
 * file system resources is required, the try-with-resources
 * construct should be used to ensure that the stream's close
 * method is invoked after the stream operations are completed.
 */
public static Stream<String> lines(Path path, Charset cs)

/**
 * [...] no mention of closing even if this wraps the previous method
 */
public static Stream<String> lines(Path path)

That leaves Stream only as a valid choice in a method signature when none of the problems above matter, typically when the stream producer and consumer are in the same codebase, and all consumers are known (e.g. not part of the public interface of a class reusable in many places).

It is much safer to use other datatypes in method signatures with an explicit contract (and without implicit thread-pool processing involved) that makes it impossible to accidentally process data with wrong assumptions about orderedness, sizedness or parallelity (and threadpool usage).

Decosta answered 22/4, 2018 at 1:56 Comment(1)

Your concerns about infinite streams are unfounded; the question is "should I return a collection or a stream". If Collection is a possibility, the result is by definition finite. So worries that callers would risk an infinite iteration, given that you could have returned a collection, are unfounded. The rest of the advice in this answer is merely bad. It sounds to me like you ran into someone that over-used Stream, and you're over-rotating in the other direction. Understandable, but bad advice. – Enrollment 26/4, 2019 at 21:56

I think it depends on your scenario. Maybe it is sufficient to make your Team implement Iterable<Player>.

for (Player player : team) {
    System.out.println(player);
}

or in a functional style:

team.forEach(System.out::println);

But if you want a more complete and fluent api, a stream could be a good solution.
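
For reference, a minimal sketch of what that could look like. IterableTeam is a hypothetical stand-in for the question's Team and uses String for players to stay self-contained:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the question's Team.
class IterableTeam implements Iterable<String> {
    private final List<String> players = new ArrayList<>();

    void add(String player) {
        players.add(player);
    }

    // Iterating over a read-only view keeps callers from removing players
    // through Iterator.remove().
    @Override
    public Iterator<String> iterator() {
        return Collections.unmodifiableList(players).iterator();
    }
}
```

With this in place, both the for-each loop and team.forEach(...) work, since forEach is a default method on Iterable.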

Savadove answered 10/7, 2014 at 13:38 Comment(1)

Note that, in the code the OP posted, the player count is almost useless, other than as an estimate ('1034 players playing now, click here to start!') This is because you're returning an immutable view of a mutable collection, so the count you get now may not equal the count three microseconds from now. So while returning a Collection gives you an "easy" way to get to the count (and really, stream.count() is pretty easy too), that number is not really very meaningful for anything other than debugging or estimating. – Enrollment 10/7, 2014 at 15:15

-1

If you want to return a stream, add the following static import:

import static java.util.Arrays.stream;

Raycher answered 4/1 at 12:17 Comment(0)

-2

Perhaps a Stream factory would be a better choice. The big win of only exposing collections via Stream is that it better encapsulates your domain model’s data structure. It’s impossible for any use of your domain classes to affect the inner workings of your List or Set simply by exposing a Stream.

It also encourages users of your domain class to write code in a more modern Java 8 style. It’s possible to incrementally refactor to this style by keeping your existing getters and adding new Stream-returning getters. Over time, you can rewrite your legacy code until you’ve finally deleted all getters that return a List or Set. This kind of refactoring feels really good once you’ve cleared out all the legacy code!

Ocasio answered 15/2, 2017 at 8:3 Comment(1)

is there a reason this is fully quoted? is there a source? – Blackmore 7/6, 2017 at 20:40

-5

I would probably have 2 methods, one to return a Collection and one to return the collection as a Stream.

class Team
{
    private List<Player> players = new ArrayList<>();

    // ...

    public List<Player> getPlayers()
    {
        return Collections.unmodifiableList(players);
    }

    public Stream<Player> getPlayerStream()
    {
        return players.stream();
    }

}

This is the best of both worlds. The client can choose if they want the List or the Stream and they don't have to do the extra object creation of making an immutable copy of the list just to get a Stream.

This also adds only one more method to your API, so you don't end up with too many methods.

Ursa answered 10/7, 2014 at 14:39 Comment(2)

Because he wanted to choose between these two options and asked the pros and cons of each one. Moreover it provides everyone with a better understanding of these concepts. – Bankhead 10/7, 2014 at 20:20

Please don't do that. Imagine the APIs! – Hoebart 8/11, 2016 at 15:50
