In Java streams is peek really only for debugging?

M

10

220

I'm reading up about Java streams and discovering new things as I go along. One of the new things I found was the peek() function. Almost everything I've read on peek says it should be used to debug your Streams.

What if I had a Stream of Account objects, where each Account has username and password fields and login() and loggedIn() methods?

I also have

Consumer<Account> login = account -> account.login();

and

Predicate<Account> loggedIn = account -> account.loggedIn();

Why would this be so bad?

List<Account> accounts; //assume it's been setup
List<Account> loggedInAccount = 
accounts.stream()
    .peek(login)
    .filter(loggedIn)
    .collect(Collectors.toList());

Now as far as I can tell this does exactly what it's intended to do. It:

Takes a list of accounts
Tries to log in to each account
Filters out any accounts which aren't logged in
Collects the logged in accounts into a new list

What is the downside of doing something like this? Any reason I shouldn't proceed? Lastly, if not this solution then what?

The original version of this used the .filter() method as follows:

.filter(account -> {
        account.login();
        return account.loggedIn();
    })

Mancini answered 10/11, 2015 at 17:12 Comment(11)

Any time I find myself needing a multi-line lambda, I move the lines to a private method and pass the method reference instead of the lambda. – Tephra 10/11, 2015 at 17:18

Yeah I understand this. I was just trying to more clearly demonstrate what I'm trying to achieve. Thanks though :) – Mancini 10/11, 2015 at 17:35

What's the intent - are you trying to log all accounts in and filter them based on if they're logged in (which may be trivially true)? Or, do you want to log them in, then filter them based on whether or not they've logged in? I'm asking this in this order because forEach may be the operation you want as opposed to peek. Just because it's in the API doesn't mean it's not open for abuse (like Optional.of). – Discommend 10/11, 2015 at 17:43

Filter based on whether they have actually logged in. For example, if the username is wrong it won't log in. So I then want to check if it is or isn't logged in. If it's not then it'll get tossed by the filter. – Mancini 10/11, 2015 at 17:46

Also note that your code could just be .peek(Account::login) and .filter(Account::loggedIn); there's no reason to write a Consumer and Predicate that just calls another method like that. – Compiler 10/11, 2015 at 21:32

Also note that the stream API explicitly discourages side-effects in behavioural parameters. – Grandson 10/11, 2015 at 22:27

@DidierL Okay, so would the following be discouraged? Consumer<Account> login = account -> getWebsite(account.getUrl()).login(account.getUsername(), account.getPassword()) – Mancini 10/11, 2015 at 23:41

Useful consumers always have side-effects, those are not discouraged of course. This is actually mentioned in the same section: “A small number of stream operations, such as forEach() and peek(), can operate only via side-effects; these should be used with care.”. My remark was more to remind that the peek operation (which is designed for debugging purposes) should not be replaced by doing the same thing inside another operation like map() or filter(). – Grandson 10/11, 2015 at 23:53

@DidierL Okay thanks. So I would call forEach(login) on the accounts. Then when I want to do something with the logged in accounts filter using a loggedIn predicate? Starting to make more sense now. Thanks – Mancini 10/11, 2015 at 23:55

Yep, exactly as in @Makoto's answer :-) – Grandson 10/11, 2015 at 23:59

A kitchen knife can kill a person, but you don't go to war with one! :) Java 8 streams are cool, but try not to achieve everything using streams when regular loops can do better. A forEach would be more intuitive here. We are so obsessed with Java 8 streams that we try to solve everything using them. I see a lot of people (including me) trying to achieve something like the above using peek(), even though the API clearly says not to use it that way. – Mayle 20/11, 2020 at 16:52

D

112

The key takeaway from this:

Don't use the API in an unintended way, even if it accomplishes your immediate goal. That approach may break in the future, and it is also unclear to future maintainers.

There is no harm in breaking this out to multiple operations, as they are distinct operations. There is harm in using the API in an unclear and unintended way, which may have ramifications if this particular behavior is modified in future versions of Java.

Using forEach on this operation would make it clear to the maintainer that there is an intended side effect on each element of accounts, and that you are performing some operation that can mutate it.

It's also more conventional in the sense that peek is an intermediate operation which doesn't operate on the entire collection until the terminal operation runs, but forEach is indeed a terminal operation. This way, you can make strong arguments around the behavior and the flow of your code as opposed to asking questions about if peek would behave the same as forEach does in this context.

accounts.forEach(a -> a.login());
List<Account> loggedInAccounts = accounts.stream()
                                         .filter(Account::loggedIn)
                                         .collect(Collectors.toList());
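
One single-pass alternative raised in the comments below is to have login() return a boolean so that it can serve directly as the filter predicate. A minimal sketch, assuming a hypothetical Account class whose login() reports success; the credential check is a placeholder, since the real login logic is not shown in the question:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical Account: assumes login() can be changed to return its success status.
class Account {
    private final String username;
    private final String password;
    private boolean loggedIn;

    Account(String username, String password) {
        this.username = username;
        this.password = password;
    }

    // Placeholder check (an assumption for illustration only).
    boolean login() {
        loggedIn = "secret".equals(password);
        return loggedIn;
    }

    boolean loggedIn() {
        return loggedIn;
    }

    String username() {
        return username;
    }
}

class LoginFilterDemo {
    // Attempts to log in to each account and keeps those that succeeded.
    static List<Account> loginAll(List<Account> accounts) {
        return accounts.stream()
                       .filter(Account::login)
                       .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Account> accounts = Arrays.asList(
                new Account("alice", "secret"),
                new Account("bob", "wrong"));
        loginAll(accounts).forEach(a -> System.out.println(a.username()));
    }
}
```

The predicate still has a side effect, but it only touches the element being tested, so one account's login does not affect another's.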

Discommend answered 10/11, 2015 at 17:55 Comment(16)

If you perform the login in a preprocessing step, you don’t need a stream at all. You can perform forEach right at the source collection: accounts.forEach(a -> a.login()); – Sempiternal 10/11, 2015 at 17:59

@Holger: Excellent point. I've incorporated that into the answer. – Discommend 10/11, 2015 at 18:0

Great answer! I was torn between this and what @Sempiternal said. Both convey the same thing, but this seems a little clearer and has a nice example too. I'll follow this approach, even though I doubt there will be another developer using this, I myself may forget the intent of the function. – Mancini 10/11, 2015 at 19:24

@Adam.J: Right, my answer focused more on the general question contained in your title, i.e. is this method really only for debugging, by explaining the aspects of that method. This answer is more focused on your actual use case and how to do it instead. So you could say, together they provide the full picture. First, the reason why this is not the intended use, second the conclusion, not to stick to an unintended use and what to do instead. The latter will have more practical use for you. – Sempiternal 10/11, 2015 at 19:43

Yeah I can see that. Just a shame I can't accept both answers hah. One thing is for sure, both answers have definitely helped me understand why not to do it and how to achieve the same thing in a more practical manner. – Mancini 10/11, 2015 at 19:52

This works in OP's case, where the stream can be easily recreated. However, if the stream wasn't based on a collection, this approach wouldn't work so well. You could make an argument that if the stream is only available once, a semantically better way to write this would be .map(a -> { a.login(); return a; }), but that does seem a bit more verbose than a simple .peek(Account::login). – Compiler 10/11, 2015 at 21:36

@JoshuaTaylor: If it's not a collection or an array, you're doing something with I/O and you're attempting to act on it in a way that was probably not intended. The main takeaway from this (besides the bolded one) would be to perform the transformation operation before you filter. Needing to transform and filter in the same step may be seen as an edge case or a code smell, and I'd lean more towards the latter. – Discommend 10/11, 2015 at 22:13

@Discommend " If it's not a collection or an array, you're doing something with I/O and you're attempting to act on it in a way that was probably not intended. " That's not necessarily true at all. I can write a method that takes a Stream<...> and then call it with myCollection.stream() without doing anything that's IO based. If the streams API is supposed to be useful, there's no real reason why methods might not take streams as arguments. – Compiler 10/11, 2015 at 22:18

Of course, it was much easier if the login() method returned a boolean value indicating the success status… – Sempiternal 11/11, 2015 at 7:59

@Sempiternal I was wondering if this was the correct way to do it or not? I have a few void methods whose success I later need to check. I wasn't sure if this should be two methods or one which returned a boolean. Thanks for this. – Mancini 11/11, 2015 at 9:25

@Sempiternal Also, could I make login a predicate? It would log in, then return loggedIn? – Mancini 11/11, 2015 at 10:38

That’s what I was aiming at. If login() returns a boolean, you can use it as a predicate, which is the cleanest solution. It still has a side effect, but that’s ok as long as it is non-interfering, i.e. the login process of one Account has no influence on the login process of another Account. – Sempiternal 11/11, 2015 at 11:13

@Sempiternal Thanks! That seems like the better solution! – Mancini 11/11, 2015 at 11:19

How is this the accepted answer? We are iterating over the collection twice while technically only one iteration is required. I am currently facing a similar issue and am considering usage of a classic forEach Loop. What advantage does the usage of two streams have here? – Telamon 28/3, 2017 at 7:57

@Lukas: One stream introduces side-effects, and the other doesn't. If you blend streams which have side-effects in with streams that shouldn't, it can become trickier to debug and maintain going forward. This is for clarity's sake. At the same time, I doubt there's much of a performance win in iterating over it once; you're still doing the same operations on it, with the second operation simply filtering out the unsuccessfully authenticated accounts. – Discommend 28/3, 2017 at 17:22

Obviously this solution is not good, as it forces a functional style of writing where non-functional code makes so much more sense. Either the login function should return a logged-in Account object, or at least a boolean, or the classic for-each loop is a cleaner way to write it. If you have to iterate twice over the same collection then your code is broken. Please don't use the stream API for the sake of using it. – Reconstructive 3/7, 2018 at 7:56

S

166

The important thing you have to understand is that streams are driven by the terminal operation. The terminal operation determines whether all elements have to be processed, or any at all. So collect is an operation that processes each item, whereas findAny may stop processing items once it has encountered a matching element.
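
To make this concrete, here is a minimal sketch (the class and method names are illustrative, not from the question) that records which elements a peek action sees under two different terminal operations. It uses findFirst on a sequential stream rather than findAny so the outcome is deterministic:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; names are illustrative only.
class PeekLazinessDemo {

    // Records the elements peek() sees when the terminal operation is collect().
    static List<Integer> seenWithCollect() {
        List<Integer> seen = new ArrayList<>();
        Arrays.asList(1, 2, 3, 4).stream()
              .peek(seen::add)
              .collect(Collectors.toList());
        return seen;
    }

    // Records the elements peek() sees when the terminal operation is findFirst().
    static List<Integer> seenWithFindFirst() {
        List<Integer> seen = new ArrayList<>();
        Arrays.asList(1, 2, 3, 4).stream()
              .peek(seen::add)
              .findFirst();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("collect:   " + seenWithCollect());
        System.out.println("findFirst: " + seenWithFindFirst());
    }
}
```

With collect, the peek action runs for all four elements; with findFirst, it runs only for the first one, because the short-circuiting terminal operation stops pulling elements once it has its answer.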

And count() may not process any elements at all when it can determine the size of the stream without processing the items. Since this is an optimization not made in Java 8, but which will be in Java 9, there might be surprises when you switch to Java 9 and have code relying on count() processing all items. This is also connected to other implementation-dependent details, e.g. even in Java 9, the reference implementation will not be able to predict the size of an infinite stream source combined with limit while there is no fundamental limitation preventing such prediction.
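
A small sketch of the count() case, with illustrative names. What it prints depends on the Java version: on Java 8 the peek action should see every element, while on Java 9 and later the implementation may compute the count directly from a sized source and never invoke the action:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class PeekCountDemo {

    // Counts the elements of source while recording whatever peek() happens to see.
    static long countWithPeek(List<Integer> source, List<Integer> seen) {
        return source.stream()
                     .peek(seen::add)
                     .count();
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        long count = countWithPeek(Arrays.asList(1, 2, 3), seen);
        // The count is the same on every version; the contents of seen are not.
        System.out.println("count = " + count + ", seen by peek = " + seen);
    }
}
```

Code that relies on the contents of seen here is relying on an implementation detail, not on anything the terminal operation promises.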

Since peek allows “performing the provided action on each element as elements are consumed from the resulting stream”, it does not mandate processing of elements but will perform the action depending on what the terminal operation needs. This implies that you have to use it with great care if you need a particular processing, e.g. want to apply an action on all elements. It works if the terminal operation is guaranteed to process all items, but even then, you must be sure that the next developer does not change the terminal operation (or that you do not forget that subtle aspect yourself).

Further, while streams guarantee to maintain the encounter order for a certain combination of operations even for parallel streams, these guarantees do not apply to peek. When collecting into a list, the resulting list will have the right order for ordered parallel streams, but the peek action may get invoked in an arbitrary order and concurrently.
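
The following sketch (illustrative names, not from the question) shows the difference on an ordered parallel stream. The peek action writes into a thread-safe queue because it may run concurrently; the order in which it records elements can vary from run to run, whereas the collected list keeps the encounter order:

```java
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class; names are illustrative only.
class PeekParallelDemo {

    // Collects 1..1000 in parallel, recording the order in which peek() saw the elements.
    // peekOrder must be thread-safe, since the action may be invoked concurrently.
    static List<Integer> collectInParallel(Collection<Integer> peekOrder) {
        return IntStream.rangeClosed(1, 1000)
                        .boxed()
                        .parallel()
                        .peek(peekOrder::add)
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Collection<Integer> peekOrder = new ConcurrentLinkedQueue<>();
        List<Integer> result = collectInParallel(peekOrder);
        System.out.println("first collected elements: " + result.subList(0, 5));
        System.out.println("first peeked elements:    "
                + peekOrder.stream().limit(5).collect(Collectors.toList()));
    }
}
```

Every element is still seen by peek here because collect processes the whole stream; only the order and threading of those calls are unspecified.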

So the most useful thing you can do with peek is to find out whether a stream element has been processed which is exactly what the API documentation says:

This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline

Sempiternal answered 10/11, 2015 at 17:50 Comment(4)

will there be any problem, future or present, in OP's use case? Does his code always do what he wants? – Mantissa 10/11, 2015 at 18:6

@bayou.io: as far as I can see, there is no problem in this exact form. But as I tried to explain, using it this way implies you have to remember this aspect, even if you come back to the code one or two years later to incorporate «feature request 9876» into the code… – Sempiternal 10/11, 2015 at 18:9

"the peek action may get invoked in an arbitrary order and concurrently". Doesn't this statement go against their rule for how peek works, e.g. "as elements are consumed"? – Crepuscule 30/6, 2017 at 16:17

@Jose Martinez: It says “as elements are consumed from the resulting stream”, which isn’t the terminal action but the processing, though even the terminal action could consume elements out of order as long as the final result is consistent. But I also think, the phrase of the API note, “see the elements as they flow past a certain point in a pipeline” does a better job at describing it. – Sempiternal 3/7, 2017 at 10:42
