In Java streams is peek really only for debugging?
Asked Answered
M

10

220

I'm reading up about Java streams and discovering new things as I go along. One of the new things I found was the peek() function. Almost everything I've read on peek says it should be used to debug your Streams.

What if I had a Stream where each Account has a username, password field and a login() and loggedIn() method.

I also have

Consumer<Account> login = account -> account.login();

and

Predicate<Account> loggedIn = account -> account.loggedIn();

Why would this be so bad?

List<Account> accounts; //assume it's been setup
List<Account> loggedInAccount = 
accounts.stream()
    .peek(login)
    .filter(loggedIn)
    .collect(Collectors.toList());

Now as far as I can tell this does exactly what it's intended to do. It;

  • Takes a list of accounts
  • Tries to log in to each account
  • Filters out any account which aren't logged in
  • Collects the logged in accounts into a new list

What is the downside of doing something like this? Any reason I shouldn't proceed? Lastly, if not this solution then what?

The original version of this used the .filter() method as follows;

.filter(account -> {
        account.login();
        return account.loggedIn();
    })
Mancini answered 10/11, 2015 at 17:12 Comment(11)
Any time I find myself needing a multi-line lambda, I move the lines to a private method and pass the method reference instead of the lambda.Tephra
Yeah I understand this. I was just trying to more clearly demonstrate what I'm trying to achieve. Thanks though :)Mancini
What's the intent - are you trying to log all accounts in and filter them based on if they're logged in (which may be trivially true)? Or, do you want to log them in, then filter them based on whether or not they've logged in? I'm asking this in this order because forEach may be the operation you want as opposed to peek. Just because it's in the API doesn't mean it's not open for abuse (like Optional.of).Discommend
Filter based in if they have actually logged in. For example if the username is wrong it won't log in. So I then want to check if it is or isn't logged in. If it's not then it'll get tossed by the filter.Mancini
Also note that your code could just be .peek(Account::login) and .filter(Account::loggedIn); there's no reason to write a Consumer and Predicate that just calls another method like that.Compiler
Also note that the stream API explicitly discourages side-effects in behavioural parameters.Grandson
@DidierL Okay so would the following be discourage? Consumer<Account> login = account -> getWebsite(account.getUrl()).login(account.getUsername(), account.getPassword())Mancini
Useful consumers always have side-effects, those are not discouraged of course. This is actually mentioned in the same section: “A small number of stream operations, such as forEach() and peek(), can operate only via side-effects; these should be used with care.”. My remark was more to remind that the peek operation (which is designed for debugging purposes) should not be replaced by doing the same thing inside another operation like map() or filter().Grandson
@DidierL Okay thanks. So I would call forEach(login) on the accounts. Then when I want to do something with the logged in accounts filter using a loggedIn predicate? Starting to make more sense now. ThanksMancini
Yep, exactly as in @Makoto's answer :-)Grandson
A kitchen knife can kill a person, but you don't go to a war with that! :). Java 8 streams are cool. But, try not to achieve everything using streams when regular loops can do better. A forEach would be more intuitive here. We are so obsessed with java 8 streams and try to solve everything using them. I observe lot of people(including me) try to achieve something like the above using peek() thought the API clearly mentions not to use it.Mayle
D
112

The key takeaway from this:

Don't use the API in an unintended way, even if it accomplishes your immediate goal. That approach may break in the future, and it is also unclear to future maintainers.


There is no harm in breaking this out to multiple operations, as they are distinct operations. There is harm in using the API in an unclear and unintended way, which may have ramifications if this particular behavior is modified in future versions of Java.

Using forEach on this operation would make it clear to the maintainer that there is an intended side effect on each element of accounts, and that you are performing some operation that can mutate it.

It's also more conventional in the sense that peek is an intermediate operation which doesn't operate on the entire collection until the terminal operation runs, but forEach is indeed a terminal operation. This way, you can make strong arguments around the behavior and the flow of your code as opposed to asking questions about if peek would behave the same as forEach does in this context.

accounts.forEach(a -> a.login());
List<Account> loggedInAccounts = accounts.stream()
                                         .filter(Account::loggedIn)
                                         .collect(Collectors.toList());
Discommend answered 10/11, 2015 at 17:55 Comment(16)
If you perform the login in a preprocessing step, you don’t need a stream at all. You can perform forEach right at the source collection: accounts.forEach(a -> a.login());Sempiternal
@Holger: Excellent point. I've incorporated that into the answer.Discommend
Great answer! I was torn between this and what @Sempiternal said. Both convey the same thing, but this seems a little clearer and has a nice example too. I'll follow this approach, even though I doubt there will be another developer using this, I myself may forget the intent of the function.Mancini
@Adam.J: Right, my answer focused more on the general question contained in your title, i.e. is this method really only for debugging, by explaining the aspects of that method. This answer is more fused on your actual use case and how to do it instead. So you could say, together they provide the full picture. First, the reason why this is not the intended use, second the conclusion, not to stick to an unintended use and what to do instead. The latter will have more practical use for you.Sempiternal
Yeah I can see that. Just a shame I can't accept bother answers hah. One this is for sure, both answers have definitely helped me understand why not to do it and how to achieve the same thing, in a more practical manor.Mancini
This works in OP's case, where the stream can be easily recreated. However, if the stream wasn't based on a collection, this approach wouldn't work so well. You could make an argument that if the stream is only available once, a semantically better way to write this would be .map(a -> { account.login(); return account; }), but that does seem a bit more verbose than a simple .peek(Account::login).Compiler
@JoshuaTaylor: If it's not a collection or an array, you're doing something with I/O and you're attempting to act on it in a way that was probably not intended. The main takeaway from this (besides the bolded one) would be to perform the transformation operation before you filter. Needing to transform and filter in the same step may be seen as an edge case or a code smell, and I'd lean more towards the latter.Discommend
@Discommend " If it's not a collection or an array, you're doing something with I/O and you're attempting to act on it in a way that was probably not intended. " That's not necessarily true at all. I can write a method that takes a Stream<...> and then call it with myCollection.stream() without doing anything that's IO based. If the streams API is supposed to be useful, there's no real reason why methods might not take streams as arguments.Compiler
Of course, it was much easier if the login() method returned a boolean value indicating the success status…Sempiternal
@Sempiternal I was wondering if this was the correct way to do it or not? I have a few void methods which I later need to check they're success. I wasn't sure if this should be two methods or one which returned a boolean. Thanks for this.Mancini
@Sempiternal Also, could I make login a predicate? It would log in, then return loggedIn?Mancini
That’s what I was aiming at. If login() returns a boolean, you can use it as a predicate which is the cleanest solution. It still has a side effect, but that’s ok as long as it is non-interfering, i.e. the login process` of one Account has no influence on the login` process` of another Account.Sempiternal
@Sempiternal Thanks! That seems like the better solution!Mancini
How is this the accepted answer? We are iterating over the collection twice while technically only one iteration is required. I am currently facing a similar issue and am considering usage of a classic forEach Loop. What advantage does the usage of two streams have here?Telamon
@Lukas: One stream introduces side-effects, and the other doesn't. If you blend streams which have side-effects in with streams that shouldn't, it can be come trickier to debug and maintain going forward. This is for clarity's sake. At the same time, I'd doubt there's much in terms of performance win by iterating over it once; you're still doing the same operations on it, with the second operation simply filtering out the unsuccessfully authenticated accounts.Discommend
Obviously this solution is not good as it is forcing a functional way of writing where a non functional code makes so much more sense. Either the login function should return a logged in Account object or at least a boolean or the classic for each loop is a cleaner way to write it. If you have to iterate twice over the same collection then your code is broken. Please don't use the stream API for the sake of using it.Reconstructive
S
166

The important thing you have to understand is that streams are driven by the terminal operation. The terminal operation determines whether all elements have to be processed or any at all. So collect is an operation that processes each item, whereas findAny may stop processing items once it encountered a matching element.

And count() may not process any elements at all when it can determine the size of the stream without processing the items. Since this is an optimization not made in Java 8, but which will be in Java 9, there might be surprises when you switch to Java 9 and have code relying on count() processing all items. This is also connected to other implementation-dependent details, e.g. even in Java 9, the reference implementation will not be able to predict the size of an infinite stream source combined with limit while there is no fundamental limitation preventing such prediction.

Since peek allows “performing the provided action on each element as elements are consumed from the resulting stream”, it does not mandate processing of elements but will perform the action depending on what the terminal operation needs. This implies that you have to use it with great care if you need a particular processing, e.g. want to apply an action on all elements. It works if the terminal operation is guaranteed to process all items, but even then, you must be sure that not the next developer changes the terminal operation (or you forget that subtle aspect).

Further, while streams guarantee to maintain the encounter order for a certain combination of operations even for parallel streams, these guarantees do not apply to peek. When collecting into a list, the resulting list will have the right order for ordered parallel streams, but the peek action may get invoked in an arbitrary order and concurrently.

So the most useful thing you can do with peek is to find out whether a stream element has been processed which is exactly what the API documentation says:

This method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline

Sempiternal answered 10/11, 2015 at 17:50 Comment(4)
will there be any problem, future or present, in OP's use case? Does his code always do what he wants?Mantissa
@bayou.io: as far as I can see, there is no problem in this exact form. But as I tried to explain, using it this way implies you have to remember this aspect, even if you come back to the code one or two years later to incorporate «feature request 9876» into the code…Sempiternal
"the peek action may get invoked in an arbitrary order and concurrently". Doesn't this statement go against their rule for how peek works, e.g. "as elements are consumed"?Crepuscule
@Jose Martinez: It says “as elements are consumed from the resulting stream”, which isn’t the terminal action but the processing, though even the terminal action could consume elements out of order as long as the final result is consistent. But I also think, the phrase of the API note, “see the elements as they flow past a certain point in a pipeline” does a better job at describing it.Sempiternal
D
112

The key takeaway from this:

Don't use the API in an unintended way, even if it accomplishes your immediate goal. That approach may break in the future, and it is also unclear to future maintainers.


There is no harm in breaking this out to multiple operations, as they are distinct operations. There is harm in using the API in an unclear and unintended way, which may have ramifications if this particular behavior is modified in future versions of Java.

Using forEach on this operation would make it clear to the maintainer that there is an intended side effect on each element of accounts, and that you are performing some operation that can mutate it.

It's also more conventional in the sense that peek is an intermediate operation which doesn't operate on the entire collection until the terminal operation runs, but forEach is indeed a terminal operation. This way, you can make strong arguments around the behavior and the flow of your code as opposed to asking questions about if peek would behave the same as forEach does in this context.

accounts.forEach(a -> a.login());
List<Account> loggedInAccounts = accounts.stream()
                                         .filter(Account::loggedIn)
                                         .collect(Collectors.toList());
Discommend answered 10/11, 2015 at 17:55 Comment(16)
If you perform the login in a preprocessing step, you don’t need a stream at all. You can perform forEach right at the source collection: accounts.forEach(a -> a.login());Sempiternal
@Holger: Excellent point. I've incorporated that into the answer.Discommend
Great answer! I was torn between this and what @Sempiternal said. Both convey the same thing, but this seems a little clearer and has a nice example too. I'll follow this approach, even though I doubt there will be another developer using this, I myself may forget the intent of the function.Mancini
@Adam.J: Right, my answer focused more on the general question contained in your title, i.e. is this method really only for debugging, by explaining the aspects of that method. This answer is more fused on your actual use case and how to do it instead. So you could say, together they provide the full picture. First, the reason why this is not the intended use, second the conclusion, not to stick to an unintended use and what to do instead. The latter will have more practical use for you.Sempiternal
Yeah I can see that. Just a shame I can't accept bother answers hah. One this is for sure, both answers have definitely helped me understand why not to do it and how to achieve the same thing, in a more practical manor.Mancini
This works in OP's case, where the stream can be easily recreated. However, if the stream wasn't based on a collection, this approach wouldn't work so well. You could make an argument that if the stream is only available once, a semantically better way to write this would be .map(a -> { account.login(); return account; }), but that does seem a bit more verbose than a simple .peek(Account::login).Compiler
@JoshuaTaylor: If it's not a collection or an array, you're doing something with I/O and you're attempting to act on it in a way that was probably not intended. The main takeaway from this (besides the bolded one) would be to perform the transformation operation before you filter. Needing to transform and filter in the same step may be seen as an edge case or a code smell, and I'd lean more towards the latter.Discommend
@Discommend " If it's not a collection or an array, you're doing something with I/O and you're attempting to act on it in a way that was probably not intended. " That's not necessarily true at all. I can write a method that takes a Stream<...> and then call it with myCollection.stream() without doing anything that's IO based. If the streams API is supposed to be useful, there's no real reason why methods might not take streams as arguments.Compiler
Of course, it was much easier if the login() method returned a boolean value indicating the success status…Sempiternal
@Sempiternal I was wondering if this was the correct way to do it or not? I have a few void methods which I later need to check they're success. I wasn't sure if this should be two methods or one which returned a boolean. Thanks for this.Mancini
@Sempiternal Also, could I make login a predicate? It would log in, then return loggedIn?Mancini
That’s what I was aiming at. If login() returns a boolean, you can use it as a predicate which is the cleanest solution. It still has a side effect, but that’s ok as long as it is non-interfering, i.e. the login process` of one Account has no influence on the login` process` of another Account.Sempiternal
@Sempiternal Thanks! That seems like the better solution!Mancini
How is this the accepted answer? We are iterating over the collection twice while technically only one iteration is required. I am currently facing a similar issue and am considering usage of a classic forEach Loop. What advantage does the usage of two streams have here?Telamon
@Lukas: One stream introduces side-effects, and the other doesn't. If you blend streams which have side-effects in with streams that shouldn't, it can be come trickier to debug and maintain going forward. This is for clarity's sake. At the same time, I'd doubt there's much in terms of performance win by iterating over it once; you're still doing the same operations on it, with the second operation simply filtering out the unsuccessfully authenticated accounts.Discommend
Obviously this solution is not good as it is forcing a functional way of writing where a non functional code makes so much more sense. Either the login function should return a logged in Account object or at least a boolean or the classic for each loop is a cleaner way to write it. If you have to iterate twice over the same collection then your code is broken. Please don't use the stream API for the sake of using it.Reconstructive
E
33

Perhaps a rule of thumb should be that if you do use peek outside the "debug" scenario, you should only do so if you're sure of what the terminating and intermediate filtering conditions are. For example:

return list.stream().map(foo->foo.getBar())
                    .peek(bar->bar.publish("HELLO"))
                    .collect(Collectors.toList());

seems to be a valid case where you want, in one operation to transform all Foos to Bars and tell them all hello.

Seems more efficient and elegant than something like:

List<Bar> bars = list.stream().map(foo->foo.getBar()).collect(Collectors.toList());
bars.forEach(bar->bar.publish("HELLO"));
return bars;

and you don't end up iterating a collection twice.

Encarnalize answered 11/11, 2016 at 12:52 Comment(2)
Iterating twice is O(2n) =~ O(n). The chances of you having a performance problem because of this are minimal. You do however score on clarity if you don't use peek.Newcomb
In fact, it’s possible that the two iterations perform better than the single stream operation with two different purposes. Predicting performance in an environment with a runtime optimizer is very hard.Sempiternal
C
11

A lot of answers made good points, and especially the (accepted) answer by Makoto describes the possible problems in quite some detail. But no one actually showed how it can go wrong:

[1]-> IntStream.range(1, 10).peek(System.out::println).count();
|  $6 ==> 9

No output.

[2]-> IntStream.range(1, 10).filter(i -> i%2==0).peek(System.out::println).count();
|  $9 ==> 4

Outputs numbers 2, 4, 6, 8.

[3]-> IntStream.range(1, 10).filter(i -> i > 0).peek(System.out::println).count();
|  $12 ==> 9

Outputs numbers 1 to 9.

[4]-> IntStream.range(1, 10).map(i -> i * 2).peek(System.out::println).count();
|  $16 ==> 9

No output.

[5]-> Stream.of(1, 2, 3, 4, 5, 6, 7, 8, 9).peek(System.out::println).count();
|  $23 ==> 9

No output.

[6]-> Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9).stream().peek(System.out::println).count();
|  $25 ==> 9

No output.

[7]-> IntStream.range(1, 10).filter(i -> true).peek(System.out::println).count();
|  $30 ==> 9

Outputs numbers 1 to 9.

[1]-> List<Integer> list = new ArrayList<>();
|  list ==> []
[2]-> Stream.of(1, 5, 2, 7, 3, 9, 8, 4, 6).sorted().peek(list::add).count();
|  $7 ==> 9
[3]-> list
|  list ==> []

(You get the idea.)

The examples were run in jshell (Java 15.0.2) and mimic the use case of converting data (replace System.out::println by list::add for example as also done in some answers) and returning how much data was added. The current observation is that any operation that could filter elements (such as filter or skip) seems to force handling of all remaining elements, but it need not stay that way.

Carine answered 4/2, 2021 at 14:48 Comment(5)
I'm not sure your results are reliable. As the .count terminal operation also produces output, JShell might be overwriting the output of the .peek operation with it. If you replace .count with another terminal operation that doesn't produce output, it works nicely, e.g. jshell> IntStream.range(1,10).peek(System.out::println).forEach(i->{}).Dextroglucose
Having count as the terminal operation here is exactly the problem I want to show. count is not interested in your actual elements which is why they are sometimes not processed and the count just calculated.Adrea
Ah, ok, now I see.Dextroglucose
Just for anyone wondering how does the count() method works without actually calculating number of elements in the stream, I strongly believe it is because of flag StreamOpFlag.SIZED being set by IntStream and Stream.of. What is worse, Stream.of behaviour differs between JVM versions: in 1.8 it used to be a normal stream, but in some latter version it became SIZED, iirc.Procurable
@Procurable as said in other (much older) answers, count() was changed in Java 9, not Stream.of()’s behaviorSempiternal
S
8

I would say that peek provides the ability to decentralize code that can mutate stream objects, or modify global state (based on them), instead of stuffing everything into a simple or composed function passed to a terminal method.

Now the question might be: should we mutate stream objects or change global state from within functions in functional style java programming?

If the answer to any of the the above 2 questions is yes (or: in some cases yes) then peek() is definitely not only for debugging purposes, for the same reason that forEach() isn't only for debugging purposes.

For me when choosing between forEach() and peek(), is choosing the following: Do I want pieces of code that mutate stream objects (or change global state) to be attached to a composable, or do I want them to attach directly to stream?

I think peek() will better pair with java9 methods. e.g. takeWhile() may need to decide when to stop iteration based on an already mutated object, so paring it with forEach() would not have the same effect.

P.S. I have not mentioned map() anywhere because in case we want to mutate objects (or global state), rather than generating new objects, it works exactly like peek().

Swordcraft answered 21/6, 2018 at 14:14 Comment(1)
Not true at all. According to its JavaDocs, the intermediate Stream operation java.util.Stream.peek() “exists mainly to support debugging” purposes.Autograph
D
8

Despite the documentation note for .peek saying the "method exists mainly to support debugging" I think it has general relevance. For one thing the documentation says "mainly", so leaves room for other use cases. It is not deprecated since years, and speculations about its removal are futile IMO.

I would say in a world where we still have to handle side-effectful methods it has a valid place and utility. There are many valid operations in streams that use side-effects. Many have been mentioned in other answers, I'll just add here to set a flag on a collection of objects, or register them with a registry, on objects which are then further processed in the stream. Not to mention creating log messages during stream processing.

I support the idea to have separate actions in separate stream operations, so I avoid pushing everything into a final .forEach. I favor .peek over an equivalent .map with a lambda who's only purpose, besides calling the side-effect method, is to return the passed in argument. .peek tells me that what goes in also goes out as soon as I encounter this operation, and I don't need to read a lambda to find out. In that sense it is succinct, expressive and improves readability of the code.

Having said that I agree with all the considerations when using .peek, e.g. being aware of the effect of the terminal operation of the stream it is used in.

Dextroglucose answered 23/4, 2021 at 6:10 Comment(1)
This is an excellent answer! Tysvm for detailing your thoughts.Purpose
T
7

Although I agree with most answers above, I have one case in which using peek actually seems like the cleanest way to go.

Similar to your use case, suppose you want to filter only on active accounts and then perform a login on these accounts.

accounts.stream()
    .filter(Account::isActive)
    .peek(login)
    .collect(Collectors.toList());

Peek is helpful to avoid the redundant call while not having to iterate the collection twice:

accounts.stream()
    .filter(Account::isActive)
    .map(account -> {
        account.login();
        return account;
    })
    .collect(Collectors.toList());
Tameratamerlane answered 26/10, 2017 at 14:45 Comment(3)
All you have to do is to get that login method right. I really don't see how the peek is the cleanest way to go. How should someone who is reading your code know that you actually misusing the API. Good and clean code doesn't force a reader to make assumptions about the code.Reconstructive
I think you need to qualify the method reference in the .peek operation, e.g. as Account::login, for it to work.Dextroglucose
I agree that using .peek in favor of the .map alternative is more succinct, expressive and understandable. The lambda in the .map is only necessary to return the passed in object. .peek does this on its own. I know that as soon as I read the operation name, and don't have to inspect the lambda to find it out.Dextroglucose
P
3

The functional solution is to make account object immutable. So account.login() must return a new account object. This will mean that the map operation can be used for login instead of peek.

Poverty answered 28/2, 2019 at 20:11 Comment(0)
S
1

To get rid of warnings, I use functor tee, named after Unix' tee:

public static <T> Function<T,T> tee(Consumer<T> after) {
    return arg -> {
        f.accept(arg);
        return arg;
    };
}

You can replace:

  .peek(f)

with

  .map(tee(f))
Selfpronouncing answered 23/10, 2022 at 14:16 Comment(0)
B
0

It seems like a helper class is needed:

public static class OneBranchOnly<T> {
    public Function<T, T> apply(Predicate<? super T> test,
                                Consumer<? super T> t) {
        return o -> {
            if (test.test(o)) t.accept(o);
            return o;
        };
    }
}

then switch peek with map:

.map(new OneBranchOnly< Account >().apply(                   
                    account -> account.isTestAccount(),
                    account -> account.setName("Test Account"))
)

results: Collections of accounts that only test accounts got renamed (no reference gets maintained)

Billowy answered 14/8, 2022 at 18:24 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.