After reading through the change and trying to understand your questions, I'm going to have a go at answering this. I'm not at all sure this is the correct answer, and I'm making some logical assumptions based on what I know about Reactor, WebFlux and WebClient.
Ever since WebClient was released, the main workhorse was supposed to be retrieve(), which provides a simple but stable API on top of a fully asynchronous web client. The problem was that most people were used to working with the ResponseEntity objects returned by the older RestTemplate (at the time in maintenance mode, not formally deprecated), so people turned to exchange() instead.
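For what it's worth, retrieve() can also hand back a ResponseEntity, so that habit alone doesn't require exchange(). A minimal sketch, assuming spring-webflux is on the classpath; the class name RetrieveToEntityExample and the method fetch are made up for illustration:

```java
import org.springframework.http.ResponseEntity;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

// Hypothetical helper class, not part of Spring.
class RetrieveToEntityExample {

    /** Fetches the given URI and returns status, headers and body together. */
    static Mono<ResponseEntity<String>> fetch(WebClient webClient, String uri) {
        return webClient.get()
                .uri(uri)
                .retrieve()
                // toEntity decodes the body and wraps it with status and headers,
                // so the response is consumed for us.
                .toEntity(String.class);
    }
}
```

Keep in mind that retrieve() by default signals a WebClientResponseException for 4xx and 5xx responses, so this only fits when that behaviour is acceptable.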
But this is where the problem lies. When you gain access to the response, you also take on a responsibility: you are obligated to consume it so that the underlying connection can be released (returned to the pool or closed). This usually means reading the headers and the body. If you don't consume the response, the connection and its buffers stay held, which results in a leak.
Spring addressed this by providing functions like response#bodyToMono and response#bodyToFlux, which consume the body and afterwards release the response (which in turn frees the connection).
But it turns out it was quite easy (since developers are crafty bastards) for people to write code that didn't consume the response, leaving dangling TCP connections.
webClient.get()
    .uri( ... )
    .exchange()
    .flatMap(response -> {
        // This is just an example, but here I simply re-return the response,
        // which means the connection stays open until I actually consume
        // the body. I could pass this all over my application but never
        // consume it, and the connection would be held for as long as I do:
        // a potential memory leak.
        return Mono.just(response);
    });
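The new exchangeToMono implementation basically forces you to deal with the body in order to avoid memory leaks. If you want to work on the raw response, you have to do it inside the function you pass in; whatever has not been consumed by the time the returned Mono completes is released for you.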
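As a rough sketch of what that looks like in practice (assuming Spring Framework 5.3 or later; ExchangeToMonoExample and fetchBody are hypothetical names), every branch of the function decides what happens to the body before the response goes out of scope:

```java
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

// Hypothetical helper class, not part of Spring.
class ExchangeToMonoExample {

    /** Returns the body for 2xx responses and fails with an exception otherwise. */
    static Mono<String> fetchBody(WebClient webClient, String uri) {
        return webClient.get()
                .uri(uri)
                .exchangeToMono(response -> {
                    if (response.statusCode().is2xxSuccessful()) {
                        // Success branch: decode (and thereby consume) the body.
                        return response.bodyToMono(String.class);
                    }
                    // Error branch: createException() reads the error body,
                    // so the response is consumed here as well.
                    return response.createException().flatMap(Mono::error);
                });
    }
}
```

The raw ClientResponse never leaves the lambda, which is the whole point: there is nothing left to forget to consume.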
So let's talk about your example and your needs.
You basically just want to proxy the request from one server to another. You do actually consume the body; you just don't do it in the flatMap right next to the WebClient call.
.exchange()
.flatMap { response ->
    ServerResponse
        .status(response.statusCode())
        .headers { it.addAll(response.headers().asHttpHeaders()) }
        // Here you declare that you want to consume the body, but it isn't
        // consumed right here; it's not consumed until much later.
        .body(response.bodyToMono<ByteArray>())
}
Here in your code you are returning a ServerResponse, but you always have to keep one thing in mind: nothing happens until you subscribe. You are basically passing along a ServerResponse whose body you haven't consumed yet. You have only declared that when the server needs the body, it will have to consume the body of the previous response to get the new body.
Think of it this way: you are returning a ServerResponse that only contains declarations about what we want in it, not what is actually in it.
As this gets returned from the flatMap, it travels all the way out of the application until we write it as a response to the open TCP connection we have to the client.
Only there and then will the response be built, and that's when your first response from the WebClient will be consumed and closed.
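To make the "declaration, not content" idea concrete, here is a plain-JDK analogy. It is not Reactor or Spring code: a Supplier stands in for the unsubscribed Mono, and LazyResponse and buildProxyResponse are names I made up for the illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Plain-JDK analogy only; none of these types exist in Spring or Reactor.
class LazyBodyDemo {

    /** Stand-in for ServerResponse: a status plus a recipe for the body. */
    static final class LazyResponse {
        final int status;
        final Supplier<String> body;

        LazyResponse(int status, Supplier<String> body) {
            this.status = status;
            this.body = body;
        }
    }

    /** Builds the outgoing response; upstreamReads counts reads of the upstream body. */
    static LazyResponse buildProxyResponse(AtomicInteger upstreamReads) {
        return new LazyResponse(200, () -> {
            // Stands in for reading the body off the upstream connection.
            upstreamReads.incrementAndGet();
            return "upstream payload";
        });
    }

    public static void main(String[] args) {
        AtomicInteger reads = new AtomicInteger();
        LazyResponse response = buildProxyResponse(reads);
        System.out.println("reads after building: " + reads.get()); // 0

        // "Writing the response to the client" is what finally triggers the read.
        String written = response.body.get();
        System.out.println("reads after writing: " + reads.get() + ", body: " + written); // 1
    }
}
```

Building the response touches nothing upstream; the read only happens when something asks for the body, which is the same ordering you get with the ServerResponse above.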
So your original code does work, because you do consume the WebClient response; you are just not doing it until you write a response to the calling client.
What you were doing was not inherently wrong. It's just that having the WebClient API this way increases the risk of people using it wrong, and memory leaks could happen.
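If you want the same proxy behaviour on top of exchangeToMono, one option (an assumption on my part, not something taken from the change itself) is to read the body inside the function and only then build the ServerResponse. A minimal Java sketch, with BufferingProxyExample and proxy as hypothetical names:

```java
import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.reactive.function.server.ServerResponse;
import reactor.core.publisher.Mono;

// Hypothetical helper class, not part of Spring.
class BufferingProxyExample {

    /** Calls the upstream URI and mirrors status, headers and body to the caller. */
    static Mono<ServerResponse> proxy(WebClient webClient, String upstreamUri) {
        return webClient.get()
                .uri(upstreamUri)
                .exchangeToMono(upstream -> upstream
                        // Read the whole body while the upstream response is still open.
                        .bodyToMono(byte[].class)
                        // An empty upstream body would otherwise complete without a value.
                        .defaultIfEmpty(new byte[0])
                        .flatMap(bytes -> ServerResponse
                                .status(upstream.statusCode())
                                // Copies all headers verbatim, as in the original snippet;
                                // a real proxy would likely filter hop-by-hop headers.
                                .headers(h -> h.addAll(upstream.headers().asHttpHeaders()))
                                .bodyValue(bytes)));
    }
}
```

The trade-off is that the body is buffered in memory before it is written out, whereas the original exchange() version streamed it lazily, so this fits small payloads better than large ones.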
I hope this answers at least some of your questions. I was mostly writing down my interpretation of the change.