urllib3 connectionpool - Connection pool is full, discarding connection

Does seeing the

urllib3.connectionpool WARNING - Connection pool is full, discarding connection

mean that I am effectively losing data (because of the lost connection)
OR
Does it mean that connection is dropped (because pool is full); however, the same connection will be re-tried later on when connection pool becomes available?

Moll answered 13/12, 2018 at 15:41 Comment(1)

No data is being lost!

The connection is being discarded after the request is completed (because the pool is full, as mentioned). This means that this particular connection is not going to be re-used in the future.

Because a urllib3 PoolManager reuses connections, it limits how many connections are retained per host so that unused sockets don't accumulate. If you construct it with PoolManager(..., block=True), it will never open more than maxsize connections to a host; when no idle connection is available, a request waits for one to be returned to the pool instead of creating an extra socket.
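
As a minimal sketch (not part of the original answer; the host URL is just a placeholder), a blocking pool looks like this:

    import urllib3

    # Keep at most 10 connections to this host, and make requests wait for a
    # free connection instead of opening extra sockets that would later be
    # discarded (the discard is what triggers the warning).
    http = urllib3.PoolManager(maxsize=10, block=True)

    resp = http.request("GET", "https://example.com/")
    print(resp.status)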

If you're relying on concurrency, it could be a good idea to increase the size of the pool (maxsize) to be at least as large as the number of threads you're using, so that each thread effectively gets its own connection.
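
For instance, here is a rough sketch of sizing maxsize to the worker count (the ThreadPoolExecutor, thread count, and URL are assumptions for illustration, not from the answer; it also assumes all requests go to a single host):

    from concurrent.futures import ThreadPoolExecutor
    import urllib3

    NUM_THREADS = 20

    # With maxsize equal to the number of worker threads, every thread can hold
    # on to a pooled connection, so none are discarded and the warning goes away.
    http = urllib3.PoolManager(maxsize=NUM_THREADS, block=True)

    def fetch(url):
        return http.request("GET", url).status

    urls = ["https://example.com/"] * 100  # placeholder workload, one host
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
        print(list(pool.map(fetch, urls)))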

More details here: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#customizing-pool-behavior

Cyrstalcyrus answered 13/12, 2018 at 15:41 Comment(10)
That's a very wrong answer and interpretation, judging by the very documentation you mentioned. There's no "retrying later", all connections are opened immediately regardless of pool size. Also, increasing the number of threads without changing maxsize (or pool_size if different hosts) will not make the warnings go away, it will increase them! – Hobbism
@Hobbism Rereading it now, I think you're right. My answer was very confusing. I meant that the first part was the correct interpretation ("connection is being dropped"), but the second part that it's being reused was indeed incorrect. I also meant that they should increase the pool size, not the number of threads. I clarified the answer, thanks for pointing it out. – Cyrstalcyrus
As @Hobbism mentions, increasing the maxsize made the warnings appear more frequently than before. Then I set maxsize=1 and they went away... Although the concurrent requests slowed down overall. Not sure how to find the right balance between no warnings and fast requests lol. – Wintertime
@dvdblk: There's no "balance" between warnings and performance: to get no warnings just make your maxsize equal to the number of worker threads you're using. That way all connections will be kept in the pool for reuse, hence no warnings. And to improve performance, just increase your worker threads. I've read that around 4-5 threads per CPU core is optimal for internet (i.e. slow) I/O. – Hobbism
This is assuming all your connections are to a single host (so you're using a single ConnectionPool) – Hobbism
@shazow: the update was a great improvement! But a statement like "it will limit how many connections are allowed per host" is still inaccurate: urllib3 will always open as many connections as you request, even if it discards some after usage. – Hobbism
I'm using aiohttp and creating new asyncio tasks dynamically that use this session. So I'm not sure if I can set a precise number for it. – Wintertime
@dvdblk: Setting worker threads in aiohttp is completely out of the scope for this question, but you surely can do it, just check its documentation on connectors – Hobbism
@Hobbism Another good point. I clarified that sentence and added a note about block=True, also made the answer into a community wiki so you're welcome to edit it further. :) – Cyrstalcyrus
I had a nested ThreadPool situation and started seeing the warnings. Like the answer says, nothing was lost, but it was a good indication that I was hammering the other end with requests in a way that was likely to cause throttling that would lead to lost data :) – Overscore

According to the documentation on Customizing Pool Behavior, neither of your interpretations is correct:

By default, if a new request is made and there is no free connection in the pool then a new connection will be created. However, this connection will not be saved if more than maxsize connections exist. This means that maxsize does not determine the maximum number of connections that can be open to a particular host, just the maximum number of connections to keep in the pool.

(my emphasis)

So connections were not aborted to be retried later. They were made immediately, as requested, and the results returned. Then, after they completed, those "extra" connections were discarded, i.e., they were not kept in the pool for later reuse.

For example, if your maxsize is 10 (the default when using urllib3 via requests) and you launch 50 requests in parallel, all 50 connections will be opened at once; after completion only 10 will remain in the pool, while the other 40 will be discarded (triggering that warning).
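
A hedged sketch of that scenario using requests (the host URL and thread count are placeholders, not from the answer): mounting an HTTPAdapter with a larger pool_maxsize keeps all 50 connections in the pool and silences the warning.

    import requests
    from concurrent.futures import ThreadPoolExecutor
    from requests.adapters import HTTPAdapter

    session = requests.Session()
    # Default pool_maxsize is 10; raising it to 50 lets every parallel
    # connection be kept for reuse instead of being discarded.
    session.mount("https://", HTTPAdapter(pool_connections=10, pool_maxsize=50))

    urls = ["https://example.com/"] * 50  # placeholder URLs, all on one host

    with ThreadPoolExecutor(max_workers=50) as executor:
        statuses = list(executor.map(lambda u: session.get(u).status_code, urls))

    print(statuses)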

Hobbism answered 17/3, 2021 at 10:17 Comment(0)
