Why is the block size for Python httplib's reads hard coded as 8192 bytes
C

2

14

I'm looking to make a fast streaming download -> upload to move large files via HTTP from one server to another.

While doing this, I noticed that httplib, which is used by urllib3 and therefore also by requests, seems to hard-code how much it fetches from a stream at a time to 8192 bytes:

https://github.com/python/cpython/blob/28453feaa8d88bbcbf6d834b1d5ca396d17265f2/Lib/http/client.py#L970

Why is this? What is the benefit of 8192 over other sizes?

Connote answered 10/2, 2018 at 10:46 Comment(6)
Originally committed in github.com/python/cpython/commit/… . Found via git blame.Mycosis
@AshishNitinPatil Ah thanks. Now I suspect the 8192 is from Patch #1065257: bugs.python.org/issue1065257 and bugs.python.org/file6362/httplib2.patch . However, I don't think the comments address the 8192?Connote
8k is a common block size for block devices, so it can be more efficient to read chunks of data in this block size or multiples of it. That's also a common HTTP header size limit (e.g. in Apache), so you can transmit a header in a single block. Do you have some reason that it shouldn't be 8k?Maulstick
@Maulstick I ran a test using 64k, with an iterator as the source stream, and a 5 GB transfer was faster in that case.Connote
It seems this will be configurable in 3.7, for reasons similar to your use case. See merged PR4279 from 3 months ago.Maulstick
Apache's maximum buffer size is 8K. Why 8K? Performance (flow-control) and also I believe there must be some security reasons to prevent DoS type attacks.Jardiniere
F
4

Nginx webserver

This is from nginx

Syntax: client_body_buffer_size size;

Default:    client_body_buffer_size 8k|16k;

Sets buffer size for reading client request body. In case the request body is larger than the buffer, the whole body or only its part is written to a temporary file. By default, buffer size is equal to two memory pages. This is 8K on x86, other 32-bit platforms, and x86-64. It is usually 16K on other 64-bit platforms

Apache WebServer

ProxyIOBufferSize Directive
Description:    Determine size of internal data throughput buffer
Syntax: ProxyIOBufferSize bytes
Default:    ProxyIOBufferSize 8192
Context:    server config, virtual host
Status: Extension
Module: mod_proxy

So Apache also uses 8192 by default as the proxy buffer size.

Apache Client

The Apache Java client documentation states:

https://hc.apache.org/httpcomponents-client-4.2.x/tutorial/html/connmgmt.html

  • CoreConnectionPNames.SOCKET_BUFFER_SIZE='http.socket.buffer-size': determines the size of the internal socket buffer used to buffer data while receiving / transmitting HTTP messages. This parameter expects a value of type java.lang.Integer. If this parameter is not set, HttpClient will allocate 8192 byte socket buffers.

Ruby Client

In Ruby, the default value is 16K:

https://github.com/ruby/ruby/blob/814daf855e0aa2c3a1164dc765378d3a092a1825/lib/net/protocol.rb#L172

Then there are the threads below:

What is a good buffer size for socket programming?

What is the best memory buffer size to allocate to download a file from Internet?

Optimum file buffer read size?

If you look at these, the consensus settles on 8K/16K as the buffer size. The point is not that the size should be fixed at that value, but that it should be configurable, with 8K/16K being good enough for most situations. So I don't see a problem with Python also using 8K by default. But yes, it should have been configurable.

Python 3.7 will make it configurable, but that may not help you if you can't upgrade to that version.
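
As a minimal sketch of that option: from Python 3.7 on, http.client.HTTPConnection accepts a blocksize argument, which sets the buffer size used when sending a file-like message body. The host name, the 64 KiB value and the upload() helper below are illustrative assumptions, not recommendations. Constructing the connection object does not open a socket.

```python
import http.client

# 64 KiB instead of the default 8192; the value is an illustrative assumption.
BLOCK_SIZE = 64 * 1024

# "example.com" is a placeholder host; no socket is opened at construction time.
conn = http.client.HTTPConnection("example.com", blocksize=BLOCK_SIZE)


def upload(path, url_path="/upload"):
    """Hypothetical helper: PUT a local file; the body is read in BLOCK_SIZE chunks."""
    with open(path, "rb") as body:
        conn.request("PUT", url_path, body=body)
        return conn.getresponse().status
```

Whether a larger block size actually helps depends on the network and the servers involved, so it is worth measuring with your own transfer.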

Fleer answered 20/2, 2018 at 3:2 Comment(0)
K
13

From what I found, the block size should ideally be the page size reported by the resource module, but since that page size is only available on UNIX, the value was hard-coded to 8192 so that all other systems, especially Windows, are not blocked by this. Otherwise there is no other reason to hard-code it.
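
To illustrate the portability problem, here is a rough sketch of picking a block size from the page size where it is available and falling back to 8192 elsewhere. This is not what http.client does; the function name default_block_size and the fallback logic are assumptions made for the example.

```python
def default_block_size(fallback=8192):
    """Return the system page size if the Unix-only resource module exists,
    otherwise the given fallback (hypothetical helper, for illustration only)."""
    try:
        import resource  # not available on Windows
    except ImportError:
        return fallback
    return resource.getpagesize()


block_size = default_block_size()
print(block_size)
```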

Source: https://bugs.python.org/issue21790

Kiruna answered 13/2, 2018 at 6:10 Comment(0)
