How does an HTTP proxy utilize the HTTP protocol? Is there a proxy RFC?
B

3

53

How does one go about implementing an HTTP proxy compared to implementing an HTTP web server? What are the differences? Is there a definitive guide, an RFC, or a helpful book on this subject?

Burgenland answered 28/9, 2011 at 3:16 Comment(0)
T
27

The requirements on HTTP Proxy servers are specified within

Toniatonic answered 28/9, 2011 at 4:1 Comment(2)
This RFC does not describe anything about HTTPS proxies. - Hasseman
@nikoss: The original RFC this answer referenced (RFC2616) was split out into multiple RFCs. See section 4.3.6 of RFC7231 for the CONNECT method that is used to establish a proxied tunnel over which a TLS session for an HTTPS request can be established. - Toniatonic
J
58

The request sent to a proxy is different from the one sent directly to a web server.

For example, here is what is sent by Google Chrome to www.baidu.com via a proxy server:

GET http://www.baidu.com/ HTTP/1.1
Host: www.baidu.com
Proxy-Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
DNT: 1
Accept-Encoding: gzip, deflate, sdch
Accept-Language: zh-CN,zh;q=0.8

We can see it is

GET http://www.baidu.com/ HTTP/1.1

instead of

GET / HTTP/1.1

The request also carries

Proxy-Connection: keep-alive

and

Host: www.baidu.com

The Host field is required for an HTTP proxy (as it is for every HTTP/1.1 request).
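
A minimal Python sketch of what a proxy might do with that request line: take the absolute URL apart to learn where to connect, and rebuild the origin-form line (GET / HTTP/1.1) for the target server. The function name split_proxy_request_line is made up for illustration, and the sketch assumes plain http:// targets with port 80 as the default.

```python
from urllib.parse import urlsplit


def split_proxy_request_line(request_line):
    """Split an absolute-form request line as a proxy receives it.

    Returns (host, port, origin_form_line). The last item is the request
    line the proxy could send on to the target server.
    """
    method, target, version = request_line.split(" ", 2)
    url = urlsplit(target)
    if url.scheme != "http" or not url.hostname:
        raise ValueError("expected an absolute http:// URL, got %r" % target)
    port = url.port or 80          # assumption: default to port 80 for http
    path = url.path or "/"         # an empty path means the root resource
    if url.query:
        path += "?" + url.query
    return url.hostname, port, "%s %s %s" % (method, path, version)


print(split_proxy_request_line("GET http://www.baidu.com/ HTTP/1.1"))
```

For the Chrome request above this gives the host www.baidu.com, port 80, and the line GET / HTTP/1.1.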

For an HTTPS tunnel proxy:

CONNECT comet.zhihu.com:443 HTTP/1.1
Host: comet.zhihu.com:443
Proxy-Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36

We can see

CONNECT comet.zhihu.com:443 HTTP/1.1

The target is written as domain:443 instead of https://domain.

The CONNECT method turns the proxy server into something like a TCP tunnel, so the https scheme is replaced by the port :443.
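
As a rough sketch, the proxy side of CONNECT only needs the host and port from the request line; after that it opens a TCP connection, answers with a 2xx status, and copies bytes in both directions. The names parse_connect_target and TUNNEL_ESTABLISHED are hypothetical, and the relay loop itself is left out.

```python
def parse_connect_target(request_line):
    """Return (host, port) from a CONNECT request line."""
    method, authority, _version = request_line.split(" ", 2)
    if method != "CONNECT":
        raise ValueError("not a CONNECT request: %r" % request_line)
    # Split on the last colon so that the port is always the final part.
    host, sep, port = authority.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected host:port, got %r" % authority)
    # Strip the brackets that surround an IPv6 literal such as [::1].
    return host.strip("[]"), int(port)


# Any 2xx status tells the client the tunnel is up. The reason phrase is
# free text; "Connection established" is just a common choice.
TUNNEL_ESTABLISHED = b"HTTP/1.1 200 Connection established\r\n\r\n"

print(parse_connect_target("CONNECT comet.zhihu.com:443 HTTP/1.1"))
```

Once the 2xx reply is sent, the proxy no longer looks at the traffic: the TLS handshake and the encrypted HTTP request pass through as opaque bytes.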

For a SOCKS5 proxy, things are simpler: SOCKS5 does not care about the higher-level protocol, so you just tell it the host and port.
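
To illustrate "just tell it host and port", here is a small sketch of the two messages a SOCKS5 client sends, following the byte layout in RFC 1928. It assumes no authentication and a domain-name target; the names SOCKS5_GREETING and socks5_connect_request are made up for this example.

```python
import struct

# Greeting: version 5, one auth method offered, method 0x00 (no authentication).
SOCKS5_GREETING = b"\x05\x01\x00"


def socks5_connect_request(host, port):
    """Build a SOCKS5 CONNECT request for a domain-name target."""
    name = host.encode("ascii")
    if not 0 < len(name) <= 255:
        raise ValueError("host name must be 1..255 bytes")
    return (
        b"\x05"                     # VER: SOCKS version 5
        b"\x01"                     # CMD: CONNECT
        b"\x00"                     # RSV: reserved, always zero
        b"\x03"                     # ATYP: domain name follows
        + bytes([len(name)])        # one length byte, then the name itself
        + name
        + struct.pack("!H", port)   # port as 16-bit big-endian
    )


print(socks5_connect_request("comet.zhihu.com", 443))
```

Nothing in the request says whether the traffic will be HTTP, HTTPS, or something else; the proxy only learns the destination.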

Juanjuana answered 8/7, 2016 at 4:50 Comment(1)
This answer seems to imply that the Host: header is only required for proxy servers. That's not true. From RFC 2616: "All Internet-based HTTP/1.1 servers MUST respond with a 400 (Bad Request) status code to any HTTP/1.1 request message which lacks a Host header field." I.e., this is strictly required for all of HTTP/1.1, not just proxy servers. - Spry
L
8

A proxy is very similar to a server; the only difference is that, after parsing the request, it merely forwards it and returns the result*, rather than processing the request itself. Because the proxy does not have to do the same amount of processing as a normal server, it can often get away with far more minimal parsing of requests than a full-fledged server, but otherwise it is the same idea.

*Some proxies implement additional caching. Some also futz with the response/request, but that is the evil kind of proxy, which hopefully you do not have in mind.
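
The "parse, forward, return the result" loop can be sketched in a few lines of Python. This is only an illustration under some assumptions: the request is plain http://, the whole request is already in memory, and send_upstream is a hypothetical callable (not a real library function) that delivers bytes to the target server and returns its raw response. The only header this sketch drops is Proxy-Connection.

```python
from urllib.parse import urlsplit


def forward_request(raw_request, send_upstream):
    """Parse just enough of a proxied request to forward it.

    send_upstream(host, port, data) is a hypothetical transport callable
    that returns the raw response bytes from the target server.
    """
    head, _, body = raw_request.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    url = urlsplit(target)
    path = (url.path or "/") + ("?" + url.query if url.query else "")
    # Drop the proxy-only header; everything else passes through untouched.
    headers = [h for h in lines[1:]
               if not h.lower().startswith("proxy-connection:")]
    rewritten = "\r\n".join(["%s %s %s" % (method, path, version)] + headers)
    data = rewritten.encode("iso-8859-1") + b"\r\n\r\n" + body
    # The proxy does not interpret the response; it hands it straight back.
    return send_upstream(url.hostname, url.port or 80, data)


def fake_upstream(host, port, data):
    """Stand-in for a real socket connection, so the sketch runs offline."""
    print("would send to %s:%d:" % (host, port))
    print(data.decode("iso-8859-1"))
    return b"HTTP/1.1 204 No Content\r\n\r\n"


request = (b"GET http://www.baidu.com/ HTTP/1.1\r\n"
           b"Host: www.baidu.com\r\n"
           b"Proxy-Connection: keep-alive\r\n\r\n")
print(forward_request(request, fake_upstream))
```

A real proxy would replace fake_upstream with a socket connection and would also have to handle partial reads, keep-alive, and errors, which is where most of the work in an actual implementation goes.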

Leisured answered 28/9, 2011 at 3:28 Comment(2)
Filter proxies are often used to maintain one's privacy. They are also helpful for getting rid of unwanted content, like ads or tracking cookies, and they can reduce the amount of data to transfer. On the other hand, simple passthrough proxies may be used to track and record all your activity. - Isaiah
This glosses over the distinction between a "forward proxy" and a "reverse proxy". Forward proxies are commonly used for outbound traffic on a network. They interpret different headers sent by the client and must modify the request before forwarding it to the target web server. This answer seems to describe reverse proxies (AKA transparent proxies), which are designed to look and behave like a web server in their own right. These may not modify requests before passing them on. - Spry
