How to write a proxy pool server (when a request comes, choose a proxy to get url content) in python?

I don't know the proper name for this kind of proxy server, so feel free to fix my question title.

When I search for proxy servers on Google, I find many implementations such as maproxy or a-python-proxy-in-less-than-100-lines-of-code. Those proxy servers seem to simply ask the remote server for a given URL.

I want to build a proxy server that holds a proxy pool (a list of HTTP/HTTPS proxies) and exposes only one IP address and one port for incoming requests. When a request comes in, it chooses a proxy from the pool, performs the request through it, and returns the result.

For example, I have a VPS whose IP is '192.168.1.66'. I start the proxy server on this VPS with IP '127.0.0.1' and port '8080'.

I can then use this proxy like below.

import requests
url = 'http://www.google.com'
headers = {
    ...
}
proxies = {
    'http': 'http://192.168.1.66:8080'
}

r = requests.get(url, headers=headers, proxies=proxies)

I have seen some implementations like:

from twisted.web import proxy, http
from twisted.internet import reactor
from twisted.python import log
import sys
log.startLogging(sys.stdout)

class ProxyFactory(http.HTTPFactory):
    protocol = proxy.Proxy

reactor.listenTCP(8080, ProxyFactory())
reactor.run()

It works, but it is so simple that I have no idea how it works or how to extend this code to use a proxy pool.

An example flow:

From hidu/proxy-manager, which is written in Go.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
+ client (want visit http://www.baidu.com/)              +  
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
                        |  
                        |  via proxy 127.0.0.1:8090  
                        |  
                        V  
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
+                       +         proxy pool             +  
+ proxy manager listen  ++++++++++++++++++++++++++++++++++  
+ on (127.0.0.1:8090)   +  http_proxy1,http_proxy2,      +  
+                       +  socks5_proxy1,socks5_proxy2   +  
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
                        |  
                        |  choose one proxy visit 
                        |  www.baidu.com  
                        |  
                        V  
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
+        site:www.baidu.com                              +  
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++  
Cathryncathy answered 15/10, 2015 at 4:4 Comment(3)
Do you need this to practice Python, or do you just have a task that you decided to implement in Python? Would an open-source solution that doesn't require any coding work for your task?Rentier
@ffeast For a task or for work I can just use squid or goproxy, but learning a Python implementation is also worthwhile.Cathryncathy
Did you manage to solve the task?Mover

Your proxy pool concept is not hard to implement. If I understand correctly, you want the following:

  1. YOUR PROXY SERVER listens for requests on 192.168.1.66:8080.
  2. The CLIENT requests access to http://www.google.com.
  3. YOUR PROXY SERVER forwards the CLIENT's request to ANOTHER PROXY SERVER, chosen from the list of other proxy servers (the PROXY POOL).
  4. YOUR PROXY SERVER gets the response from ANOTHER PROXY SERVER and returns it to the CLIENT.

So, I've written a simple proxy server using Flask and Requests.

from flask import Flask, Response, stream_with_context
import random
import requests

app = Flask(__name__)

@app.route('/p/<path:url>')
def proxy(url):
    """ Request to this like /p/www.google.com
    """
    r = get_response(url)

    # Stream the upstream body back to the client in chunks.
    return Response(stream_with_context(r.iter_content(chunk_size=8192)),
                    content_type=r.headers.get('content-type'))

def get_proxy():
    # This is your "Proxy Pool"
    proxies = [
        'http://proxy-server-1.com',
        'http://proxy-server-2.com',
        'http://proxy-server-3.com',
    ]

    return random.choice(proxies)

def get_response(target_url):
    # target_url has no scheme (e.g. www.google.com); each upstream proxy is
    # assumed to expose the same /p/<url> endpoint and fetch the target itself.
    proxy = get_proxy()
    url = "{}/p/{}".format(proxy, target_url)
    # Above line will generate like http://proxy-server-1.com/p/www.google.com

    return requests.get(url, stream=True)

if __name__ == '__main__':
    app.run()

You can start from here to improve your proxy server.

A typical proxy pool or proxy manager checks the availability, speed, and other stats of its proxies and selects the best one for each request. This example handles only simple GET requests; you can add handling for request args, methods, and protocols.
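
As a rough illustration of the availability and speed tracking mentioned above, here is a minimal sketch of a pool that remembers per-proxy latency and temporarily benches proxies that keep failing. The class name `ProxyPool`, its methods, and the `max_failures` / `cooldown` parameters are all made up for this example; they are not part of Flask, Requests, or any proxy library.

```python
import random
import time


class ProxyPool:
    """Hypothetical pool: tracks latency and failures for each proxy."""

    def __init__(self, proxies, max_failures=3, cooldown=60.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self._stats = {
            p: {"failures": 0, "latency": None, "retry_at": 0.0}
            for p in proxies
        }

    def choose(self, now=None):
        """Return an available proxy, preferring the lowest known latency."""
        now = time.monotonic() if now is None else now
        available = [p for p, s in self._stats.items() if s["retry_at"] <= now]
        if not available:
            raise RuntimeError("no proxy available")
        # Proxies that were never measured are tried first, in random order.
        unmeasured = [p for p in available if self._stats[p]["latency"] is None]
        if unmeasured:
            return random.choice(unmeasured)
        return min(available, key=lambda p: self._stats[p]["latency"])

    def report_success(self, proxy, latency):
        """Record a successful request and smooth the latency estimate."""
        s = self._stats[proxy]
        s["failures"] = 0
        if s["latency"] is None:
            s["latency"] = latency
        else:
            s["latency"] = 0.8 * s["latency"] + 0.2 * latency

    def report_failure(self, proxy, now=None):
        """Record a failure; bench the proxy after too many in a row."""
        now = time.monotonic() if now is None else now
        s = self._stats[proxy]
        s["failures"] += 1
        if s["failures"] >= self.max_failures:
            s["retry_at"] = now + self.cooldown
            s["failures"] = 0
```

In the Flask example, `get_proxy()` could return `pool.choose()`, and `get_response()` could time the `requests.get` call and report the outcome back to the pool. Whether lowest latency is the right selection rule depends on your proxies; round-robin or weighted random selection would fit into the same structure.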

Hope this helps!

Nusku answered 16/10, 2015 at 7:54 Comment(6)
I think this kind of implementation is not a regular proxy, because I can't use it via r = requests.get(url, headers=headers, proxies=proxies), and common web browsers (Chrome, Firefox, IE) can't use it as a proxy either.Cathryncathy
Actually, you can access this from a regular web browser, e.g. 192.168.1.66:8080/p/www.google.com. Yes, the sample above does not implement a regular proxy; it implements a simple idea: a proxy that uses a proxy.Nusku
That way only lets me view the result, not use it as a proxy! I described the flow in the question; I know the flow, but I don't know how to build it. The requests approach you mention works over HTTP and does not transport data transparently. A proxy should be transparent, e.g. it shouldn't care whether the incoming request is an HTTP POST or an HTTP GET. These days, all the Python proxy servers I see use SOCKS, but none of them run SOCKS over another proxy, and that is the hardest part.Cathryncathy
@mromo how many concurrent connections is this solution expected to handle?Rentier
An old question has awakened! @ffeast I don't know, but plain Flask and Requests can handle quite a lot of connections under uWSGI, so if you're interested, you should test it in your environment.Nusku
@Cathryncathy Using a proxy over SOCKS is not that hard: just read data from the client and send it to another proxy server. And the "transparent transport" you mention is not actually transparent. If a proxy passes data on to another one, it needs to know the protocol, method, and host to build the HTTP packet. Without that information, how can it communicate between the client and the destination server?Nusku
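
For readers who want something usable through `proxies=` as asked in the question, the point about protocol, method, and host can be sketched with the standard library alone. When a client is configured with an HTTP proxy, it sends the full URL in the request line (`GET http://www.google.com/ HTTP/1.1`), so the proxy can read the scheme and host from there. The sketch below is only an illustration under several assumptions: the pool entries are placeholder addresses, `parse_target`, `PoolProxyHandler`, and `serve` are names invented here, only GET is handled, the whole body is buffered in memory, and HTTPS through this proxy would additionally need CONNECT support, which is not shown.

```python
import random
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit

# Hypothetical upstream proxies (placeholders, not real servers).
PROXY_POOL = [
    "http://proxy-server-1.com:8080",
    "http://proxy-server-2.com:8080",
]

# Headers tied to a single connection; they are not forwarded as-is.
SKIP_HEADERS = {"connection", "keep-alive", "proxy-connection", "host",
                "proxy-authorization", "transfer-encoding", "content-length"}


def parse_target(request_target):
    """Split an absolute URL from the request line into (scheme, host, port)."""
    parts = urlsplit(request_target)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("absolute URL required")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.scheme, parts.hostname, port


class PoolProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            parse_target(self.path)
        except ValueError:
            self.send_error(400, "absolute URL required")
            return
        # Pick one upstream proxy for this request.
        upstream = random.choice(PROXY_POOL)
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": upstream, "https": upstream}))
        headers = {k: v for k, v in self.headers.items()
                   if k.lower() not in SKIP_HEADERS}
        request = urllib.request.Request(self.path, headers=headers)
        try:
            response = opener.open(request, timeout=10)
            status = response.status
        except urllib.error.HTTPError as exc:
            # 4xx/5xx replies still carry headers and a body to relay.
            response, status = exc, exc.code
        except OSError:
            self.send_error(502, "upstream proxy failed")
            return
        body = response.read()
        response.close()
        # Relay status, headers, and body back to the client.
        self.send_response_only(status)
        for name, value in response.headers.items():
            if name.lower() not in SKIP_HEADERS:
                self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="0.0.0.0", port=8080):
    """Run the proxy until interrupted."""
    ThreadingHTTPServer((host, port), PoolProxyHandler).serve_forever()
```

Calling `serve()` on the VPS and pointing a client at `{'http': 'http://192.168.1.66:8080'}` should then route plain HTTP GET requests through a randomly chosen upstream. Note that `urllib` follows redirects on its own, which a strict proxy would not do; a production setup would more likely use squid or a socket-level relay.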
