Source interface with Python and urllib2

How do I set the source IP/interface with Python and urllib2?

Chimney answered 19/7, 2009 at 17:5 Comment(1)
It's reasonable to use the "requests" library or pycurl instead. You keep stumbling over urllib2's bad design if you use it for non-trivial tasks. - Sidesaddle
48

Unfortunately, the stack of standard library modules in use (urllib2, httplib, socket) is badly designed for this purpose. At the key point in the operation, HTTPConnection.connect (in httplib) delegates to socket.create_connection, which gives you no "hook" whatsoever between the creation of the socket instance sock and the sock.connect call. That gap is exactly where you would need to insert the sock.bind call that sets the source IP. (I'm evangelizing widely for NOT designing abstractions in such an airtight, excessively encapsulated way, and I'll be speaking about that at OSCON this Thursday under the title "Zen and the Art of Abstraction Maintenance", but here your problem is how to deal with a stack of abstractions that WERE designed this way, sigh.)

When you face such problems you have only two not-so-good solutions: either copy, paste and edit the misdesigned code so you can add the "hook" that the original designer didn't cater for, or "monkey-patch" that code. Neither is GOOD, but both can work, so let's at least be thankful that an open-source, dynamic language gives us such options. In this case I'd go for monkey-patching (which is bad, but copy-and-paste coding is even worse), with a code fragment such as:

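import socket

sourceIP = '192.168.1.10'  # placeholder: a local address you want to send from

true_socket = socket.socket  # keep a reference to the real constructor

def bound_socket(*a, **k):
    sock = true_socket(*a, **k)
    sock.bind((sourceIP, 0))  # port 0 lets the OS pick the local port
    return sock

socket.socket = bound_socket  # sockets created from now on are bound first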

Depending on your exact needs (do you need all sockets to be bound to the same source IP, or...?) you could simply run this once before using urllib2 normally. Alternatively, and with more work, you could run it on demand just for those outgoing sockets you DO need to bind in a certain way, restoring socket.socket = true_socket each time to get out of the way of sockets yet to be created. The second alternative adds its own complications to orchestrate properly, so I'm waiting for you to clarify whether you do need such complications before explaining them all.
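One way the second alternative could be orchestrated is a context manager that installs the patch and always restores the original constructor. This is only a sketch, not something the answer itself provides: the name bound_source is hypothetical, and it assumes every socket created inside the block should be bound to the same IPv4 source address.

```python
import contextlib
import socket


@contextlib.contextmanager
def bound_source(source_ip):
    """Temporarily bind every newly created socket to source_ip.

    Hypothetical helper: the patch is process-wide while the block is
    active, so it is not safe if other threads create sockets meanwhile.
    """
    true_socket = socket.socket  # the real constructor, restored on exit

    def bound_socket(*a, **k):
        sock = true_socket(*a, **k)
        sock.bind((source_ip, 0))  # port 0 lets the OS pick the local port
        return sock

    socket.socket = bound_socket
    try:
        yield
    finally:
        # Restore even if the body raised, so later sockets are unaffected.
        socket.socket = true_socket
```

With this in place, the call that opens the connection (for example urllib2.urlopen) would go inside a "with bound_source(sourceIP):" block, and sockets created after the block behave normally again.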

AKX's good answer is a variant on the "copy / paste / edit" alternative so I don't need to expand much on that -- note however that it doesn't exactly reproduce socket.create_connection in its connect method, see the source here (at the very end of the page) and decide what other functionality of the create_connection function you may want to embody in your copied/pasted/edited version if you decide to go that route.

Collocate answered 19/7, 2009 at 17:43 Comment(4)
Not only a complete answer but, maybe, the first example of good use of monkey patching I've ever seen. - Ardolino
Tx @Roberto -- it's more of a "least of evils", but, yes, when faced with an abstraction that's sealed against your needs (missing the required hooks/"leaks"), monkeypatching may be "good" (in the same sense that pulling a tooth that's sick beyond repair is "good" ;-). - Collocate
@Alex Martelli: Thanks. The monkey patch worked just fine! :) - Chimney
@jonasl, yay! Always happy to help. - Collocate
24

This seems to work.

import urllib2, httplib, socket

class BindableHTTPConnection(httplib.HTTPConnection):
    def connect(self):
        """Connect to the host and port specified in __init__."""
        self.sock = socket.socket()
        # Bind before connecting; port 0 lets the OS pick the local port.
        self.sock.bind((self.source_ip, 0))
        # Apply any explicit timeout (int or float), as create_connection does.
        if self.timeout is not socket._GLOBAL_DEFAULT_TIMEOUT:
            self.sock.settimeout(self.timeout)
        self.sock.connect((self.host, self.port))

def BindableHTTPConnectionFactory(source_ip):
    # Returns a callable that urllib2 can use in place of the connection class.
    def _get(host, port=None, strict=None,
             timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
        bhc = BindableHTTPConnection(host, port=port, strict=strict, timeout=timeout)
        bhc.source_ip = source_ip
        return bhc
    return _get

class BindableHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        return self.do_open(BindableHTTPConnectionFactory('127.0.0.1'), req)

opener = urllib2.build_opener(BindableHTTPHandler)
opener.open("http://google.com/").read() # Will fail, 127.0.0.1 can't reach google.com.

You'll need to figure out some way to parameterize "127.0.0.1" there, though.

Scalawag answered 19/7, 2009 at 17:36 Comment(4)
@DaveRawks: On which system did this work for you? I'm unable to bind to a network interface on Windows 7. - Fairish
Code works for me on Linux and OS X. I've never written socket code on Windows, but I suspect that Windows' lack of raw sockets in user space could cause problems. - Nora
I got <urlopen error [Errno 99] Cannot assign requested address> using your code. - Fucoid
This code works for me, but I am still confused about how it works. Can you give a brief explanation of this code? - Mascara
12

Here's a further refinement that makes use of HTTPConnection's source_address argument (introduced in Python 2.7):

import functools
import httplib
import urllib2

class BoundHTTPHandler(urllib2.HTTPHandler):

    def __init__(self, source_address=None, debuglevel=0):
        urllib2.HTTPHandler.__init__(self, debuglevel)
        self.http_class = functools.partial(httplib.HTTPConnection,
                source_address=source_address)

    def http_open(self, req):
        return self.do_open(self.http_class, req)

This gives us a custom urllib2.HTTPHandler implementation that is source_address aware. We can add it to a new urllib2.OpenerDirector and install it as the default opener (for future urlopen() calls) with the following code:

handler = BoundHTTPHandler(source_address=("192.168.1.10", 0))
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
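
For readers on Python 3, where urllib2 and httplib were renamed urllib.request and http.client, the same handler might look like the sketch below. It is an adaptation rather than part of the original answer, and the address passed to it must be one that is actually configured on a local interface.

```python
import functools
import http.client
import urllib.request


class BoundHTTPHandler(urllib.request.HTTPHandler):
    """HTTP handler whose connections bind to source_address first."""

    def __init__(self, source_address=None, debuglevel=0):
        super().__init__(debuglevel)
        # do_open calls http_class(host, timeout=...), so a partial that
        # pre-fills source_address can stand in for the connection class.
        self.http_class = functools.partial(
            http.client.HTTPConnection, source_address=source_address)

    def http_open(self, req):
        return self.do_open(self.http_class, req)
```

It is used the same way as above: pass an instance to urllib.request.build_opener and, if it should become the default, to urllib.request.install_opener. Only plain http:// URLs are affected; https:// requests would need a matching HTTPSHandler subclass.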
Judson answered 3/2, 2013 at 3:16 Comment(1)
+1. I think this should be the recommended method with Python 2.7's updated httplib module. By the way, the IPv4 address string should be put in quotes. I tried editing your post, but such a small change couldn't pass Stack Overflow's triviality filter. - Cacophonous
2

I thought I'd follow up with a slightly better version of the monkey patch. If you need to be able to set different port options on some of the sockets or are using something like SSL that subclasses socket, the following code works a bit better.

import socket
from socket import socket as s

_ip_address = None

def bind_outgoing_sockets_to_ip(ip_address):
    """Bind all outgoing Python sockets to the given IP address."""
    global _ip_address
    _ip_address = ip_address

class bound_socket(s):
    def connect(self, *args, **kwargs):
        # Only IPv4 sockets that are still unbound get the source address.
        if self.family == socket.AF_INET:
            if self.getsockname()[0] == "0.0.0.0" and _ip_address:
                self.bind((_ip_address, 0))
        return s.connect(self, *args, **kwargs)

socket.socket = bound_socket

Binding only at connect time matters if the same process also runs something like a web server that needs to bind to a different IP address: listening sockets never call connect, so they are left alone.

Parette answered 23/7, 2010 at 13:54 Comment(0)
2

Reasoning that I should monkey-patch at the highest level available, here's an alternative to Alex's answer which patches httplib instead of socket, taking advantage of httplib.HTTPSConnection.__init__()'s source_address keyword argument (which is not exposed by urllib2, AFAICT). Tested and working on Python 2.7.2.

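import httplib

SOURCE_IP = '192.168.1.10'  # placeholder: a local address you want to send from

HTTPSConnection_real = httplib.HTTPSConnection

class HTTPSConnection_monkey(HTTPSConnection_real):
    def __init__(self, *a, **kw):
        # Respect an explicit source_address if the caller already passed one.
        kw.setdefault('source_address', (SOURCE_IP, 0))
        HTTPSConnection_real.__init__(self, *a, **kw)

httplib.HTTPSConnection = HTTPSConnection_monkey

Aragonite answered 2/4, 2012 at 17:8 Comment(0)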
1

As of Python 2.7, httplib.HTTPConnection accepts a source_address argument: an (IP, port) pair that the connection binds to before connecting.

See: http://docs.python.org/2/library/httplib.html#httplib.HTTPConnection
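
Used directly, without urllib2, that argument might be applied as in the sketch below. The helper name fetch_from and its defaults are hypothetical; the import fallback only exists so the same code runs on Python 2.7 (httplib) and Python 3 (http.client).

```python
try:
    import http.client as httplib  # Python 3 name for the module
except ImportError:
    import httplib  # Python 2.7


def fetch_from(source_ip, host, path="/", port=80, timeout=10):
    """GET path from host, sending from source_ip (hypothetical helper)."""
    # Port 0 in source_address lets the OS pick the local port.
    conn = httplib.HTTPConnection(host, port, timeout=timeout,
                                  source_address=(source_ip, 0))
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

If source_ip is not configured on any local interface, the bind fails and the call raises a socket error, which matches the "Cannot assign requested address" message reported in the comments above.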

Novation answered 18/6, 2013 at 13:35 Comment(0)
