How do I unit test a module that relies on urllib2?

Asked 16/2, 2010 at 22:2 Answered 31/12, 2016 at 15:33

Solved python unit-testing urllib2 urllib

I've got a piece of code that I can't figure out how to unit test! The module pulls content from external XML feeds (twitter, flickr, youtube, etc.) with urllib2. Here's some pseudo-code for it:

params = (url, urlencode(data),) if data else (url,)
req = Request(*params)
response = urlopen(req)
#check headers, content-length, etc...
#parse the response XML with lxml...

My first thought was to pickle the response and load it for testing, but apparently urllib's response object is unserializable (it raises an exception).

Just saving the XML from the response body isn't ideal, because my code uses the header information too. It's designed to act on a response object.

And of course, relying on an external source for data in a unit test is a horrible idea.

So how do I write a unit test for this?

Glaive answered 16/2, 2010 at 22:2 Comment(1)

Related: #295938 – Sharasharai 28/3, 2011 at 4:54

urllib2 has a functions called build_opener() and install_opener() which you should use to mock the behaviour of urlopen()

import urllib2
from StringIO import StringIO

def mock_response(req):
    if req.get_full_url() == "http://example.com":
        resp = urllib2.addinfourl(StringIO("mock file"), "mock message", req.get_full_url())
        resp.code = 200
        resp.msg = "OK"
        return resp

class MyHTTPHandler(urllib2.HTTPHandler):
    def http_open(self, req):
        print "mock opener"
        return mock_response(req)

my_opener = urllib2.build_opener(MyHTTPHandler)
urllib2.install_opener(my_opener)

response=urllib2.urlopen("http://example.com")
print response.read()
print response.code
print response.msg

Gauthier answered 16/2, 2010 at 22:30 Comment(3)

This is pretty cool, I didn't actually know urllib2 let you install alternate openers. The only issue I see is that it means you've changed global shared state, this means any subsequent calls to urllib2.urlopen will use your handler unless you re-register the old one (which is fine if you're running individual tests, but can cause issues in a test suite when various tests can effect the results of subsequent tests.) – Blaise 16/2, 2010 at 22:56

@Crast, the idea is to change the global behaviour because the call to urlopen might be deep down in the module somewhere. It would be simple enough to pass requests you are not interested in on to the HTTPHandler. In most cases I would be using the same mock opener for an entire test suite and then reinstall the original. – Gauthier 16/2, 2010 at 23:12

@Crast, there is a way to uninstall a previously installed opener. It's not documented, though. Before you install a mock opener, you want to save the old one with oldOpener = urllib2._opener. Then, in your unit test teardown() you install it back with urllib2.install_opener(oldOpener). – Medina 1/12, 2014 at 19:52

It would be best if you could write a mock urlopen (and possibly Request) which provides the minimum required interface to behave like urllib2's version. You'd then need to have your function/method which uses it able to accept this mock urlopen somehow, and use urllib2.urlopen otherwise.

This is a fair amount of work, but worthwhile. Remember that python is very friendly to ducktyping, so you just need to provide some semblance of the response object's properties to mock it.

For example:

class MockResponse(object):
    def __init__(self, resp_data, code=200, msg='OK'):
        self.resp_data = resp_data
        self.code = code
        self.msg = msg
        self.headers = {'content-type': 'text/xml; charset=utf-8'}

    def read(self):
        return self.resp_data

    def getcode(self):
        return self.code

    # Define other members and properties you want

def mock_urlopen(request):
    return MockResponse(r'<xml document>')

Granted, some of these are difficult to mock, because for example I believe the normal "headers" is an HTTPMessage which implements fun stuff like case-insensitive header names. But, you might be able to simply construct an HTTPMessage with your response data.

Blaise answered 16/2, 2010 at 22:8 Comment(2)

I really appreciate your code, but the test double class for communicating with the server that Randolpho suggested is gonna be a better fit for this case. You've been a big help though! Thanks! – Glaive 17/2, 2010 at 0:49

No offense taken, I agree it's a better solution too :} See my comment on Randolpho 's post. – Blaise 17/2, 2010 at 2:31

Build a separate class or module responsible for communicating with your external feeds.

Make this class able to be a test double. You're using python, so you're pretty golden there; if you were using C#, I'd suggest either in interface or virtual methods.

In your unit test, insert a test double of the external feed class. Test that your code uses the class correctly, assuming that the class does the work of communicating with your external resources correctly. Have your test double return fake data rather than live data; test various combinations of the data and of course the possible exceptions urllib2 could throw.

Aand... that's it.

You can't effectively automate unit tests that rely on external sources, so you're best off not doing it. Run an occasional integration test on your communication module, but don't include those tests as part of your automated tests.

Edit:

Just a note on the difference between my answer and @Crast's answer. Both are essentially correct, but they involve different approaches. In Crast's approach, you use a test double on the library itself. In my approach, you abstract the use of the library away into a separate module and test double that module.

Which approach you use is entirely subjective; there's no "correct" answer there. I prefer my approach because it allows me to build more modular, flexible code, something I value. But it comes at a cost in terms of additional code to write, something that may not be valued in many agile situations.

Mclain answered 16/2, 2010 at 22:9 Comment(2)

Re. Your edit: For what it's worth, I'd actually go with your approach too (some sort of url-getter class) if I were writing the code from scratch. I prefer writing minimal interfaces with only a subset of guaranteed properties on the return, so they are easier to make test doubles of. Also, it makes dependency injection more explicit. – Blaise 16/2, 2010 at 22:20

Would like to vote this up a few more times. This approach will make mocking for testing tons easier than mocking something in the standard library! – Sharasharai 28/3, 2011 at 4:14

You can use pymox to mock the behavior of anything and everything in the urllib2 (or any other) package. It's 2010, you shouldn't be writing your own mock classes.

Ethylethylate answered 16/2, 2010 at 22:38 Comment(2)

+1 for the "It's 2010...", though pymox looks interesting too. – Glaive 16/2, 2010 at 22:39

In 2017 there is unittest.mock, which is included in the Python3 distribution. – Postobit 12/5, 2017 at 19:22

I think the easiest thing to do is to actually create a simple web server in your unit test. When you start the test, create a new thread that listens on some arbitrary port and when a client connects just returns a known set of headers and XML, then terminates.

I can elaborate if you need more info.

Here's some code:

import threading, SocketServer, time

# a request handler
class SimpleRequestHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        data = self.request.recv(102400) # token receive
        senddata = file(self.server.datafile).read() # read data from unit test file
        self.request.send(senddata)
        time.sleep(0.1) # make sure it finishes receiving request before closing
        self.request.close()

def serve_data(datafile):
    server = SocketServer.TCPServer(('127.0.0.1', 12345), SimpleRequestHandler)
    server.datafile = datafile
    http_server_thread = threading.Thread(target=server.handle_request())

To run your unit test, call serve_data() then call your code that requests a URL that looks like http://localhost:12345/anythingyouwant.

Drews answered 16/2, 2010 at 22:24 Comment(3)

I considered that but it seemed... unpleasant at best. If you think you could provide code simpler than the suggestions above to use a test-double for the communication class, please do. I'd love to see a quick and dirty server like that just for my own education. – Glaive 16/2, 2010 at 22:29

It wouldn't be a unit test if it relied on a website. Go with a mock object which is easy in Python. – Vaasta 16/2, 2010 at 22:38

OK, I've added some simple code. I doubt you could do anything much simpler than this. – Drews 17/2, 2010 at 1:6

Why not just mock a website that returns the response you expect? then start the server in a thread in setup and kill it in the teardown. I ended up doing this for testing code that would send email by mocking an smtp server and it works great. Surely something more trivial could be done for http...

from smtpd import SMTPServer
from time import sleep
import asyncore
SMTP_PORT = 6544

class MockSMTPServer(SMTPServer):
    def __init__(self, localaddr, remoteaddr, cb = None):
        self.cb = cb
        SMTPServer.__init__(self, localaddr, remoteaddr)

    def process_message(self, peer, mailfrom, rcpttos, data):
        print (peer, mailfrom, rcpttos, data)
        if self.cb:
            self.cb(peer, mailfrom, rcpttos, data)
        self.close()

def start_smtp(cb, port=SMTP_PORT):

    def smtp_thread():
        _smtp = MockSMTPServer(("127.0.0.1", port), (None, 0), cb)
        asyncore.loop()
        return Thread(None, smtp_thread)


def test_stuff():
        #.......snip noise
        email_result = None

        def email_back(*args):
            email_result = args

        t = start_smtp(email_back)
        t.start()
        sleep(1)

        res.form["email"]= self.admin_email
        res = res.form.submit()
        assert res.status_int == 302,"should've redirected"


        sleep(1)
        assert email_result is not None, "didn't get an email"

Ehlke answered 16/2, 2010 at 22:33 Comment(0)

Trying to improve a bit on @john-la-rooy answer, I've made a small class allowing simple mocking for unit tests

Should work with python 2 and 3

try:
    import urllib.request as urllib
except ImportError:
    import urllib2 as urllib

from io import BytesIO


class MockHTTPHandler(urllib.HTTPHandler):

    def mock_response(self, req):
        url = req.get_full_url()

        print("incomming request:", url)

        if url.endswith('.json'):
            resdata = b'[{"hello": "world"}]'
            headers = {'Content-Type': 'application/json'}

            resp = urllib.addinfourl(BytesIO(resdata), header, url, 200)
            resp.msg = "OK"

            return resp
        raise RuntimeError('Unhandled URL', url)
    http_open = mock_response


    @classmethod
    def install(cls):
        previous = urllib._opener
        urllib.install_opener(urllib.build_opener(cls))
        return previous

    @classmethod
    def remove(cls, previous=None):
        urllib.install_opener(previous)

Used like this:

class TestOther(unittest.TestCase):

    def setUp(self):
        previous = MockHTTPHandler.install()
        self.addCleanup(MockHTTPHandler.remove, previous)

Astragal answered 31/12, 2016 at 15:33 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags