Doctests fail with UnicodeDecodeError on C-extension and Python3

$ python3 -m doctest myext3.so -v
Traceback (most recent call last):
  ...
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 223, in _load_testfile
    return f.read(), filename
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/codecs.py", line 301, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 0: invalid continuation byte

$ python3 -m pytest --doctest-glob "*.so" --full-trace
...
self = <encodings.utf_8.IncrementalDecoder object at 0x102ff5110>
input = b'\xcf\xfa\xed\xfe\x07\x00\x00\x01\x03\x00\x00\x00\x08\x00\x00\x00\r\x00\x00\x00\xd0\x05\x00\x00\x85\x00\x00\x00\x00\x...edString\x00_PyUnicode_FromString\x00_Py_BuildValue\x00__Py_FalseStruct\x00__Py_TrueStruct\x00dyld_stub_binder\x00\x00'
final = True

    def decode(self, input, final=False):
        # decode input (taking the buffer into account)
        data = self.buffer + input
>       (result, consumed) = self._buffer_decode(data, self.errors, final)
E       UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 0: invalid continuation byte

/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/codecs.py:301: UnicodeDecodeError

$ python3
Python 3.3.3 (default, Dec 10 2013, 20:13:18)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> open('myext3.so').read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/codecs.py", line 301, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 0: invalid continuation byte
>>> open('myext3.so', 'rb').read()
b'\xcf\xfa\xed\xfe\x07\x00\x00\x01\x03\x00\x00\x00\x08\x00\x00\x00\r\x00\x00\x00\xd0\x05...'

I have found a workaround to this problem so I will post it, but I find it rather unsatisfying. I am still looking for more elegant/less hacky solutions to this.

There are three problems with doctest.py that need to be overcome to make this work:

1) Get doctest to consider .so files as python modules.

If you look at the doctest.py source, you will find a block in the test runner that looks similar to this (the details depend on the Python version you are running):

if filename.endswith(".py"):
    # It is a module -- insert its dir into sys.path and try to
    # import it. If it is part of a package, that possibly
    # won't work because of package imports.
    dirname, filename = os.path.split(filename)
    sys.path.insert(0, dirname)
    m = __import__(filename[:-3])
    del sys.path[0]
    failures, _ = testmod(m)
else:
    failures, _ = testfile(filename, module_relative=False)

Here doctest.py checks for the ".py" extension. If it matches, the file is imported as a Python module; otherwise the file is read as text (as a README.rst would be), which is what raises the UnicodeDecodeError on a binary .so file. We need doctest.py to treat a file with the ".so" extension as a Python module too. To do this, add a check for the ".so" extension by changing the condition of this if block to read

if filename.endswith(".py") or filename.endswith(".so"):
    ...
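
The extension check can also be written with a single call, since str.endswith accepts a tuple of suffixes. The sketch below is only an illustration; the names MODULE_SUFFIXES and should_import_as_module are made up for this example and do not exist in doctest.py.

```python
# Hypothetical helper, not part of doctest.py.
# Assumption: only the two suffixes discussed here matter.
MODULE_SUFFIXES = (".py", ".so")


def should_import_as_module(filename):
    """Return True when the file should be imported instead of read as text."""
    # str.endswith accepts a tuple, so one call covers both suffixes.
    return filename.endswith(MODULE_SUFFIXES)
```

With this helper the condition in the runner would become `if should_import_as_module(filename):`.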

2) Get doctest to identify the functions in the C-extension module.

doctest.py uses inspect.isfunction to decide which objects are functions when it recursively searches a module object for docstrings. The problem is that inspect.isfunction only recognizes functions written in Python, not functions written in C (Python reports C-extension functions as builtins). So, to pick up our functions when recursing through the module, we need to use inspect.isbuiltin instead.

To rectify this, we need to locate the DocTestFinder._find method in doctest.py and change how it looks for functions. I converted

# Recurse to functions & classes.
if ((inspect.isfunction(val) or inspect.isclass(val)) and
    self._from_module(module, val)):
    self._find(tests, val, valname, module, source_lines,
               globs, seen)

to

# Recurse to functions & classes.
if ((inspect.isbuiltin(val) or inspect.isclass(val)) and
    self._from_module(module, val)):
    self._find(tests, val, valname, module, source_lines,
               globs, seen)
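
To see why the original check misses C functions, the small sketch below compares the inspect predicates on a Python function and on len, which is implemented in C and therefore behaves like a function exported by a C extension. The helper name classify is made up for this illustration. Note that inspect.isroutine is true for both kinds, so it might be a gentler substitute than swapping one predicate for the other; that is a suggestion, not what the patch above does.

```python
import inspect


def py_func():
    """A function written in Python."""


def classify(obj):
    """Return (isfunction, isbuiltin, isroutine) for obj."""
    return (
        inspect.isfunction(obj),
        inspect.isbuiltin(obj),
        inspect.isroutine(obj),
    )


# len is implemented in C, so it stands in for a C-extension function here.
print(classify(py_func))
print(classify(len))
```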

3) Properly remove the version tag on the .so file (Python3 only).

On Python3, C-extensions can be tagged with a version identifier (e.g. "myext.cpython-3mu.so"; see PEP 3149). We need to strip this tag when doing the initial import in the doctest.py test runner.

To do this, I converted the line

m = __import__(filename[:-3])

to

from sysconfig import get_config_var
m = __import__(filename[:-3] if filename.endswith(".py") else filename.replace(get_config_var("EXT_SUFFIX"), ""))

This is only needed for Python3.
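
As a possible variation, the module name can be derived by stripping a known suffix from the end of the file name rather than using str.replace. The sketch below is an assumption-laden illustration: module_name_from_path is a made-up name, and the fallback to a plain ".so" suffix is my addition so that untagged files such as myext3.so are handled as well.

```python
import os
import sysconfig


def module_name_from_path(filename):
    """Strip the directory and the extension-module or .py suffix."""
    base = os.path.basename(filename)
    # EXT_SUFFIX is the tagged suffix described in PEP 3149; it may be
    # missing (None) on interpreters that do not define it.
    ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")
    # Try the tagged suffix first, then the plain ones.
    for suffix in (ext_suffix, ".so", ".py"):
        if suffix and base.endswith(suffix):
            return base[: -len(suffix)]
    return base
```

The runner line would then read `m = __import__(module_name_from_path(filename))`.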

After making these modifications, I can get doctest to work as expected on both Python2 and Python3. Since applying them by hand is tedious, I have made a patch_doctest.py script that applies them automatically and puts the patched doctest.py in your current directory. You can get this file here if you want to use it. You can then run the tests on the extension modules like this:

$ python2 patch_doctest.py
$ python2 -m doctest myext2.so
$ rm doctest.py
$ python3 patch_doctest.py
$ python3 -m doctest myext3.so

As evidence that this works, here are the new Travis-CI results.
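
For anyone who would rather not patch doctest.py at all, one possible alternative is to import the extension yourself and feed the docstrings of its routines to doctest's public parser and runner classes. This is only a minimal sketch, not the approach used above: run_routine_doctests is a made-up name, and it only looks at module-level functions, not at classes or their methods.

```python
import doctest
import inspect


def run_routine_doctests(module):
    """Run doctests found in the docstrings of a module's top-level routines."""
    parser = doctest.DocTestParser()
    runner = doctest.DocTestRunner(verbose=False)
    for name, obj in sorted(vars(module).items()):
        # isroutine is true for Python functions and for C (builtin) functions.
        if inspect.isroutine(obj) and obj.__doc__:
            # Each test gets its own copy of the module namespace as globals.
            test = parser.get_doctest(
                obj.__doc__, dict(vars(module)), name, module.__name__, 0
            )
            if test.examples:
                runner.run(test)
    # Returns a result with .failed and .attempted counts.
    return runner.summarize(verbose=False)
```

Usage would be along the lines of `import myext3` followed by `run_routine_doctests(myext3)`, run from the directory that contains the compiled extension.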
