Doctests fail with UnicodeDecodeError on C-extension and Python3

I am having difficulty getting my testing framework to work for a C-extension module for both Python2 and Python3. I like to run my docstrings through doctest to make sure that I am not feeding my users bad information, so I want to run doctest as part of my testing.

I don't believe that the source of my problem is the docstrings themselves, but rather how the doctest module is trying to read my extension module. If I run doctest with Python2 (on the module compiled against Python2), I get the output that I expect:

$ python -m doctest myext.so -v
...
1 items passed all tests:
98 tests in myext.so
98 tests in 1 items.
98 passed and 0 failed.
Test passed.

However, when I do the same but with Python3, I get a UnicodeDecodeError:

$ python3 -m doctest myext3.so -v
Traceback (most recent call last):
...
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/doctest.py", line 223, in _load_testfile
    return f.read(), filename
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/codecs.py", line 301, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 0: invalid continuation byte

To get some more info, I ran it through pytest with full traceback:

$ python3 -m pytest --doctest-glob "*.so" --full-trace
...
self = <encodings.utf_8.IncrementalDecoder object at 0x102ff5110>
input = b'\xcf\xfa\xed\xfe\x07\x00\x00\x01\x03\x00\x00\x00\x08\x00\x00\x00\r\x00\x00\x00\xd0\x05\x00\x00\x85\x00\x00\x00\x00\x...edString\x00_PyUnicode_FromString\x00_Py_BuildValue\x00__Py_FalseStruct\x00__Py_TrueStruct\x00dyld_stub_binder\x00\x00'
final = True

    def decode(self, input, final=False):
        # decode input (taking the buffer into account)
        data = self.buffer + input
>       (result, consumed) = self._buffer_decode(data, self.errors, final)
E       UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 0: invalid continuation byte

/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/codecs.py:301: UnicodeDecodeError    

It looks like doctest is actually reading the .so file to get the docstrings (rather than importing the module), but Python3 doesn't know how to decode the input. I can confirm this by replicating the byte string and traceback by trying to read the .so file myself:

$ python3
Python 3.3.3 (default, Dec 10 2013, 20:13:18) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> open('myext3.so').read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/codecs.py", line 301, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 0: invalid continuation byte
>>> open('myext3.so', 'rb').read()
b'\xcf\xfa\xed\xfe\x07\x00\x00\x01\x03\x00\x00\x00\x08\x00\x00\x00\r\x00\x00\x00\xd0\x05...'
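
The same failure can be reproduced without the extension module at all. The following is a minimal sketch, assuming only the first four bytes of the .so shown above; the helper name read_both_ways is hypothetical and just contrasts a text-mode read with a binary-mode read.

```python
import os
import tempfile

# First four bytes of the .so file, copied from the traceback above.
HEADER = b"\xcf\xfa\xed\xfe"


def read_both_ways(data):
    """Write data to a temporary file, then read it as text and as bytes."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
        path = f.name
    try:
        try:
            # Text mode: the bytes must be decodable as UTF-8.
            with open(path, encoding="utf-8") as f:
                text_result = f.read()
        except UnicodeDecodeError as exc:
            text_result = exc
        # Binary mode: no decoding takes place.
        with open(path, "rb") as f:
            binary_result = f.read()
    finally:
        os.remove(path)
    return text_result, binary_result


text_result, binary_result = read_both_ways(HEADER)
print(type(text_result).__name__)
print(binary_result)
```

This suggests the error comes from treating a compiled binary as a text file, not from anything in the docstrings.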

Has anyone else run into this problem before? Is there a standard (or not-so-standard) way to get doctest to execute tests on C extension modules on Python3?

Update: I should also add that I get identical results on Travis-CI (see here), so it's not specific to my local build.

Chiachiack answered 5/8, 2014 at 5:18 Comment(4)
Perhaps there's a different version of doctest you need to compile for Python 3? - Unscrew
For python3, I am using the doctest module from the std lib. Is there another version you recommend? Isn't doctest a pure python module, so does it need compilation? - Chiachiack
Sorry, I've never used doctest before, didn't realize it was a built in module. Maybe your PYTHONPATH is putting the version 2 module ahead of the version 3 one? - Unscrew
My PYTHONPATH is empty... I rely solely on the hardcoded paths in sys.path. Also, in the first traceback you can see that the doctest.py file is in the python 3.3 standard library location, so I don't think that is the problem. I appreciate your suggestions, keep them coming! - Chiachiack

I have found a workaround to this problem so I will post it, but I find it rather unsatisfying. I am still looking for more elegant/less hacky solutions to this.


There are three problems with doctest.py that need to be overcome to make this work:

1) Get doctest to consider .so files as python modules.

If you look at the doctest.py source, you will notice a block in the test runner that looks similar to this (depending on the Python version you are running):

if filename.endswith(".py"):
    # It is a module -- insert its dir into sys.path and try to
    # import it. If it is part of a package, that possibly
    # won't work because of package imports.
    dirname, filename = os.path.split(filename)
    sys.path.insert(0, dirname)
    m = __import__(filename[:-3])
    del sys.path[0]
    failures, _ = testmod(m)
else:
    failures, _ = testfile(filename, module_relative=False)

Here doctest.py checks for the ".py" extension. If it is present, the file is imported as a Python module; otherwise the file is read as text (as a README.rst would be). We need doctest.py to treat a file with the ".so" extension as a Python module too. To do this, add a check for the ".so" extension by changing the if statement to read

if filename.endswith(".py") or filename.endswith(".so"):
    ...
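
As a variation on the hard-coded ".so" check, the interpreter can report which extension-module suffixes it recognises. This is only a sketch, assuming Python 3.3 or later (where importlib.machinery.EXTENSION_SUFFIXES exists); the helper name looks_like_module is hypothetical and is not part of doctest.

```python
from importlib.machinery import EXTENSION_SUFFIXES


def looks_like_module(filename):
    """Return True for .py files and for compiled extension modules."""
    # str.endswith accepts a tuple of candidate suffixes.
    return filename.endswith((".py",) + tuple(EXTENSION_SUFFIXES))


print(looks_like_module("README.rst"))
print(looks_like_module("myext" + EXTENSION_SUFFIXES[-1]))
```

Asking the interpreter for its suffixes would also cover platforms where extension modules do not end in ".so".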

2) Get doctest to identify the functions in the C-extension module

doctest.py uses inspect.isfunction to decide which objects are functions when it recursively searches a module object for docstrings. The problem is that this function only identifies functions written in Python, not in C (Python reports C-extension functions as builtins). So, to identify our functions when recursing through the module, we need to use inspect.isbuiltin instead.

To rectify this, we need to locate the DocTestFinder._find method in doctest.py and change how it looks for functions. I converted

# Recurse to functions & classes.
if ((inspect.isfunction(val) or inspect.isclass(val)) and
    self._from_module(module, val)):
    self._find(tests, val, valname, module, source_lines,
               globs, seen)

to

# Recurse to functions & classes.
if ((inspect.isbuiltin(val) or inspect.isclass(val)) and
    self._from_module(module, val)):
    self._find(tests, val, valname, module, source_lines,
               globs, seen)
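
The difference between the two predicates can be checked directly. In this small sketch the builtin len stands in for a C-extension function (an assumption made for illustration, since both are implemented in C), and py_func is a hypothetical pure-Python function.

```python
import inspect


def py_func():
    """A function written in Python."""


for name, obj in [("py_func", py_func), ("len", len)]:
    print(name,
          inspect.isfunction(obj),
          inspect.isbuiltin(obj),
          inspect.isroutine(obj))
# py_func True False True
# len False True True
```

Note that a doctest.py patched to use only inspect.isbuiltin would presumably stop recursing into pure-Python functions. inspect.isroutine accepts both kinds, so it may be a safer substitute if the patched copy is also used on ordinary modules.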

3) Properly remove the version tag on the .so file (Python3 only).

On Python3, C-extensions can be tagged with a version identifier (e.g. "myext.cpython-3mu.so"; see PEP 3149). We need to strip this tag when doing the initial import in the doctest.py test runner.

To do this, I converted the line

m = __import__(filename[:-3])

to

from sysconfig import get_config_var
m = __import__(filename[:-3] if filename.endswith(".py") else filename.replace(get_config_var("EXT_SUFFIX"), ""))

This is only needed for Python3.
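
A slightly more defensive way to derive the module name is to strip whichever known suffix the file actually ends with, instead of calling str.replace. This is a sketch under the assumption of Python 3.3 or later; module_name_from_filename is a hypothetical helper, not something doctest provides.

```python
import os
from importlib.machinery import EXTENSION_SUFFIXES


def module_name_from_filename(filename):
    """Strip the directory and the .py or extension-module suffix."""
    base = os.path.basename(filename)
    # Try the longest suffix first so a tagged suffix wins over a bare one.
    suffixes = sorted([".py"] + list(EXTENSION_SUFFIXES), key=len, reverse=True)
    for suffix in suffixes:
        if base.endswith(suffix):
            return base[:-len(suffix)]
    raise ValueError("not a recognised module file: %r" % filename)


print(module_name_from_filename("pkg/foo.py"))
print(module_name_from_filename("myext" + EXTENSION_SUFFIXES[0]))
```

Slicing off the end of the name avoids accidentally removing a matching substring from the middle of the filename, and it handles both tagged and untagged extension files.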


After making these modifications, doctest works as expected on both Python2 and Python3. Since making these changes by hand is rather annoying, I have written a patch_doctest.py script that does this automatically and puts the patched doctest.py in your current directory. You can get this file here if you want to use it. You can then run the tests on the extension modules like this:

$ python2 patch_doctest.py
$ python2 -m doctest myext2.so
$ rm doctest.py
$ python3 patch_doctest.py
$ python3 -m doctest myext3.so

As evidence that this works, here are the new Travis-CI results.
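
For anyone who would rather not patch doctest.py, one possible alternative is to import the module normally and feed its docstrings to doctest by hand. The following is only a rough sketch: run_docstring_examples is a hypothetical helper, it looks only at the module docstring and top-level functions and classes (it does not recurse into methods or filter out imported names), and a stand-in module built with types.ModuleType is used here in place of a real C extension.

```python
import doctest
import inspect
import types


def run_docstring_examples(module, verbose=False):
    """Run doctest examples from a module and its top-level routines/classes."""
    parser = doctest.DocTestParser()
    runner = doctest.DocTestRunner(verbose=verbose)
    # Module docstring first, then every top-level routine or class.
    candidates = [(module.__name__, module)]
    candidates += [
        (module.__name__ + "." + name, obj)
        for name, obj in sorted(vars(module).items())
        # isroutine is True for Python functions and for C builtins.
        if inspect.isroutine(obj) or inspect.isclass(obj)
    ]
    for name, obj in candidates:
        doc = getattr(obj, "__doc__", None)
        if not isinstance(doc, str):
            continue
        # Each docstring runs against a fresh copy of the module namespace.
        globs = dict(vars(module))
        test = parser.get_doctest(doc, globs, name, None, 0)
        if test.examples:
            runner.run(test)
    return runner.summarize(verbose=verbose)


# Stand-in for an imported extension module (assumption for illustration).
fake = types.ModuleType("fake")


def add(a, b):
    """
    >>> add(1, 2)
    3
    """
    return a + b


fake.add = add
result = run_docstring_examples(fake)
print(result.failed, result.attempted)
```

With a real extension, the stand-in would be replaced by the imported module (for example the result of importing myext3), which sidesteps doctest reading the .so file as text.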

Chiachiack answered 6/8, 2014 at 3:29 Comment(1)
Considering that it doesn't take much to get this to work, I wonder if this is an enhancement the Python devs might be interested in... - Chiachiack
