Add numpy.get_include() argument to setuptools without preinstalled numpy

I am currently developing a Python package that uses Cython and NumPy, and I want the package to be installable with the pip install command from a clean Python installation. All dependencies should be installed automatically. I am using setuptools with the following setup.py:

import setuptools

my_c_lib_ext = setuptools.Extension(
    name="my_c_lib",
    sources=["my_c_lib/some_file.pyx"]
)

setuptools.setup(
    name="my_lib",
    version="0.0.1",
    author="Me",
    author_email="[email protected]",
    description="Some python library",
    packages=["my_lib"],
    ext_modules=[my_c_lib_ext],
    setup_requires=["cython >= 0.29"],
    install_requires=["numpy >= 1.15"],
    classifiers=[
        "Programming Language :: Python :: 3",
        "Operating System :: OS Independent"
    ]
)

This has worked great so far. The pip install command downloads cython for the build and is able to build my package and install it together with numpy.

Now I want to improve the performance of my Cython code, which leads to some changes in my setup.py. I need to add include_dirs=[numpy.get_include()] to either the setuptools.Extension(...) call or the setuptools.setup(...) call, which means that I also need to import numpy. (See http://docs.cython.org/en/latest/src/tutorial/numpy.html and Make distutils look for numpy header files in the correct place for the rationale.)
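
For illustration, the change would look roughly like this:

import numpy  # fails in a clean environment: numpy is not installed yet
import setuptools

my_c_lib_ext = setuptools.Extension(
    name="my_c_lib",
    sources=["my_c_lib/some_file.pyx"],
    include_dirs=[numpy.get_include()]  # needs numpy while setup.py is being read
)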

This is bad. Now the user cannot call pip install from a clean environment, because import numpy will fail. The user needs to pip install numpy before installing my library. Even if I move "numpy >= 1.15" from install_requires to setup_requires, the installation fails, because import numpy is evaluated before those requirements are installed.

Is there a way to evaluate the include_dirs at a later point of the installation, for example after the dependencies from setup_requires or install_requires have been resolved? I would really like to have all dependencies resolved automatically, and I don't want the user to type multiple pip install commands.

The following snippet works, but it is not officially supported because it uses an undocumented (and private) method:

class NumpyExtension(setuptools.Extension):
    # setuptools calls this function after installing dependencies
    def _convert_pyx_sources_to_lang(self):
        import numpy
        self.include_dirs.append(numpy.get_include())
        super()._convert_pyx_sources_to_lang()

my_c_lib_ext = NumpyExtension(
    name="my_c_lib",
    sources=["my_c_lib/some_file.pyx"]
)

The article How to Bootstrap numpy installation in setup.py proposes using a cmdclass with a custom build_ext class. Unfortunately, this breaks the build of the Cython extension, because Cython also customizes build_ext.

Theresa asked 9/1, 2019 at 20:23. Comments:
Hustle: Possible duplicate of How to Bootstrap numpy installation in setup.py
Theresa: @Hustle This does not work, because Cython also customizes build_ext. If I use the proposed solution, the Cython build fails with "Don't know how to compile my_c_lib/some_file.pyx", which means that the customized build_ext command of Cython is no longer used.
Hustle: I see, it is not that straightforward...
Cumbrance: Look at what pybind11 does here to defer the import - I haven't tested it, but I think something similar would work here.
Theresa: The trick from pybind11 works after a small change. I had to inherit from os.PathLike instead of object, because Cython requires a str, bytes, or PathLike object, and str/bytes do not work because they are immutable. Do you want to create an answer from your comment, or should I self-answer my question?
Hustle: @Cumbrance Nice trick, but maybe a bit too clever: if, for example, build_ext logs its options before the requirements are installed, it will fail. On the other hand, Command.finalize_options is the right place to handle such things - for a proper solution one should figure out how it can work with Cython.
Letendre: With a recent enough pip, there might be solutions based on the use of the requires element of the [build-system] section of a pyproject.toml file. That file might also need to specify a "build backend", for instance build-backend = "setuptools.build_meta".

First question: when is numpy needed? It is needed during the setup (i.e. when the build_ext functionality is called) and in the installation, when the module is used. That means numpy should be in both setup_requires and install_requires.
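
Applied to the setup.py from the question, that means roughly (a sketch):

setuptools.setup(
    ...,
    setup_requires=["cython >= 0.29", "numpy >= 1.15"],  # needed to build the extension
    install_requires=["numpy >= 1.15"],                  # needed when the module is used
    ...
)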

There are the following alternatives for solving the issue for the setup:

  1. using PEP 517/518 (which is more straightforward IMO)
  2. using the setup_requires argument of setup and postponing the import of numpy until setup's requirements are satisfied (which is not the case at the start of setup.py's execution)

PEP 517/518 solution:

Put a pyproject.toml file next to setup.py, with the following content:

[build-system]
requires = ["setuptools", "wheel", "Cython>=0.29", "numpy >= 1.15"]

which defines the packages needed for building; then install using pip install . in the folder with setup.py. A disadvantage of this method is that python setup.py install no longer works, as it is pip that reads pyproject.toml. However, I would use this approach whenever possible.
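
With a recent enough pip and setuptools, the file would typically also declare the build backend (see also the comments under the question); a minimal sketch:

[build-system]
requires = ["setuptools", "wheel", "Cython>=0.29", "numpy >= 1.15"]
build-backend = "setuptools.build_meta"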


Postponing import

This approach is more complicated and somewhat hacky, but it also works without pip.

First, let's take a look at the unsuccessful attempts so far:

the pybind11-trick

@chrisb's "pybind11" trick, which can be found here: with the help of an indirection, one delays the call to import numpy until numpy is present during the setup phase, i.e.:

class get_numpy_include(object):

    def __str__(self):
        import numpy
        return numpy.get_include()
...
my_c_lib_ext = setuptools.Extension(
    ...
    include_dirs=[get_numpy_include()]
)

Clever! The problem: it doesn't work with the Cython compiler: somewhere down the line, Cython passes the get_numpy_include object to os.path.join(..., ...), which checks whether the argument is really a string, which it obviously isn't.

This could be fixed by inheriting from str, but the above shows the dangers of the approach in the long run - it doesn't use the designed mechanics, is brittle, and may easily fail in the future.
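
That said, the question's comments report that inheriting from os.PathLike instead (str is immutable, so the value would be fixed too early) gets past Cython's check, because os.path.join accepts anything implementing __fspath__. A minimal sketch of that variant (names are just for illustration):

import os
import setuptools

class LazyNumpyInclude(os.PathLike):
    # os.fspath()/os.path.join() call this during the build,
    # after the setup requirements (including numpy) are installed.
    def __fspath__(self):
        import numpy
        return numpy.get_include()

my_c_lib_ext = setuptools.Extension(
    name="my_c_lib",
    sources=["my_c_lib/some_file.pyx"],
    include_dirs=[LazyNumpyInclude()]
)

The same caveat applies: it depends on when exactly the include path gets resolved, so it may break with future versions.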

the classical build_ext-solution

Which looks as follows:

...
from setuptools.command.build_ext import build_ext as _build_ext

class build_ext(_build_ext):
    def finalize_options(self):
        _build_ext.finalize_options(self)
        # Prevent numpy from thinking it is still in its setup process:
        __builtins__.__NUMPY_SETUP__ = False
        import numpy
        self.include_dirs.append(numpy.get_include())

setuptools.setup(
    ...
    cmdclass={'build_ext':build_ext},
    ...
)

Yet this solution also doesn't work with Cython extensions, because pyx files don't get recognized.

The real question is: how did pyx files get recognized in the first place? The answer is this part of setuptools.command.build_ext:

...
try:
    # Attempt to use Cython for building extensions, if available
    from Cython.Distutils.build_ext import build_ext as _build_ext
    # Additionally, assert that the compiler module will load
    # also. Ref #1229.
    __import__('Cython.Compiler.Main')
except ImportError:
    _build_ext = _du_build_ext
...

That means setuptools tries to use Cython's build_ext if possible, and because the import of the module is delayed until build_ext is called, it finds Cython present.

The situation is different when setuptools.command.build_ext is imported at the beginning of setup.py - Cython isn't present yet, and a fallback without Cython functionality is used.

combining the pybind11-trick and the classical solution

So let's add an indirection, so we don't have to import setuptools.command.build_ext directly at the beginning of setup.py:

...
# factory function
def my_build_ext(pars):
    # import delayed:
    from setuptools.command.build_ext import build_ext as _build_ext

    # include_dirs adjusted:
    class build_ext(_build_ext):
        def finalize_options(self):
            _build_ext.finalize_options(self)
            # Prevent numpy from thinking it is still in its setup process:
            __builtins__.__NUMPY_SETUP__ = False
            import numpy
            self.include_dirs.append(numpy.get_include())

    # object returned:
    return build_ext(pars)
...
setuptools.setup(
    ...
    cmdclass={'build_ext': my_build_ext},
    ...
)
Hustle answered 10/1, 2019 at 23:23. Comments:
Potman: Subclassing build_ext is actually a pretty neat idea - this led me to another proposal: subclassing build instead of build_ext. Not that it would be less of a workaround (I'd prefer custom finalizing of build_ext too), but meh.
Talie: I was excited because your pyproject.toml solution seems much simpler, but when I tried to use it I still ran into the issue where numpy can't be imported because it isn't installed yet when setup.py is processed. My pyproject.toml: ...
Hustle: @DanBlanchard You have to use pip for the installation in this case.

One (hacky) suggestion would be to use the fact that extension.include_dirs is first requested in build_ext, which is called after the setup dependencies have been downloaded.

class MyExt(setuptools.Extension):
    def __init__(self, *args, **kwargs):
        self.__include_dirs = []
        super().__init__(*args, **kwargs)

    @property
    def include_dirs(self):
        import numpy
        return self.__include_dirs + [numpy.get_include()]

    @include_dirs.setter
    def include_dirs(self, dirs):
        self.__include_dirs = dirs


my_c_lib_ext = MyExt(
    name="my_c_lib",
    sources=["my_c_lib/some_file.pyx"]
)

setup(
    ...,
    setup_requires=['cython', 'numpy'],
)

Update

Another (less hacky, but I guess still pretty hacky) solution would be overriding build instead of build_ext, since we know that build_ext is a subcommand of build and will always be invoked by build on installation. This way, we don't have to touch build_ext and can leave it to Cython. This will also work when invoking build_ext directly (e.g., via python setup.py build_ext to rebuild the extensions in place while developing), because build_ext ensures all options of build are initialized, and, by coincidence, Command.set_undefined_options first ensures the command has finalized (I know, distutils is a mess).

Of course, now we're misusing build - it runs code that belongs to build_ext finalization. However, I'd still probably go with this solution rather than with the first one, ensuring the relevant piece of code is properly documented.

import setuptools
from distutils.command.build import build as build_orig


class build(build_orig):

    def finalize_options(self):
        super().finalize_options()
        # I stole this line from ead's answer:
        __builtins__.__NUMPY_SETUP__ = False
        import numpy
        # or just modify my_c_lib_ext directly here, ext_modules should contain a reference anyway
        extension = next(m for m in self.distribution.ext_modules if m == my_c_lib_ext)
        extension.include_dirs.append(numpy.get_include())


my_c_lib_ext = setuptools.Extension(
    name="my_c_lib",
    sources=["my_c_lib/some_file.pyx"]
)

setuptools.setup(
    ...,
    ext_modules=[my_c_lib_ext],
    cmdclass={'build': build},
    ...
)
Potman answered 10/1, 2019 at 12:08. Comments:
Crandell: This is the solution that worked for me; I wanted to both pip install and run python setup.py build_ext.
Repressive: In recent Python versions you may need to replace __builtins__.__NUMPY_SETUP__ = False with import builtins; builtins.__NUMPY_SETUP__ = False.
Grillparzer: Is it possible to do something similar with the Cython import, when cythonize is required in ext_modules=... of setup()?

I found a very easy solution in this post:

Or you can stick to https://github.com/pypa/pip/issues/5761. Here you install Cython and numpy using setuptools.dist before the actual setup:

from setuptools import dist
dist.Distribution().fetch_build_eggs(['Cython>=0.15.1', 'numpy>=1.10'])

Works well for me!
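
A minimal sketch of how this could fit into the setup.py from the question (the fetch_build_eggs call has to run before the numpy import; newer setuptools releases may warn that this API is deprecated):

from setuptools import dist, setup, Extension

# Fetch the build-time dependencies before they are imported below.
dist.Distribution().fetch_build_eggs(['Cython>=0.15.1', 'numpy>=1.10'])

import numpy  # available now, even in a clean environment

my_c_lib_ext = Extension(
    name="my_c_lib",
    sources=["my_c_lib/some_file.pyx"],
    include_dirs=[numpy.get_include()]
)

setup(
    ...,
    ext_modules=[my_c_lib_ext],
    install_requires=["numpy >= 1.15"]
)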

Fitzgerald answered 18/3, 2020 at 13:02.
