Including and distributing third party libraries with a Python C extension

I'm building a C Python extension that makes use of a "third party" library (in this case, one that I've built using a separate build process and toolchain). Call this library libplumbus.dylib.

Directory structure would be:

grumbo/
  include/
    plumbus.h
  lib/
    libplumbus.dylib
  grumbo.c
  setup.py

My setup.py looks approximately like:

from setuptools import Extension, setup

native_module = Extension(
    'grumbo',
    define_macros = [('MAJOR_VERSION', '1'),
                     ('MINOR_VERSION', '0')],
    sources       = ['grumbo.c'],
    include_dirs  = ['include'],
    libraries     = ['plumbus'],
    library_dirs  = ['lib'])


setup(
    name = 'grumbo',
    version = '1.0',
    ext_modules = [native_module] )

Since libplumbus is an external library, when I run import grumbo I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: dlopen(/path/to/grumbo/grumbo.cpython-37m-darwin.so, 2): Library not loaded: lib/libplumbus.dylib
  Referenced from: /path/to/grumbo/grumbo.cpython-37m-darwin.so
  Reason: image not found

What's the simplest way to set things up so that libplumbus is included with the distribution and properly loaded when grumbo is imported? (Note that this should work with a virtualenv).

I have tried adding lib/libplumbus.dylib to package_data, but this doesn't work, even if I add -Wl,-rpath,@loader_path/grumbo/lib to the Extension's extra_link_args.

Barnie answered 9/9, 2020 at 4:42

The goal of this post is to have a setup.py that creates a source distribution. That means that after running

python setup.py sdist

the resulting dist/grumbo-1.0.tar.gz could be used for installation via

pip install grumbo-1.0.tar.gz
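Before going further, it can help to check that the files you expect actually end up in the sdist. This is a minimal sketch using only the standard library (tarfile, fnmatch); the function name missing_from_sdist and the example patterns are hypothetical, not part of setuptools.

```python
import fnmatch
import tarfile


def missing_from_sdist(sdist_path, patterns):
    """Return the glob patterns that match no file inside a .tar.gz sdist.

    fnmatch's "*" also matches "/", so "*/grumbo/include/*.h" matches
    "grumbo-1.0/src/grumbo/include/plumbus.h".
    """
    with tarfile.open(sdist_path, "r:gz") as tar:
        names = tar.getnames()
    return [p for p in patterns
            if not any(fnmatch.fnmatch(name, p) for name in names)]
```

For example, missing_from_sdist("dist/grumbo-1.0.tar.gz", ["*/grumbo/include/*.h", "*/grumbo/lib/*.so"]) returns an empty list once both headers and library are packaged.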

We will start with a setup.py for Linux/macOS and then tweak it to work on Windows as well.


The first step is to get the additional data (headers and library) into the distribution. I'm not sure it's actually impossible to add data files for a plain module, but setuptools offers functionality to add data for packages, so let's turn your module into a package (which is probably a good idea anyway).

The new structure of package grumbo looks as follows:

src/
  grumbo/
     __init__.py  # empty
     grumbo.c
     include/
       plumbus.h
     lib/
       libplumbus.so
setup.py

and the changed setup.py:

from setuptools import setup, Extension, find_packages

native_module = Extension(
                name='grumbo.grumbo',
                sources = ["src/grumbo/grumbo.c"],
              )
kwargs = {
      'name' : 'grumbo',
      'version' : '1.0',
      'ext_modules' :  [native_module],
      'packages':find_packages(where='src'),
      'package_dir':{"": "src"},
}

setup(**kwargs)

It doesn't do much yet, but at least setuptools can find our package. The build fails because the headers are missing.

Now let's add the needed includes from the include-folder to the distribution via package-data:

...
kwargs = {
      ...,
      'package_data' : { 'grumbo': ['include/*.h']},
}
...

With that, our header files are copied into the source distribution. However, because the package will be built "somewhere" we don't know in advance, adding include_dirs = ['include'] to the Extension definition just doesn't cut it.

There must be a better (and less brittle) way to find the right include path, but this is what I came up with:

...
import os
import sys
import sysconfig
def path_to_build_folder():
    """Returns the name of a distutils build directory"""
    f = "{dirname}.{platform}-{version[0]}.{version[1]}"
    dir_name = f.format(dirname='lib',
                    platform=sysconfig.get_platform(),
                    version=sys.version_info)
    return os.path.join('build', dir_name, 'grumbo')

native_module = Extension(
                ...,
                include_dirs  = [os.path.join(path_to_build_folder(),'include')],
)
...
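One reason this is brittle: the build-directory name is an implementation detail. As far as I know, newer setuptools releases name it lib.<platform>-<cache_tag> (e.g. lib.linux-x86_64-cpython-310) rather than lib.<platform>-<major>.<minor>, so path_to_build_folder() may point at a folder that never gets created. A hedged sketch that returns both candidates; the helper name candidate_build_folders is hypothetical:

```python
import os
import sys
import sysconfig


def candidate_build_folders(package="grumbo"):
    """Return possible build_lib paths for `package`, older naming scheme first.

    - older distutils/setuptools: build/lib.<platform>-<major>.<minor>
    - newer setuptools (assumed): build/lib.<platform>-<cache_tag>
    """
    platform_name = sysconfig.get_platform()
    old_style = "lib.{}-{}.{}".format(platform_name, *sys.version_info[:2])
    new_style = "lib.{}-{}".format(platform_name, sys.implementation.cache_tag)
    return [os.path.join("build", name, package) for name in (old_style, new_style)]
```

The folders don't exist yet when the Extension is defined, so instead of picking one, you could pass both, e.g. include_dirs=[os.path.join(d, 'include') for d in candidate_build_folders()]; compilers typically just ignore or warn about a missing -I directory.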

Now the extension is built, but it cannot be loaded yet because it is not linked against the shared object libplumbus.so, so some symbols are unresolved.

Similar to the header files, we can add our library to the distribution:

kwargs = {
          ...,
          'package_data' : { 'grumbo': ['include/*.h', 'lib/*.so']},
}
...

and add the right lib-path for the linker:

...
native_module = Extension(
                ...
                libraries     = ['plumbus'],
                library_dirs  = [os.path.join(path_to_build_folder(), 'lib')],
              )
...

Now, we are almost there:

  • the extension is built and put into site-packages/grumbo/
  • the extension depends on libplumbus.so, as ldd shows
  • libplumbus.so is put into site-packages/grumbo/lib
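If you want to automate the ldd check, a small sketch like this could flag unresolved dependencies. It assumes Linux with ldd on the PATH; the function names are hypothetical. ldd reports an unresolved dependency as a line of the form "libplumbus.so => not found".

```python
import subprocess


def parse_missing(ldd_output):
    """Return dependency names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # e.g. "\tlibplumbus.so => not found"
        name, sep, target = line.partition("=>")
        if sep and target.strip() == "not found":
            missing.append(name.strip())
    return missing


def missing_dependencies(extension_path):
    """Run ldd on a built extension (Linux only) and return unresolved libraries."""
    result = subprocess.run(["ldd", extension_path],
                            capture_output=True, text=True, check=True)
    return parse_missing(result.stdout)
```

Pass missing_dependencies() the path of the installed grumbo*.so; an empty list means the loader can resolve everything.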

However, we still cannot import the extension, as import grumbo.grumbo leads to

ImportError: libplumbus.so: cannot open shared object file: No such file or directory

because the loader cannot find the needed shared object, which resides in the folder ./lib relative to our extension. We can use rpath to "help" the loader ($ORIGIN stands for the directory containing the extension itself):

...
native_module = Extension(
                ...
                extra_link_args = ["-Wl,-rpath=$ORIGIN/lib/."],
              )
...
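To confirm that the flag made it into the binary, readelf -d on the built extension lists the embedded search path; depending on linker defaults it shows up as RPATH or RUNPATH. A minimal parsing sketch, assuming readelf's usual output format (the function name is hypothetical). Since setuptools invokes the linker without a shell, $ORIGIN reaches it literally; if you pass the flag through a shell yourself (e.g. in LDFLAGS), quote it.

```python
import re

# Matches readelf -d lines such as:
#  0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/lib/.]
#  0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/lib/.]
_SEARCH_PATH = re.compile(r"\((RPATH|RUNPATH)\)\s+Library r(?:un)?path: \[([^\]]*)\]")


def embedded_search_paths(readelf_output):
    """Return (tag, path) pairs found in `readelf -d` output."""
    return [(m.group(1), m.group(2)) for m in _SEARCH_PATH.finditer(readelf_output)]
```

The input can come from subprocess.run(["readelf", "-d", path], capture_output=True, text=True).stdout.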

And now we are done:

>>> import grumbo.grumbo
# works!

Building and installing a wheel should also work:

python setup.py bdist_wheel

and then:

pip install grumbo-1.0-xxxx.whl

The first milestone is achieved. Now let's extend it so that it works on other platforms as well.


Same source distribution for Linux and macOS:

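To be able to install the same source distribution on Linux and macOS, both versions of the shared library (for Linux and for macOS) must be present. One option is to add a suffix to the names of the shared objects, e.g. libplumbus.linux.so and libplumbus.macos.so. The right one can then be picked in setup.py depending on the platform. (Note that on macOS the linker's -l option searches for lib<name>.dylib or lib<name>.a rather than .so, so the macOS copy will most likely have to be named libplumbus.macos.dylib and matched by an extra 'lib/*.dylib' pattern in package_data.)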

...
import platform
def pick_library():
    my_system = platform.system()
    if my_system == 'Linux':
        return "plumbus.linux"
    if my_system == 'Darwin':
        return "plumbus.macos"
    if my_system == 'Windows':
        return "plumbus"
    raise ValueError("Unknown platform: " + my_system)

native_module = Extension(
                ...
                libraries     = [pick_library()],
                ...
              )
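The name passed via libraries is turned into a platform-specific file name by the linker, which is easy to get wrong when renaming copies. A hypothetical helper that spells out the usual conventions (GNU ld: lib<name>.so; Apple's ld: lib<name>.dylib; MSVC: <name>.lib):

```python
def expected_library_file(system, name):
    """Shared/import library file the linker looks for given libraries=[name].

    Mirrors the usual -l<name> conventions; static archives (lib<name>.a)
    are also accepted on Linux and macOS but are not listed here.
    """
    if system == "Linux":
        return "lib{}.so".format(name)      # GNU ld
    if system == "Darwin":
        return "lib{}.dylib".format(name)   # Apple ld; a .so file is not searched
    if system == "Windows":
        return "{}.lib".format(name)        # MSVC import library
    raise ValueError("Unknown platform: " + system)
```

Calling expected_library_file(platform.system(), pick_library()) inside setup.py tells you which file has to be present in lib/.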

Tweaking for Windows:

On Windows, dynamic libraries are DLLs rather than shared objects, so there are some differences to take into account:

  • when the C extension is built, it needs the import library plumbus.lib, which we put into the lib subfolder.
  • when the C extension is loaded at run time, it needs plumbus.dll.
  • Windows has no notion of rpath, so we put the DLL right next to the extension, where the loader can find it (see also this SO post for more details).
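If you'd rather keep plumbus.dll inside lib/ instead of next to the extension, Python 3.8+ on Windows offers os.add_dll_directory(). Since 3.8, DLL dependencies of extension modules are resolved only from system locations, the directory of the .pyd itself and directories registered this way, so adding lib/ to PATH no longer helps. A sketch for src/grumbo/__init__.py, offered as an alternative to the layout above; the helper name is hypothetical, and package_data would then need a 'lib/*.dll' pattern.

```python
import os
import sys


def register_bundled_dlls(package_dir, subdir="lib"):
    """Make DLLs in <package_dir>/<subdir> visible to the Windows loader.

    Returns the object from os.add_dll_directory (call .close() on it to undo),
    or None on other platforms, on Python < 3.8, or if the folder is missing.
    """
    dll_dir = os.path.join(package_dir, subdir)
    if sys.platform == "win32" and hasattr(os, "add_dll_directory") and os.path.isdir(dll_dir):
        return os.add_dll_directory(dll_dir)
    return None


# Hypothetical use at the top of src/grumbo/__init__.py, before the extension
# is imported:
#     _dll_dir = register_bundled_dlls(os.path.dirname(os.path.abspath(__file__)))
#     from . import grumbo
```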

That means the folder structure should be as follows:

src/
  grumbo/
     __init__.py
     grumbo.c
     plumbus.dll           # needed for Windows
     include/
       plumbus.h
     lib/
       libplumbus.linux.so # needed on Linux
       libplumbus.macos.so # needed on macOS
       plumbus.lib         # needed on Windows
setup.py

There are also some changes to setup.py. First, extend package_data so that the .dll and .lib files are picked up:

...
kwargs = {
      ...
      'package_data' : { 'grumbo': ['include/*.h', 'lib/*.so',
                                    'lib/*.lib', '*.dll',      # for windows
                                   ]},
}
...

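Second, rpath can only be used on Linux/macOS, and the two linkers spell it differently: GNU ld accepts -rpath=$ORIGIN/..., while Apple's ld expects the -rpath,<path> form and uses @loader_path (the directory of the loading binary) instead of $ORIGIN. On macOS an rpath only helps if the library's install name starts with @rpath/. The error in the question (Library not loaded: lib/libplumbus.dylib) suggests it doesn't; running install_name_tool -id @rpath/libplumbus.dylib lib/libplumbus.dylib before linking the extension should fix that. Thus:

def get_extra_link_args():
    system = platform.system()
    if system == 'Windows':
        return []                                 # Windows has no rpath
    if system == 'Darwin':
        return ["-Wl,-rpath,@loader_path/lib"]    # macOS: relative to the extension
    return ["-Wl,-rpath=$ORIGIN/lib/."]           # Linux (GNU ld)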

native_module = Extension(
                ...
                extra_link_args = get_extra_link_args(),
              )

That's it!


The complete setup file (you might want to add macro definitions or similar, which I've skipped):

from setuptools import setup, Extension, find_packages

import os
import sys
import sysconfig
def path_to_build_folder():
    """Returns the name of a distutils build directory"""
    f = "{dirname}.{platform}-{version[0]}.{version[1]}"
    dir_name = f.format(dirname='lib',
                    platform=sysconfig.get_platform(),
                    version=sys.version_info)
    return os.path.join('build', dir_name, 'grumbo')


import platform
def pick_library():
    my_system = platform.system()
    if my_system == 'Linux':
        return "plumbus.linux"
    if my_system == 'Darwin':
        return "plumbus.macos"
    if my_system == 'Windows':
        return "plumbus"
    raise ValueError("Unknown platform: " + my_system)


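def get_extra_link_args():
    system = platform.system()
    if system == 'Windows':
        return []                                 # Windows has no rpath
    if system == 'Darwin':
        return ["-Wl,-rpath,@loader_path/lib"]    # macOS: relative to the extension
    return ["-Wl,-rpath=$ORIGIN/lib/."]           # Linux (GNU ld)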

native_module = Extension(
                name='grumbo.grumbo',
                sources = ["src/grumbo/grumbo.c"],
                include_dirs  = [os.path.join(path_to_build_folder(),'include')],
                libraries     = [pick_library()],
                library_dirs  = [os.path.join(path_to_build_folder(), 'lib')],
                extra_link_args = get_extra_link_args(),
              )
kwargs = {
      'name' : 'grumbo',
      'version' : '1.0',
      'ext_modules' :  [native_module],
      'packages':find_packages(where='src'),
      'package_dir':{"": "src"},
      'package_data' : { 'grumbo': ['include/*.h', 'lib/*.so',
                                    'lib/*.lib', '*.dll',      # for windows
                                   ]},
}

setup(**kwargs)
Hangchow answered 10/9, 2020 at 21:17
This answer was great! The path_to_build_folder() function shown doesn't work for me with pip install -e . builds, so I added include_dirs paths twice: once for the location in the repo and once for the build folder path. - Nucleate
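Following that comment, a hedged sketch of listing the in-repo folders alongside the build-folder copies. It assumes the build runs from the directory containing setup.py (the answer's own relative paths assume the same); the helper name search_dirs and its build_folder parameter are hypothetical, with build_folder being the value of path_to_build_folder().

```python
import os


def search_dirs(build_folder, package="grumbo", subdir="include"):
    """Return [in-repo dir, build-folder dir] for headers or libraries.

    The in-repo path covers in-place/editable builds (pip install -e .);
    the build-folder path is the copy made by build_py.
    """
    return [
        os.path.join("src", package, subdir),  # source tree
        os.path.join(build_folder, subdir),    # copy under build/lib...
    ]
```

In setup.py this would be used as include_dirs=search_dirs(path_to_build_folder()) and library_dirs=search_dirs(path_to_build_folder(), subdir="lib").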

© 2022 - 2024 — McMap. All rights reserved.