Package only binary compiled .so files of a python library compiled with Cython

4

15

I have a package named mypack which contains a module mymod.py and the __init__.py. For reasons that are not up for debate, I need to package this module compiled (neither .py nor .pyc files are allowed). That is, __init__.py is the only source file allowed in the distributed archive.

The folder structure is:

. 
│  
├── mypack
│   ├── __init__.py
│   └── mymod.py
├── setup.py

I found that Cython is able to do this, by converting each .py file into a .so library that can be imported directly by Python.

The question is: what must the setup.py file look like in order to allow easy packaging and installation?

The target system has a virtualenv where the package must be installed with whatever method that allows easy install and uninstall (easy_install, pip, etc are all welcome).

I tried everything within my reach. I read the setuptools and distutils documentation and all related Stack Overflow questions, and tried all kinds of commands (sdist, bdist, bdist_egg, etc.) with lots of combinations of setup.cfg and MANIFEST.in entries.

The closest I got was with the below setup file, that would subclass the bdist_egg command in order to remove also .pyc files, but that is breaking the installation.

A solution that installs the files "manually" in the venv is also fine, provided that all ancillary files that a proper installation includes are covered (I need to run pip freeze in the venv and see mypack==0.0.1).

Run it with:

python setup.py bdist_egg --exclude-source-files

and (try to) install it with

easy_install mypack-0.0.1-py2.7-linux-x86_64.egg

As you may notice, the target is linux 64 bits with python 2.7.

from Cython.Distutils import build_ext
from setuptools import setup, find_packages
from setuptools.extension import Extension
from setuptools.command import bdist_egg
from setuptools.command.bdist_egg import  walk_egg, log 
import os

class my_bdist_egg(bdist_egg.bdist_egg):

    def zap_pyfiles(self):
        log.info("Removing .py files from temporary directory")
        for base, dirs, files in walk_egg(self.bdist_dir):
            for name in files:
                if not name.endswith('__init__.py'):
                    if name.endswith('.py') or name.endswith('.pyc'):
                        # original 'if' only has name.endswith('.py')
                        path = os.path.join(base, name)
                        log.info("Deleting %s",path)
                        os.unlink(path)

ext_modules=[
    Extension("mypack.mymod", ["mypack/mymod.py"]),
]

setup(
  name = 'mypack',
  cmdclass = {'build_ext': build_ext, 
              'bdist_egg': my_bdist_egg },
  ext_modules = ext_modules,
  version='0.0.1',
  description='This is mypack compiled lib',
  author='Myself',
  packages=['mypack'],
)

UPDATE. Following @Teyras's answer, it was possible to build a wheel as suggested there. The setup.py file contents are:

import os
import shutil
from setuptools.extension import Extension
from setuptools import setup
from Cython.Build import cythonize
from Cython.Distutils import build_ext

class MyBuildExt(build_ext):
    def run(self):
        build_ext.run(self)
        build_dir = os.path.realpath(self.build_lib)
        root_dir = os.path.dirname(os.path.realpath(__file__))
        target_dir = build_dir if not self.inplace else root_dir
        self.copy_file('mypack/__init__.py', root_dir, target_dir)

    def copy_file(self, path, source_dir, destination_dir):
        if os.path.exists(os.path.join(source_dir, path)):
            shutil.copyfile(os.path.join(source_dir, path), 
                            os.path.join(destination_dir, path))


setup(
  name = 'mypack',
  cmdclass = {'build_ext': MyBuildExt},
  ext_modules = cythonize([Extension("mypack.*", ["mypack/*.py"])]),
  version='0.0.1',
  description='This is mypack compiled lib',
  author='Myself',
  packages=[],
  include_package_data=True )

The key point was to set packages=[]. Overriding the run method of the build_ext class was needed to get the __init__.py file inside the wheel.
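Whichever setup.py is used, the requirement (no .py or .pyc files except __init__.py) can be checked on the built archive itself, since a wheel is a zip file. The following is only a sketch: the function name find_leaked_sources and the ALLOWED_SOURCES tuple are made up for illustration and are not part of setuptools or wheel.

```python
import zipfile

# Source files that are allowed to ship (assumption: only __init__.py)
ALLOWED_SOURCES = ('__init__.py',)


def find_leaked_sources(wheel_path):
    """Return archive members that are .py/.pyc files other than __init__.py."""
    leaked = []
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            # zip member names always use '/' as the separator
            basename = name.rsplit('/', 1)[-1]
            if name.endswith(('.py', '.pyc')) and basename not in ALLOWED_SOURCES:
                leaked.append(name)
    return leaked
```

An empty list returned for the wheel found in dist/ would mean that only compiled modules (plus the allowed __init__.py files) were packaged.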

Easton answered 14/9, 2016 at 20:51 Comment(0)
P
11

Unfortunately, the answer suggesting setting packages=[] is wrong and may break a lot of stuff, as can e.g. be seen in this question. Don't use it. Instead of excluding all packages from the dist, you should exclude only the python files that will be cythonized and compiled to shared objects.

Below is a working example; it uses my recipe from the question Exclude single source file from python bdist_egg or bdist_wheel. The example project contains package spam with two modules, spam.eggs and spam.bacon, and a subpackage spam.fizz with one module spam.fizz.buzz:

root
├── setup.py
└── spam
    ├── __init__.py
    ├── bacon.py
    ├── eggs.py
    └── fizz
        ├── __init__.py
        └── buzz.py

The module lookup is being done in the build_py command, so it is the one you need to subclass with custom behaviour.

Simple case: compile all source code, make no exceptions

If you are going to compile every .py file (including the __init__.py files), it is sufficient to override the build_py.build_packages method, making it a no-op. Because build_packages then does nothing, no .py file is collected at all and the dist will include only the cythonized extensions:

from setuptools import find_packages, setup, Extension
from setuptools.command.build_py import build_py as build_py_orig
from Cython.Build import cythonize


extensions = [
    # example of an extension with a glob pattern
    Extension('spam.*', ['spam/*.py']),
    # example of an extension with a single source file
    Extension('spam.fizz.buzz', ['spam/fizz/buzz.py']),
]


class build_py(build_py_orig):
    def build_packages(self):
        pass


setup(
    name='...',
    version='...',
    packages=find_packages(),
    ext_modules=cythonize(extensions),
    cmdclass={'build_py': build_py},
)

Complex case: mix cythonized extensions with source modules

If you want to compile only selected modules and leave the rest untouched, you will need slightly more complex logic: in this case, you need to override the module lookup. In the example below, I still compile spam.bacon, spam.eggs and spam.fizz.buzz to shared objects, but leave the __init__.py files untouched, so they will be included as source modules:

import fnmatch
from setuptools import find_packages, setup, Extension
from setuptools.command.build_py import build_py as build_py_orig
from Cython.Build import cythonize


extensions = [
    Extension('spam.*', ['spam/*.py']),
    Extension('spam.fizz.buzz', ['spam/fizz/buzz.py']),
]
cython_excludes = ['**/__init__.py']


def not_cythonized(tup):
    (package, module, filepath) = tup
    return any(
        fnmatch.fnmatchcase(filepath, pat=pattern) for pattern in cython_excludes
    ) or not any(
        fnmatch.fnmatchcase(filepath, pat=pattern)
        for ext in extensions
        for pattern in ext.sources
    )


class build_py(build_py_orig):
    def find_modules(self):
        modules = super().find_modules()
        return list(filter(not_cythonized, modules))

    def find_package_modules(self, package, package_dir):
        modules = super().find_package_modules(package, package_dir)
        return list(filter(not_cythonized, modules))


setup(
    name='...',
    version='...',
    packages=find_packages(),
    ext_modules=cythonize(extensions, exclude=cython_excludes),
    cmdclass={'build_py': build_py},
)
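
To see what the filter does without running a build, the same matching logic can be applied to a hand-written module list. This is a sketch only: the list below imitates the (package, module, filepath) tuples that build_py works with, the name extension_sources is a stand-in for the sources attribute of the Extension objects, and the paths assume '/' as the separator.

```python
import fnmatch

# Stand-in for [ext.sources for ext in extensions] (illustrative name)
extension_sources = [['spam/*.py'], ['spam/fizz/buzz.py']]
cython_excludes = ['**/__init__.py']


def not_cythonized(tup):
    """Return True for modules that should stay in the dist as .py files."""
    (package, module, filepath) = tup
    excluded = any(fnmatch.fnmatchcase(filepath, p) for p in cython_excludes)
    compiled = any(
        fnmatch.fnmatchcase(filepath, p)
        for sources in extension_sources
        for p in sources
    )
    return excluded or not compiled


# Hand-written tuples for the example project layout
modules = [
    ('spam', '__init__', 'spam/__init__.py'),
    ('spam', 'bacon', 'spam/bacon.py'),
    ('spam', 'eggs', 'spam/eggs.py'),
    ('spam.fizz', '__init__', 'spam/fizz/__init__.py'),
    ('spam.fizz', 'buzz', 'spam/fizz/buzz.py'),
]
kept = [m[2] for m in modules if not_cythonized(m)]
print(kept)  # ['spam/__init__.py', 'spam/fizz/__init__.py']
```

Only the two __init__.py files survive the filter, so they are the only source modules that build_py would copy into the dist.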
Power answered 8/5, 2019 at 15:14 Comment(10)
I marked this as the correct answer because it seems you know very well what you are talking about. I'll try the code, though, and let you know whether or not it worked for me. Thank you for your contribution! - Easton
Glad I could help! Ping me if I should update the answer with example code for your particular use case. - Power
I cannot get this to work on Python 3.7.0. For one, build_py seems to return a filter object, so the filter() has to be wrapped in a list() call. Second, even though the filter result is correct for me, it has no effect on the package at all. The .py files are still being included. - Magniloquent
@Power I can confirm that the updated version works with 3.7. - Magniloquent
@hoefling: I have tried your approach, but it seems that the generated C files are included in the wheel. Is that expected? - Digitalize
@Digitalize it's a matter of Cython configuration. The C sources are written in-place by default; you can specify a custom target directory, though: cythonize(..., build_dir='/tmp') etc. - Power
Great, will try and report back. - Digitalize
@hoefling: Thanks for this detailed answer. It really helps in getting the majority of the work done. Could you also explain how to do the following? 1. Add folders without .py code, e.g. a config folder (with .yml files) or a log folder (with .log files). I tried adding them to Extension (but realised only .py files are supported there); then I added them to cython_excludes, which didn't make sense either since that only excludes files from the extension modules. 2. Use the built wheel file. I tried importing it in a plain Python terminal, which leads to an ImportError (module not found). - Brachypterous
@Brachypterous 1. Non-Python files are included via the package_data keyword in setup(). The files should be located under a package that is included in the distribution (i.e. contained in the packages list). 2. The wheel must be installed first: pip install path/to/mypkg.whl - Power
@hoefling: I have tried both. I have added my question in detail here: #67502534. Could you kindly take a look and let me know your thoughts? - Brachypterous
P
9

While packaging as a wheel is definitely what you want, the original question was about excluding .py source files from the package. This is addressed in Using Cython to protect a Python codebase by @Teyras, but his solution uses a hack: it removes the packages argument from the call to setup(). This prevents the build_py step from running, which does indeed exclude the .py files, but it also excludes any data files you want included in the package. (For example, my package has a data file called VERSION which contains the package version number.) A better solution is to replace the build_py setup command with a custom command which only copies the data files.

You also need the __init__.py file as described above, so the custom build_py command should create the __init__.py file. I found that the compiled __init__.so runs when the package is imported, so all that is needed is an empty __init__.py file to tell Python that the directory is a package which is OK to import.

Your custom build_py class would look like:

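import os
from setuptools.command.build_py import build_py

class CustomBuildPyCommand(build_py):
    def run(self):
        # package data files but not .py files
        self.build_package_data()
        # create empty __init__.py in target dirs
        for package in self.packages or []:
            # dotted package names ('spam.fizz') map to nested directories
            package_dir = os.path.join(self.build_lib, *package.split('.'))
            # make sure the directory exists even if no data files were copied
            self.mkpath(package_dir)
            open(os.path.join(package_dir, '__init__.py'), 'a').close()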

And configure setup to override the original build_py command:

setup(
   ...
   cmdclass={'build_py': CustomBuildPyCommand},
)
Parahydrogen answered 8/3, 2018 at 18:4 Comment(2)
Great observation. I accepted @Teyras's answer, though, because it meets the question requirements and he answered first. - Easton
You, sir, saved my day. I had no idea how to copy only the data files for a cythonized package. A small improvement: I copied the __init__.py files from a custom build_ext class instead of creating new ones, in case there is logic in them. - Nato
D
3

I suggest you use the wheel format (as suggested by fish2000). Then, in your setup.py, set the packages argument to []. Your Cython extension will still build and the resulting .so files will be included in the resulting wheel package.

If your __init__.py is not included in the wheel, you can override the run method of build_ext class shipped by Cython and copy the file from your source tree to the build folder (the path can be found in self.build_lib).

Desdee answered 30/7, 2017 at 20:50 Comment(1)
It works!! I will update the question with the setup.py for completeness. - Easton
L
1

This is exactly the sort of problem the Python wheel format, described in PEP 427, was developed to address.

Wheels are a replacement for Python eggs (which were/are problematic for a bunch of reasons). They are supported by pip, can contain architecture-specific private binaries (here is one example of such an arrangement) and are generally accepted by the Python communities who have a stake in this kind of thing.

Here is one setup.py snippet from the aforelinked Python on Wheels article, showing how one sets up a binary distribution:

import os
from setuptools import setup
from setuptools.dist import Distribution

class BinaryDistribution(Distribution):
    def is_pure(self):
        return False

setup(
    ...,
    include_package_data=True,
    distclass=BinaryDistribution,
)

… in lieu of the older (but probably somehow still canonically supported) setuptools classes you are using. It's very straightforward to make wheels for your distribution purposes, as outlined. As I recall from experience, either the wheel module's build process is somewhat cognizant of virtualenv, or it's very easy to use one within the other.

In any case, trading in the setuptools egg-based APIs for wheel-based tooling should save you some serious pain, I should think.
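
Whether a build actually produced a binary (platform-specific) wheel can be read off its file name, because PEP 427 ends every wheel name with the python tag, abi tag and platform tag. A rough sketch, with helper names (wheel_tags, is_platform_wheel) and example file names invented for illustration:

```python
def wheel_tags(filename):
    """Split a wheel file name into its (python, abi, platform) tags."""
    if not filename.endswith('.whl'):
        raise ValueError('not a wheel file name: %r' % filename)
    stem = filename[:-len('.whl')]
    # name-version[-build]-python-abi-platform: the last three parts are the tags
    python_tag, abi_tag, platform_tag = stem.split('-')[-3:]
    return python_tag, abi_tag, platform_tag


def is_platform_wheel(filename):
    """Pure-Python wheels carry the platform tag 'any'."""
    return wheel_tags(filename)[2] != 'any'


print(is_platform_wheel('mypack-0.0.1-cp27-cp27mu-linux_x86_64.whl'))  # True
print(is_platform_wheel('mypack-0.0.1-py2-none-any.whl'))  # False
```

A name ending in none-any would suggest that the distribution was still treated as pure Python.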

Leonie answered 22/9, 2016 at 13:41 Comment(4)
Thank you so much for your answer. I still can't make it work. I followed your advice, but the .py and .pyc files are still included in the wheel. I also can't find how to remove files by subclassing (or whatever is needed) the Distribution class. I read all the links and found nothing. As for the example you quote, I don't know how to translate that YAML configuration file into a Python setup.py script. Could you provide more specifics? Thanks! - Easton
@Easton you should have a look at Delocate (github.com/matthew-brett/delocate), the source of which has a great many useful functions for manipulating wheels and shared object files. That will likely get you going… as for the YAML thing, I would predict that that's going to be the hard/irritating part of the project, where you'll just have to dive in and write your own configuration-file reader code, hewn closely to the intrinsic form taken by the data you work with. Indeed! - Leonie
I searched the indicated links and I still can't find a solution. In those links, a lot of tuning is done to the Distribution, run_setup, setup, etc. classes, functions and modules. I tried to navigate the source files of many of the libraries, and with the bdist_wheel command the source is still distributed. - Easton
I really appreciate your help, but I'm afraid you only provided pointers to a solution, and we are still lacking a complete, self-contained solution to the small, simple case I posted. I was not able to use your pointers to get a solution, and that is why I didn't mark your answer as correct. - Easton
