How to easily distribute Python software that has Python module dependencies? Frustrations in Python package installation on Unix

My goal is to distribute a Python package that has several other widely used Python packages as dependencies. My package depends on well-written, PyPI-indexed packages like pandas, scipy and numpy, and specifies in setup.py that certain versions or higher of these are needed, e.g. "numpy >= 1.5".

I found that it's immensely frustrating and nearly impossible for Unix savvy users who are not experts in Python packaging (even if they know how to write Python) to install a package like mine, even when using what are supposed to be easy to use package managers. I am wondering if there is an alternative to this painful process that someone can offer, or if my experience just reflects the very difficult current state of Python packaging and distribution.

Suppose users download your package onto their system. Most will try to install it "naively", using something like:

$ python setup.py install

This is usually what comes up if you google instructions for installing Python packages. It will fail for the vast majority of users, since most do not have root access on their Unix/Linux servers. With more searching, they will discover the "--prefix" option and try:

$ python setup.py install --prefix=/some/local/dir

Since the users are not aware of the intricacies of Python packaging, they will pick an arbitrary directory as an argument to --prefix, e.g. "~/software/mypackage/". It will not be a cleanly curated directory where all other Python packages reside, because again, most users are not aware of these details. If they install another package "myotherpackage", they might pass it "~/software/myotherpackage", and you can imagine how down the road this will lead to frustrating hacking of PYTHONPATH and other complications.
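To make the PYTHONPATH problem concrete: a "--prefix" install does not put modules directly in the prefix, but in a version-specific site-packages directory beneath it, and that inner directory is what has to end up on PYTHONPATH. A minimal sketch, assuming a current Python 3 and the standard "posix_prefix" install scheme (some distributions patch their default scheme, so real paths can differ), asks the standard sysconfig module where that directory would be:

```python
import sysconfig


def prefix_site_packages(prefix):
    """Return the pure-Python install directory that a --prefix install targets.

    Uses the standard 'posix_prefix' scheme explicitly; a distribution's
    patched default scheme may place files elsewhere.
    """
    paths = sysconfig.get_paths(
        scheme="posix_prefix",
        vars={"base": prefix, "platbase": prefix},
    )
    return paths["purelib"]


if __name__ == "__main__":
    # Hypothetical prefix, like the arbitrary one a user might pick.
    # Modules land in the printed directory, not in the prefix itself.
    print(prefix_site_packages("/home/user/software/mypackage"))
```

The printed path has the form /home/user/software/mypackage/lib/pythonX.Y/site-packages, where X.Y is the running interpreter's version, so a prefix populated by one Python version is not picked up by another.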

Continuing with the installation process, the call to "setup.py install" with "--prefix" will also fail once users try to use the package, even though it appeared to have been installed correctly, since one of the dependencies might be missing (e.g. pandas, scipy or numpy) and a package manager is not used. They will try to install these packages individually. Even if successful, the packages will inevitably not be in the PYTHONPATH due to the non-standard directories given to "--prefix" and patient users will dabble with modifications of their PYTHONPATH to get the dependencies to be visible.
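One way to soften this, sketched here as an assumption rather than a standard tool, is a small preflight check the program runs at startup: it reports which dependencies the current interpreter can actually see, so a missing numpy produces a clear message instead of an ImportError deep in the code. The sketch assumes Python 3.8+ (for importlib.metadata); the scipy and pandas floors are hypothetical, and the version comparison is deliberately naive (pre-release tags are ignored), whereas the third-party "packaging" library implements the full rules.

```python
import re
from importlib import metadata

# Requirement floors: numpy's comes from the question, the others are hypothetical.
REQUIREMENTS = [("numpy", "1.5"), ("scipy", "0.11"), ("pandas", "0.10")]


def version_tuple(version):
    """'1.7.0rc1' -> (1, 7, 0): keep the leading dotted numbers, drop any suffix."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def at_least(installed, minimum):
    """Naive '>=' check; pads with zeros so '1.5' and '1.5.0' compare equal."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def missing_requirements(requirements):
    """Describe every requirement the running interpreter cannot satisfy."""
    problems = []
    for name, minimum in requirements:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not visible on sys.path, e.g. installed under a --prefix missing from PYTHONPATH.
            problems.append(f"{name} is not installed (need >= {minimum})")
            continue
        if not at_least(installed, minimum):
            problems.append(f"{name} {installed} is too old (need >= {minimum})")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements(REQUIREMENTS):
        print(problem)
```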

At this stage, users might be told by a Python savvy friend that they should use a package manager like "easy_install", the mainstream manager, to install the software and have dependencies taken care of. After installing "easy_install", which might be difficult, they will try:

$ easy_install setup.py 

This too will fail, since users again do not typically have permission to install software globally on production Unix servers. With more reading, they will learn about the "--user" option, and try:

$ easy_install setup.py --user 

They will get the error:

usage: easy_install [options] requirement_or_url ...
   or: easy_install --help

error: option --user not recognized

They will be extremely puzzled why their easy_install does not have the --user option where there are clearly pages online describing the option. They might try to upgrade their easy_install to the latest version and find that it still fails.
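For reference, a working "--user" install targets a fixed per-user directory that Python adds to sys.path on its own, which is what makes it attractive compared with an arbitrary "--prefix". A minimal sketch for current Python 3 on Unix, using only the standard site and sysconfig modules (not the 2013-era easy_install under discussion), shows where that directory is and whether the running interpreter has it enabled:

```python
import site
import sysconfig


def user_install_dirs():
    """Where a per-user install puts modules and scripts (Unix layout)."""
    return {
        # Usually ~/.local, or $PYTHONUSERBASE if that variable is set.
        "user_base": site.getuserbase(),
        # Added to sys.path automatically when the user site is enabled.
        "site_packages": site.getusersitepackages(),
        # Console scripts go here; this directory still has to be on the shell's PATH.
        "scripts": sysconfig.get_path("scripts", "posix_user"),
        # False inside a virtual environment or when Python runs with -s.
        "enabled": site.ENABLE_USER_SITE,
    }


if __name__ == "__main__":
    for key, value in user_install_dirs().items():
        print("%-13s %s" % (key, value))
```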

If they continue and consult a Python packaging expert, they will discover that there are two versions of easy_install, both named "easy_install" so as to maximize confusion: one is part of "distribute" and the other is part of "setuptools". It happens that only the "easy_install" of "distribute" supports "--user", while the vast majority of servers/sysadmins install the "setuptools" easy_install, so local installation will not be possible. Keep in mind that these distinctions between "distribute" and "setuptools" are meaningless and hard to understand for people who are not experts in Python package management.

At this point, I would have lost 90% of even the most determined, savvy and patient users who try to install my software package -- and rightfully so! They wanted to install a piece of software that happened to be written in Python, not to become experts in state of the art Python package distribution, and this is far too confusing and complex. They will give up and be frustrated at the time wasted.

The tiny minority of users who continue on and ask more Python experts will be told that they ought to use pip/virtualenv instead of easy_install. Installing pip and virtualenv and figuring out how these tools work and how they are different from the conventional "python setup.py" or "easy_install" calls is in itself time consuming and difficult, and again too much to ask from users who just wanted to install a simple piece of Python software and use it. Even those who pursue this path will be confused as to whether whatever dependencies they installed with easy_install or setup.py install --prefix are still usable with pip/virtualenv or if everything needs to be reinstalled from scratch.
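Since Python 3.3, the standard library's venv module covers the core of what virtualenv does: it creates a directory with its own interpreter and site-packages, so installs need no root access and do not mix with anything installed elsewhere by setup.py or easy_install. A minimal sketch, assuming a current Python 3 on Unix (hence the bin/python layout); the demo leaves pip out only to keep it quick:

```python
import os
import subprocess
import tempfile
import venv


def make_isolated_env(env_dir, with_pip=True):
    """Create a virtual environment and return the path of its python (Unix layout)."""
    # symlinks on non-Windows systems mirrors what "python -m venv" does by default;
    # with_pip=True bootstraps pip into the environment via ensurepip.
    venv.EnvBuilder(symlinks=(os.name != "nt"), with_pip=with_pip).create(env_dir)
    return os.path.join(env_dir, "bin", "python")


def is_isolated(python):
    """True if `python` runs inside a virtual environment rather than the base install."""
    result = subprocess.run(
        [python, "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


def demo():
    # Throwaway environment without pip, created and checked in a temp directory.
    with tempfile.TemporaryDirectory() as tmp:
        python = make_isolated_env(os.path.join(tmp, "env"), with_pip=False)
        return is_isolated(python)
```

With pip included, a user would then run env/bin/pip install mypackage (a hypothetical name) and call the installed scripts from env/bin, without touching PYTHONPATH.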

This problem is exacerbated if one or more of the packages in question requires a different version of Python than the default one. The difficulty of ensuring that your Python package manager is using the Python version you want, and that the required dependencies are installed in the relevant Python 2.x directory and not Python 2.y, will be so endlessly frustrating that users will certainly give up at that stage.
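A partial mitigation, offered as a sketch with a hypothetical minimum version, is to have the program check the interpreter it is running under and name that interpreter in the error, so the user at least learns which Python is being picked up:

```python
import sys

# Hypothetical floor: the oldest interpreter the program is tested against.
MINIMUM_PYTHON = (2, 7)


def check_interpreter(minimum=MINIMUM_PYTHON):
    """Return an error message if the running interpreter is older than `minimum`."""
    current = tuple(sys.version_info[:2])
    if current < tuple(minimum):
        # %-formatting (not f-strings) so an old interpreter can still parse this file.
        return "This program needs Python %d.%d or newer, but %s is Python %d.%d." % (
            minimum[0], minimum[1], sys.executable, current[0], current[1])
    return None


if __name__ == "__main__":
    problem = check_interpreter()
    if problem:
        sys.exit(problem)
```

The %-formatting is deliberate: a file containing f-strings cannot even be compiled by an interpreter older than 3.6, so the friendly message would never appear. On current Python, running the installer as python3 -m pip install ... also ties the install to that exact interpreter, which removes the question of which Python a bare pip belongs to.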

Is there a simpler way to install Python software that doesn't require users to delve into all of these technical details of Python packages, paths and locations? For example, I am not a big Java user, but I do use some Java tools occasionally, and don't recall ever having to worry about X and Y dependencies of the Java software I was installing, and I have no clue how Java package managing works (and I'm happy that I don't -- I just wanted to use a tool that happened to be written in Java.) My recollection is that if you download a Jar, you just get it and it tends to work.

Is there an equivalent for Python? A way to distribute software in a way that doesn't depend on users having to chase down all these dependencies and versions? A way to perhaps compile all the relevant packages into something self-contained that can just be downloaded and used as a binary?
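The closest standard-library analogue to a JAR is a zip application: since Python 3.5, the zipapp module bundles a directory of code into one .pyz file that the interpreter can run directly. A minimal sketch follows; the package name "mytool" and its contents are hypothetical. An important caveat for this question: pure-Python dependencies can be copied into the source directory before archiving (e.g. with pip install --target), but compiled extension modules such as those in numpy and scipy cannot be imported from inside a zip, so for those dependencies a .pyz alone does not give a self-contained binary.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp


def build_pyz(source_dir, target, entry_point):
    """Bundle a directory of pure-Python code into a single runnable .pyz file."""
    zipapp.create_archive(
        source_dir,
        target=target,
        # Shebang line; zipapp also marks the file executable, so ./mytool.pyz runs on Unix.
        interpreter="/usr/bin/env python3",
        # "package.module:function" that the generated __main__.py calls.
        main=entry_point,
    )
    return target


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # A throwaway package "mytool" with a single command-line function.
        package = pathlib.Path(tmp, "src", "mytool")
        package.mkdir(parents=True)
        (package / "__init__.py").write_text("")
        (package / "cli.py").write_text(
            "def main():\n    print('hello from a single file')\n"
        )
        pyz = build_pyz(package.parent, pathlib.Path(tmp, "mytool.pyz"), "mytool.cli:main")
        result = subprocess.run(
            [sys.executable, str(pyz)], capture_output=True, text=True, check=True
        )
        return result.stdout.strip()
```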

I would like to emphasize that this frustration happens even with the narrow goal of distributing a package to savvy Unix users, which makes the problem simpler by not worrying about cross-platform issues, etc. I assume that the users are Unix savvy, and might even know Python, but just aren't aware (and don't want to be made aware) of the ins and outs of Python packaging and the myriad internal complications/rivalries of different package managers. A disturbing feature of this issue is that it happens even when all of your Python package dependencies are well-known, well-written and well-maintained PyPI packages like Pandas, Scipy and Numpy. It's not like I was relying on some obscure dependencies that are not properly formed packages: rather, I was using the most mainstream packages that many might rely on.

Any help or advice on this will be greatly appreciated. I think Python is a great language with great libraries, but I find it virtually impossible to distribute the software I write in it (once it has dependencies) in a way that is easy for people to install locally and just run. I would like to clarify that the software I'm writing is not a Python library for programmatic use, but software that has executable scripts that users run as individual programs. Thanks.
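A sketch of the usual setuptools route for script-style programs, with hypothetical names throughout (mypackage, mytool, and the scipy/pandas floors; only the numpy floor comes from the question): the entry_points declaration makes the installer generate a "mytool" executable that imports the package and calls main(), so users never invoke a .py file by path. The keyword dict is what setup.py would pass to setuptools.setup(**SETUP_KWARGS); it is shown as plain data so the example runs without performing an install.

```python
import argparse

# What setup.py would pass as setuptools.setup(**SETUP_KWARGS).
# All names are hypothetical; only the numpy floor comes from the question.
SETUP_KWARGS = {
    "name": "mypackage",
    "version": "0.1.0",
    "packages": ["mypackage"],
    # The installer resolves and installs these alongside the package.
    "install_requires": ["numpy >= 1.5", "scipy >= 0.11", "pandas >= 0.10"],
    "entry_points": {
        # Installing generates a "mytool" executable that calls mypackage.cli:main().
        "console_scripts": ["mytool = mypackage.cli:main"],
    },
}


def main(argv=None):
    """Body of the hypothetical mypackage/cli.py.

    The generated wrapper calls sys.exit(main()), so the return value is the
    process exit status.
    """
    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument("infile", help="input file to process")
    args = parser.parse_args(argv)
    print("processing %s" % args.infile)
    return 0
```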

Syllabogram asked 2/2, 2013 at 18:45. Comments:
In my opinion, the best answer is to distribute it in the standard way, that is, via pip (easy_install is deprecated). As you say, this means installing it requires some knowledge. Then let each distro's maintainers make it easy to install on that distro. What that means will vary, but generally it will be a one-click install through the package manager, which will handle the dependencies. - Hydrothermal
What do you mean by distro maintainers? I don't understand the comment. I also don't know what to tell my users in the Installation section of the manual. Should they all overhaul their entire setup and use virtualenv/pip before even starting? Install distribute's easy_install? I don't even know what to tell them. - Syllabogram
Virtually every Linux distribution (and OS X, in the form of Homebrew and others) has a package manager. These are pieces of software that handle software installation and the maintenance of installed software, and package maintainers maintain packages that tell the system how to install software and keep it up to date. Generally, software for Linux OSes is released as source, and then a package maintainer creates a package that handles the whole installation for the end user, fitting the distro's style and setup. Package maintainers will be able to deal with Python's distribution methods. - Hydrothermal
This means that for the end user, the installation process will be sudo apt-get install some_python_package or sudo pacman -S some_python_package, or however their package manager works. This is sensible, as different distros require different things: Arch, for example, uses Python 3 by default, a stance that may break certain software if it isn't carefully packaged to use Python 2. Python itself is also working on this; take a look at PEP 427. - Hydrothermal
@Lattyware: This is not really the case on production Linux servers that are shared by groups. If you use a server/cluster with many other people, your sysadmins will have their own way of installing software. For example, many servers use the "module add" or "use" system to add/remove software from one's path. I've never worked on a cluster where users can directly access package managers like apt-get on their own, so I don't think distribution package managers solve this case. - Syllabogram
In that case the server owner is providing a package manager, so it's just the same thing repeated. The server owner installs using a package manager, and the end user installs using whatever system the server owner provides. - Hydrothermal
@Lattyware: It's not exactly a package manager, it's just a list of available packages that you can "load" or "unload" into your environment. So they might have Python or setuptools' easy_install installed that way, and you can do "use python2.7" or "use easy_install" to get access to easy_install. But once you have easy_install available through this system, all the issues I described in my post come up again. The modules system doesn't make any difference. - Syllabogram
The setuptools easy_install also has --user, but only in recent versions. - Lueck
This is a very relevant question, upvoted. - Lueck
2016, and there still isn't a single, simple, working way of distributing a Python program/script/package like a JAR file in Java. - Abernon
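PEP 427, mentioned in the comments above, defines the wheel: a built, binary distribution format that an installer can unpack without compiling anything, which matters for packages like numpy and scipy. Its filename encodes which interpreter, ABI and platform the binary targets, following the pattern {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl. A small parser, written here purely for illustration, shows how those tags come apart:

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str    # e.g. cp27: which interpreter(s) the wheel supports
    abi_tag: str       # e.g. cp27mu: which compiled ABI it was built against
    platform_tag: str  # e.g. linux_x86_64, or "any" for pure-Python wheels


def parse_wheel_filename(filename):
    """Split a PEP 427 wheel filename into its fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
        if not build[:1].isdigit():
            raise ValueError("build tag must start with a digit: %r" % build)
    else:
        raise ValueError("unexpected number of fields in %r" % filename)
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)
```

For example, parse_wheel_filename("numpy-1.7.1-cp27-cp27mu-linux_x86_64.whl"), an illustrative filename, yields python tag cp27, ABI tag cp27mu, platform tag linux_x86_64 and no build tag; an installer compares tags like these against the running interpreter to pick a matching prebuilt binary.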

We also develop software projects that depend on numpy, scipy and other PyPI packages. Hands down, the best tool currently available for managing remote installations is zc.buildout. It is very easy to use. You download a bootstrapping script from their website and distribute it with your package. You write a "local deployment" file, normally called buildout.cfg, that explains how to install the package locally. You ship both the bootstrap.py file and buildout.cfg with your package; we use the MANIFEST.in file in our Python packages to force the embedding of these two files in the zip or tar balls distributed by PyPI. After unpacking it, the user executes two commands:

$ python bootstrap.py # this will download zc.buildout and setuptools
$ ./bin/buildout # this will build and locally install your package + deps

The package is compiled and all dependencies are installed locally, so the user installing your package doesn't even need root privileges, which is an added benefit. The scripts are (normally) placed under ./bin, so the user can execute them right away. zc.buildout uses setuptools to interact with PyPI, so everything you expect works out of the box.
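A minimal sketch of the two shipped files, for a hypothetical package "mypackage" that sits next to buildout.cfg: the buildout.cfg uses the zc.recipe.egg recipe, which installs the egg with its dependencies inside the buildout directory and generates its scripts in ./bin, and the MANIFEST.in lines pull bootstrap.py and buildout.cfg into the source distribution. The contents are shown as Python strings with a tiny helper that writes them and reads buildout.cfg back with configparser as a sanity check; a real project would simply commit the two files.

```python
import configparser
import pathlib
import tempfile

# Minimal buildout.cfg: "develop = ." uses the package in this directory, and the
# part installs it (plus its install_requires) and generates scripts in ./bin.
BUILDOUT_CFG = """\
[buildout]
develop = .
parts = mypackage

[mypackage]
recipe = zc.recipe.egg
eggs = mypackage
"""

# MANIFEST.in lines that force both files into the sdist uploaded to PyPI.
MANIFEST_IN = """\
include bootstrap.py
include buildout.cfg
"""


def write_buildout_files(project_dir):
    """Write buildout.cfg and MANIFEST.in into project_dir (overwriting both)."""
    project = pathlib.Path(project_dir)
    (project / "buildout.cfg").write_text(BUILDOUT_CFG)
    (project / "MANIFEST.in").write_text(MANIFEST_IN)
    return project / "buildout.cfg"


def listed_parts(cfg_path):
    """Return the parts buildout would build, read back with configparser."""
    parser = configparser.ConfigParser()
    parser.read(cfg_path)
    return parser["buildout"]["parts"].split()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        return listed_parts(write_buildout_files(tmp))
```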

You can extend zc.buildout quite easily if all that power is not enough: you create so-called "recipes" that help the user create extra configuration files, download other things from the net or instantiate custom programs. The zc.buildout website contains a video tutorial that explains in detail how to use buildout and how to extend it. Our project Bob makes extensive use of buildout for distributing packages for scientific use. If you would like, please visit the following page, which contains detailed instructions for our developers on how to set up their Python packages so other people can build and install them locally using zc.buildout.
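For a rough idea of what a recipe looks like: buildout instantiates a recipe class with (buildout, name, options), calls install() when the part is first built (returning the paths it created, so buildout can remove them on uninstall) and update() on later runs with unchanged options. The sketch below is a hypothetical recipe that writes a small settings file; the class name, the "output" and "data-dir" options and the file format are all invented for illustration. It would be registered in the recipe package's setup.py under the "zc.buildout" entry-point group, for example as "default = myrecipe:ConfigFileRecipe" (myrecipe being a hypothetical module).

```python
import os
import tempfile


class ConfigFileRecipe:
    """Hypothetical buildout recipe that writes a small settings file."""

    def __init__(self, buildout, name, options):
        self.name = name
        self.options = options
        if "output" not in options:
            # Default: <buildout directory>/etc/<part name>.cfg
            options["output"] = os.path.join(
                buildout["buildout"]["directory"], "etc", name + ".cfg"
            )

    def install(self):
        path = self.options["output"]
        directory = os.path.dirname(path)
        if not os.path.isdir(directory):
            os.makedirs(directory)
        with open(path, "w") as handle:
            handle.write("[%s]\n" % self.name)
            handle.write("data_dir = %s\n" % self.options.get("data-dir", "data"))
        # buildout records these paths and deletes them when the part is uninstalled.
        return [path]

    def update(self):
        # The file depends only on the options, so there is nothing to refresh.
        pass


def demo():
    # Stand-ins for buildout's objects: plain dicts with the keys the recipe reads.
    with tempfile.TemporaryDirectory() as tmp:
        recipe = ConfigFileRecipe(
            {"buildout": {"directory": tmp}}, "mytool", {"data-dir": "/srv/data"}
        )
        (path,) = recipe.install()
        with open(path) as handle:
            return os.path.relpath(path, tmp), handle.read()
```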

Kalfas answered 2/2, 2013 at 20:24. Comments:
Great answer, I'll definitely look into buildout as well. Although our Python tool doesn't depend on other Python packages yet (precisely to avoid installation problems), it will in the near future, so we'll need a decent solution to let users install our tool easily. zc.buildout looks like a great solution, but it still requires the user to 1) download the package, 2) unpack it, 3) run 'python bootstrap.py' and 4) run buildout. Could you somehow make it work with a single command that does it all (e.g., write a nasty setup.py script to make 'easy_install foo' work with buildout)? - Lueck
I don't think that is easily possible, nor would you want it as something you'd have to maintain. Buildout depends on setuptools, not the other way around. In my experience, the best option is always to rely on PyPI for package distribution: it is there, costs nothing and is mirrored. We put a link to the PyPI package page in our publications. The user follows it and sees a nice manual and a green download button. You can write a script that does all this in a single shot, but then you will need to distribute/maintain that separately. - Nelly
In our case, we already have a meta package that combines the three packages that form our software, see pypi.python.org/pypi/easybuild. The all-in-one-go script could be part of the meta package (which currently is basically just a setup.py). - Lueck
The user still needs to download easybuild, unpack it and then run something. Why can't this something be buildout? - Nelly
It could, but I think a solution that requires running only a single command would work as well. - Lueck
'easy_install --user easybuild' is supposed to be that command, but this breaks too often, as the OP mentioned. I'm looking into buildout, and also thinking about an even better alternative. More news soon, hopefully... - Lueck
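Nelly's point about maintenance can be made concrete with a sketch of such an all-in-one wrapper: a hypothetical install.py shipped next to bootstrap.py that runs both buildout steps with the same interpreter that launched it. It shortens the instructions to one command, but it is one more file to maintain, and the user still has to download and unpack the package first.

```python
import os
import subprocess
import sys


def install_commands(project_dir):
    """The two buildout steps, as absolute-path commands for the unpacked project."""
    root = os.path.abspath(project_dir)
    return [
        # Same interpreter that runs this wrapper, so buildout is set up for it.
        [sys.executable, os.path.join(root, "bootstrap.py")],
        [os.path.join(root, "bin", "buildout")],
    ]


def run_install(project_dir):
    """Run both steps in order, stopping at the first failure."""
    for command in install_commands(project_dir):
        # Run inside the project so both steps pick up its buildout.cfg.
        subprocess.check_call(command, cwd=os.path.abspath(project_dir))
```

In the hypothetical install.py, the file would end by calling run_install(os.path.dirname(os.path.abspath(__file__))), and the user would run python install.py inside the unpacked directory.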

We're currently working to make it easier for users to get started installing Python software in a platform independent manner (in particular see https://python-packaging-user-guide.readthedocs.org/en/latest/future.html and http://www.python.org/dev/peps/pep-0453/)

For now, the problem of two competing versions of easy_install has been resolved: the competing fork "distribute" has been merged back into the setuptools main line of development.

The best currently available advice on cross-platform distribution and installation of Python software is captured here: https://packaging.python.org/

Abbotsun answered 15/9, 2013 at 12:11. Comments:
Thank you for this effort. Considering that one of the Python principles is "There should be one-- and preferably only one --obvious way to do it.", I was astonished (in a bad way) by all the different, but similar and interconnected, options to package my code for easy user installation. - Glassworker
'The “Python Packaging User Guide” (PyPUG) aims to be the authoritative resource on how to package, publish and install Python distributions using current tools.' - packaging.python.org - Mingrelian
