Installing my sdist from PyPI puts the files in unexpected places

My problem is that when I upload my Python package to PyPI and then install it from there using pip, my app breaks, because pip installs my files into completely different locations than when I install the exact same package from a local sdist.

Installing from the local sdist puts files on my system like this:

/Python27/
  Lib/
    site-packages/
      gloopy-0.1.alpha-py2.7.egg/ (egg and install info files)
        data/ (images and shader source)
        doc/ (html)
        examples/ (.py scripts that use the library)
        gloopy/ (source)

This is much as I'd expect, and it works fine (e.g. my source can find my data dir, because the two lie next to each other, just as they do in development).
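The question doesn't show how gloopy finds its data at runtime. Here is a minimal sketch that assumes the library resolves data/ as a sibling of the package directory. The data_dir helper is hypothetical, not taken from gloopy:

```python
import os


def data_dir(module_file):
    """Return the data/ directory that sits next to the package directory.

    Hypothetical helper assuming the development layout
    <root>/gloopy/<module>.py alongside <root>/data/.
    """
    package_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(os.path.dirname(package_dir), 'data')


# Inside the gloopy package this would be called as data_dir(__file__)
print(data_dir('/projects/gloopy/gloopy/shader.py'))
```

In the egg layout, data/ sits next to gloopy/, so this lookup works. With the pip layout, the same call resolves to site-packages/data, which doesn't exist.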

If I upload the same sdist to PyPI and then install it from there, using pip, then things look very different:

/Python27/
  data/ (images and shader source)
  doc/ (html)
  Lib/
    site-packages/
      gloopy-0.1.alpha-py2.7.egg/ (egg and install info files)
      gloopy/ (source files)
  examples/ (.py scripts that use the library)

This doesn't work at all: my app can't find its data files, and it's obviously a mess, polluting the top-level /Python27 directory with all my junk.

What am I doing wrong? How do I make the pip install behave like the local sdist install? Is that even what I should be trying to achieve?

Details

I have both setuptools and distribute installed, and I'm calling distribute_setup.use_setuptools().

Windows XP, Python 2.7.

My development directory looks like this:

/gloopy
  /data (image files and GLSL shader source read at runtime)
  /doc (html files)
  /examples (some scripts to show off the library)
  /gloopy (the library itself)

My MANIFEST.in mentions all the files I want to be included in the sdist, including everything in the data, examples and doc directories:

recursive-include data *.*
recursive-include examples *.py
recursive-include doc/html *.html *.css *.js *.png
include LICENSE.txt
include TODO.txt

My setup.py is quite verbose, but I guess the best thing is to include it here, right? It also includes duplicate references to the same data / doc / examples directories that are mentioned in the MANIFEST.in, because I understand this is required for these files to be copied from the sdist to the system during install.

NAME = 'gloopy'
VERSION= __import__(NAME).VERSION
RELEASE = __import__(NAME).RELEASE
SCRIPT = None
CONSOLE = False

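def main():
    import sys
    from glob import glob
    from pprint import pprint

    # setup_utils is the author's own helper package (not shown here)
    from setup_utils import distribute_setup
    from setup_utils.sdist_setup import get_sdist_config
    distribute_setup.use_setuptools()
    from setuptools import setup, find_packages

    # read_description() is defined elsewhere in the original setup.py (not shown)
    description, long_description = read_description()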
    config = dict(
        name=NAME,
        version=VERSION,
        description=description,
        long_description=long_description,
        keywords='',
        packages=find_packages(),
        data_files=[
            ('examples', glob('examples/*.py')),
            ('data/shaders', glob('data/shaders/*.*')),
            ('doc', glob('doc/html/*.*')),
            ('doc/_images', glob('doc/html/_images/*.*')),
            ('doc/_modules', glob('doc/html/_modules/*.*')),
            ('doc/_modules/gloopy', glob('doc/html/_modules/gloopy/*.*')),
            ('doc/_modules/gloopy/geom', glob('doc/html/_modules/gloopy/geom/*.*')),
            ('doc/_modules/gloopy/move', glob('doc/html/_modules/gloopy/move/*.*')),
            ('doc/_modules/gloopy/shapes', glob('doc/html/_modules/gloopy/shapes/*.*')),
            ('doc/_modules/gloopy/util', glob('doc/html/_modules/gloopy/util/*.*')),
            ('doc/_modules/gloopy/view', glob('doc/html/_modules/gloopy/view/*.*')),
            ('doc/_static', glob('doc/html/_static/*.*')),
            ('doc/_api', glob('doc/html/_api/*.*')),
        ],
        classifiers=[
            'Development Status :: 1 - Planning',
            'Intended Audience :: Developers',
            'License :: OSI Approved :: BSD License',
            'Operating System :: Microsoft :: Windows',
            'Programming Language :: Python :: 2.7',
        ],    
        # see classifiers http://pypi.python.org/pypi?:action=list_classifiers
    ) 

    config.update(dict(
        author='Jonathan Hartley',
        author_email='[email protected]',
        url='http://bitbucket.org/tartley/gloopy',
        license='New BSD',
    ) )

    if '--verbose' in sys.argv:
        pprint(config)

    setup(**config)


if __name__ == '__main__':
    main()

Merc answered 4/3, 2011 at 10:15

The data_files parameter is for data files that aren't part of the package. You should probably use package_data instead.

See https://docs.python.org/3/distutils/setupscript.html#installing-package-data
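Here is a minimal sketch of the package_data approach. It assumes data/ has been moved inside the gloopy package directory (gloopy/data/shaders/...). The package_config helper name is hypothetical, and the dict it returns would be passed to setuptools' setup() in setup.py:

```python
def package_config():
    """Build setup() keyword arguments that ship data inside the package.

    Assumes the hypothetical layout gloopy/data/shaders/*, i.e. the data
    directory has been moved inside the gloopy package directory.
    """
    return dict(
        name='gloopy',
        packages=['gloopy'],
        # Glob patterns are relative to the gloopy package directory, so the
        # files are installed alongside the package's modules.
        package_data={
            'gloopy': ['data/shaders/*'],
        },
    )


# In setup.py this would be used as:
#     from setuptools import setup
#     setup(**package_config())
```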

That wouldn't install the data in site-packages/data, but in my opinion that's not where it should be installed anyway: you wouldn't know which package it belongs to. IMO it should be installed in site-packages/gloopy-0.1.alpha-py2.7.egg/[data|doc|examples].

If you really do think the data is not package data, then you should use data_files. In that case pip installs it correctly, and I'd claim setup.py install puts it in the wrong place. But in my opinion it is package data in this case, because it's related to the package and not used by other software.
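To see why the two installs differ, here is a simplified sketch of how relative data_files directories are resolved. Each relative target directory is joined onto an install root. As far as I understand it, a pip install uses the installation prefix (sys.prefix, i.e. C:\Python27 here) as that root, while setuptools' egg-based setup.py install uses the egg directory. resolve_data_files is a hypothetical illustration, not the actual distutils code:

```python
import os


def resolve_data_files(data_files, install_root):
    """Return the destination path of every file listed in data_files.

    Simplified illustration: relative target directories are joined onto
    install_root, absolute ones are kept as-is, and each file keeps its
    base name.
    """
    destinations = []
    for target_dir, files in data_files:
        if not os.path.isabs(target_dir):
            target_dir = os.path.join(install_root, target_dir)
        for path in files:
            destinations.append(os.path.join(target_dir, os.path.basename(path)))
    return destinations


entries = [('data/shaders', ['data/shaders/basic.vert'])]
# Root assumed for a pip install: the installation prefix
print(resolve_data_files(entries, '/Python27'))
# Root assumed for an egg install: the egg directory itself
print(resolve_data_files(entries, '/Python27/Lib/site-packages/gloopy-0.1.alpha-py2.7.egg'))
```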

Clearing answered 4/3, 2011 at 12:09
I absolutely agree with you that data|doc|examples should be installed inside site-packages/gloopy-0.1.alpha-py2.7.egg. Your implication that 'data_files' should only be used for data that is used by other software is illuminating for me. Thank you. – Merc
Oops, your answer alerted me that I'd drawn the directory layout for a local sdist install wrong. 'data' etc. did not go into site-packages, but into site-packages/gloopy-0.1.alpha-py2.7.egg, just as you suggested it should. Fixed now. – Merc
@Tartley: Ah. Good. I do wonder why there is a difference there. We really need to get the install story straight in Python. (Hopefully distutils2 will do that.) – Clearing
Your main link is dead. – Bitter

You can load package data with pkgutil.get_data(); it finds exactly where the package data is installed.
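Here is a minimal, self-contained sketch of pkgutil.get_data. It is written for Python 3, since importlib.invalidate_caches does not exist in 2.7. It builds a throwaway package in a temporary directory and reads a data file from inside it. The package name gloopy_demo and the helper functions are hypothetical:

```python
import importlib
import os
import pkgutil
import sys
import tempfile


def make_demo_package(root):
    """Create a throwaway package 'gloopy_demo' with a shader file inside it."""
    shaders = os.path.join(root, 'gloopy_demo', 'data', 'shaders')
    os.makedirs(shaders)
    # An empty __init__.py makes gloopy_demo a regular package
    open(os.path.join(root, 'gloopy_demo', '__init__.py'), 'wb').close()
    with open(os.path.join(shaders, 'basic.vert'), 'wb') as f:
        f.write(b'void main() {}\n')


def load_shader_source():
    # The resource path is relative to the package and always uses '/'
    raw = pkgutil.get_data('gloopy_demo', 'data/shaders/basic.vert')
    return raw.decode('ascii')


def demo():
    root = tempfile.mkdtemp()
    make_demo_package(root)
    sys.path.insert(0, root)
    importlib.invalidate_caches()  # make sure import notices the new package
    return load_shader_source()


if __name__ == '__main__':
    print(demo())
```

Because the path is resolved relative to the imported package, the same call works whether the package sits directly in site-packages or inside an egg directory. This holds as long as the data ships inside the package, which is the package_data layout.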

Here is a nice blog post about including data files in packages: Including data files into Python packages

Crossed answered 4/3, 2011 at 11:40
Hey there. Brilliant, thanks for the answer. Reviewing the links, I think the important nugget which you didn't mention explicitly is that I should always be using 'package_data' in my setup.py, rather than 'data_files'. Is the consensus therefore that 'data_files' should never be used? This puzzles me, because it contradicts my understanding of how to lay out directories in a Python project, in that it implies your 'data' dir goes inside your package source dir, not alongside it. – Merc
