Including non-Python files with setup.py

How do I make setup.py include a file that isn't part of the code? (Specifically, it's a license file, but it could be any other thing.)

I want to be able to control the location of the file. In the original source folder, the file is in the root of the package. (i.e. on the same level as the topmost __init__.py.) I want it to stay exactly there when the package is installed, regardless of operating system. How do I do that?

Copperhead answered 23/10, 2009 at 11:4 Comment(4)
how do you do that at the moment? your previous question indicates that you're familiar with how to add the license file, so what is your code that "doesn't work"?Thordia
data_files = [('', ['lgpl2.1_license.txt',]),] puts it in the Python26 folder.Copperhead
After some negative feedback, I read your question again and realized what I was missing. I have updated my answer to provide a non-hackish solution to your question that doesn't require any additional modules (such as setuptools or distribute).Forethought
Thanks Evan. However, I am perfectly okay with using setuptools, since it is so prevalent.Copperhead
N
315

Probably the best way to do this is to use the setuptools package_data directive. This does mean using setuptools (or distribute) instead of distutils, but this is a very seamless "upgrade".

Here's a full (but untested) example:

from setuptools import setup, find_packages

setup(
    name='your_project_name',
    version='0.1',
    description='A description.',
    packages=find_packages(exclude=['ez_setup', 'tests', 'tests.*']),
    package_data={'': ['license.txt']},
    include_package_data=True,
    install_requires=[],
)

Note the specific lines that are critical here:

package_data={'': ['license.txt']},
include_package_data=True,

package_data is a dict that maps package names (the empty string means all packages) to lists of patterns, which can include globs. For example, if you only want to include files within a specific package, you can do that too:

package_data={'yourpackage': ['*.txt', 'path/to/resources/*.txt']}
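
Each package_data pattern is a glob evaluated relative to that package's own directory. As a rough way to preview what a pattern would pick up, the sketch below approximates that lookup with the standard glob module. The helper name preview_package_data is made up for illustration and is not part of setuptools.

```python
import glob
import os


def preview_package_data(package_dir, patterns):
    """Return files under package_dir matched by package_data-style patterns.

    Illustrative helper only (not a setuptools API): it mimics the idea that
    each pattern is a glob evaluated relative to the package directory.
    """
    matches = set()
    for pattern in patterns:
        # Escape the directory part so only the pattern itself is treated as a glob.
        for path in glob.glob(os.path.join(glob.escape(package_dir), pattern)):
            if os.path.isfile(path):  # directories are skipped, only files count
                matches.add(os.path.relpath(path, package_dir))
    return sorted(matches)
```

Calling preview_package_data('yourpackage', ['*.txt', 'path/to/resources/*.txt']) from the project root should list the text files that the example pattern above is meant to cover, assuming the package lives in a folder named yourpackage.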

The solution here is definitely not to rename your non-py files with a .py extension.

See Ian Bicking's presentation for more info.

UPDATE: Another [Better] Approach

Another approach that works well if you just want to control the contents of the source distribution (sdist) and have files outside of the package (e.g. top-level directory) is to add a MANIFEST.in file. See the Python documentation for the format of this file.

Since writing this response, I have found that using MANIFEST.in is typically a less frustrating approach to just make sure your source distribution (tar.gz) has the files you need.

For example, to include the top-level requirements.txt and recursively include the top-level "data" directory:

include requirements.txt
recursive-include data *

Nevertheless, in order for these files to be copied at install time to the package’s folder inside site-packages, you’ll need to supply include_package_data=True to the setup() function. See Adding Non-Code Files for more information.
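
Since several commenters report files silently missing from the tarball, it may be worth checking the built sdist before debugging the install step. The following is a minimal sketch using only the standard tarfile module; the function name sdist_members is hypothetical, and it assumes the usual sdist layout with a single top-level "name-version/" folder.

```python
import tarfile


def sdist_members(sdist_path):
    """List file paths inside an sdist, without the top-level 'name-version/' folder."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        names = [member.name for member in archive.getmembers() if member.isfile()]
    # Drop the leading "<name>-<version>/" component that sdists normally use.
    return sorted(name.split("/", 1)[1] for name in names if "/" in name)
```

If the MANIFEST.in rules above took effect, "requirements.txt" and the files under "data/" should appear in the returned list for the archive that "python setup.py sdist" wrote into dist/.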

Nordic answered 7/12, 2009 at 2:20 Comment(16)
The link to Ian's presentation is broken :(Choe
package_data is also available to pure distutils setup scripts since Python 2.3.Radar
This answer looks sensible, but doesn't work for me. Since package_data is notoriously unreliable (requires co-ordination of MANIFEST.in and setup.py to both add files to the sdist and install them, as separate steps) and the author of this answer notes it "isn't tested", can anyone else confirm whether it works for them? My LICENSE file is included in the sdist, but not installed when I run "python setup.py install" nor "pip install Package"Bluebird
Ian Bicking's presentation only shows how to install package data for files that are within a package. My LICENSE file is at the top level of my project, i.e. not in any packages. Can I still use package_data? Using data_files is a non-starter, because it puts the files in a system-wide location. not associated with my project, and to make it worse, the location changes depending on whether I run "setup.py install" or "pip install", from the same sdist.Bluebird
I'm guessing that the reason it doesn't work for me is that the file isn't located within any package - it's a LICENSE file in the top level of the repository, and hence can't be installed using 'package_data'Bluebird
MANIFEST.in is totally the way to go. Seems sdist completely ignores package_data with some versions of setuptools.Unroll
For your first example, you'll want to remove include_package_data as that looks in CVS and Subversion and seems to stomp on package_data sometimes. See stackoverflow.com/questions/7522250/…Kristelkristen
Since package_data is notoriously unreliable(...) Maybe the reason for this is the bug described in python distutils does not include data_files?Rotman
This answer does not work for me. The additional files are not getting put into the tarball...Iaria
Would it be ok to edit the answer to include @Éric-araujo's comment: package_data is available on distutils since 2.4.Bukhara
Sure -- you want to go ahead and edit it as you see appropriate?Nordic
I had seen, read and tried the package_data and it wasn't working until I noticed the include_package_data=True here. Seems silly to require that setting - why else would anyone specify package_data if not for inclusion?Deanedeaner
Please note that your folders containing non-Python files should still have an __init__.py package marker. If that is missing, no warning is generated and no files are copied.Moonset
I also had the issue where package_data was not working. Doing both things (adding to MANIFEST.in and adding to package_data) fixed it for me.Pile
It is 2019, none of the most voted answers here worked for me - I found my way and added another answer. (Sorry for "spamming", but due to being new it has little visibility, and it can really help people getting here)Hexahydrate
Ian Bicking's presentation is dead.Congeal
F
45

To accomplish what you're describing will take two steps...

  • The file needs to be added to the source tarball
  • setup.py needs to be modified to install the data file to the source path

Step 1: To add the file to the source tarball, include it in the MANIFEST

Create a MANIFEST template in the folder that contains setup.py

The MANIFEST is basically a text file with a list of all the files that will be included in the source tarball.

Here's what the MANIFEST for my project looks like:

  • CHANGELOG.txt
  • INSTALL.txt
  • LICENSE.txt
  • pypreprocessor.py
  • README.txt
  • setup.py
  • test.py
  • TODO.txt

Note: While sdist does add some files automatically, I prefer to specify them explicitly rather than guess what it will and won't include.

Step 2: To install the data file to the source folder, modify setup.py

Since you're looking to add a data file (LICENSE.txt) to the source install folder you need to modify the data install path to match the source install path. This is necessary because, by default, data files are installed to a different location than source files.

To modify the data install dir to match the source install dir...

Pull the install dir info from distutils with:

from distutils.command.install import INSTALL_SCHEMES

Modify the data install dir to match the source install dir:

for scheme in INSTALL_SCHEMES.values():
    scheme['data'] = scheme['purelib']

And, add the data file and location to setup():

data_files=[('', ['LICENSE.txt'])]

Note: The steps above should accomplish exactly what you described in a standard manner without requiring any extension libraries.
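
To see why data_files ends up somewhere other than next to the code, it can help to print the two default locations. This sketch uses the standard sysconfig module rather than the distutils INSTALL_SCHEMES dict shown above. It only inspects paths and changes nothing, and the function name is made up for illustration.

```python
import sysconfig


def data_and_purelib_dirs():
    """Return the default install directories for data files and pure-Python code."""
    paths = sysconfig.get_paths()
    # "data" is the root used for data_files; "purelib" is where pure-Python packages go.
    return paths["data"], paths["purelib"]


if __name__ == "__main__":
    data_dir, purelib_dir = data_and_purelib_dirs()
    print("data_files root:", data_dir)
    print("package root:   ", purelib_dir)
```

On a typical installation the first path is the interpreter or virtualenv prefix and the second is its site-packages folder, which matches the behaviour described in the question's comment about the file landing in the Python26 folder.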

Forethought answered 15/6, 2010 at 4:0 Comment(8)
MANIFEST only control files included in the source tarball (produced by sdist). Files listed there won't be installed.Eggplant
@David I didn't realize how far off I was in my first approach. I have updated the answer to be correct to accomplish what the question was asking without requiring any additional third-party libraries.Forethought
I consider manually editing the install schemes a very bad idea.Radar
@Éric Any particular reason why? and, do you have a viable installer alternative that doesn't require 3rd party packages (like setup_tools) to work. I chose distutils over setuptools because it's included with a vanilla install of python and I was building modules for PYPI. There should be a better way to do this now using distutils2 but I haven't touched python in quite a while so I wouldn't know how. Since you seem to be knowledgeable about distutils2 I think it would benefit the rest of us to have a proper distutils2 alternative.Forethought
Because monkeying with internal objects that distutils and other tools depend on is inherently a bad idea. For a pure-distutils solution, use package_data.Radar
As has been mentioned in other threads package_data doesn't work if the file is not in the package.Ceaseless
@ÉricAraujo: It is not a bad idea to use this solution as there is no other way. It is a bad distutils design - that's true. But it is de-facto public API which will never change, because it will break many things. Let's hope that distutils2 will provide better recommended ways.Myrtamyrtaceous
Wheel produced in this way will put the data files in the "data" directory, and nobody really knows where it is. Essentially dumping the data files in a waste bucket. This is not a solution at all...Redan
H
32

It is 2019, and here is what works. Despite advice here and there, the approach I found (only half-documented on the internet) is to use setuptools_scm, passed as an option to setuptools.setup. This includes any data files that are versioned in your VCS, be it git or any other, in the wheel package, and makes "pip install" from the git repository bring those files along.

So, I just added these two lines to the setup call in "setup.py". No extra installs or imports required:

    setup_requires=['setuptools_scm'],
    include_package_data=True,

There is no need to list files manually in package_data or in a MANIFEST.in file: if a file is versioned, it is included in the package. The docs on "setuptools_scm" put emphasis on creating a version number from the commit position, and disregard the really important part of adding the data files. (I couldn't care less whether my intermediate wheel file is named "*0.2.2.dev45+g3495a1f" or uses the hardcoded version number "0.3.0dev0" I've typed in, but leaving behind files that are crucial for the program to work is somewhat important.)
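
One way to confirm that the versioned data files really reached the wheel is to list the archive, since a wheel is an ordinary zip file. This is a minimal sketch with the standard zipfile module; the helper name wheel_data_files and its code_suffixes parameter are assumptions for illustration, not part of setuptools or setuptools_scm.

```python
import zipfile


def wheel_data_files(wheel_path, code_suffixes=(".py", ".pyc")):
    """Return names of non-Python files in a wheel, skipping the *.dist-info metadata."""
    with zipfile.ZipFile(wheel_path) as wheel:
        names = wheel.namelist()
    return sorted(
        name
        for name in names
        if not name.endswith("/")            # skip directory entries
        and not name.endswith(code_suffixes)  # skip Python source and bytecode
        and ".dist-info/" not in name         # skip wheel metadata
    )
```

If an expected file such as a .png or .css is absent from the result, the problem lies in the build step rather than in the installation.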

Hexahydrate answered 14/9, 2019 at 3:32 Comment(3)
doesn't seem to keep .png files in the installSheared
Worked for me with a CSS file. Thanks for the tip.Consequent
works 2023 with python3.10Exocrine
K
23

Create MANIFEST.in in the project root, using recursive-include for a required directory or include for a single file name.

include LICENSE
include README.rst
recursive-include package/static *
recursive-include package/templates *

Documentation can be found here.

Kwashiorkor answered 20/9, 2017 at 11:24 Comment(0)
D
20

Step 1: create a MANIFEST.in file in the same folder as setup.py

Step 2: include the relative path to the files you want to add in MANIFEST.in

include README.rst
include docs/*.txt
include funniest/data.json

Step 3: set include_package_data=True in the setup() function to copy these files to site-packages

Reference is here.
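
As other answers in this thread point out, only files that sit inside a package directory are candidates for installation; files such as a top-level README.rst stay in the sdist only. The sketch below separates the two groups for a list of concrete paths. The function name split_by_installability is hypothetical, and paths are assumed to be written with forward slashes relative to the folder containing setup.py.

```python
from pathlib import PurePosixPath


def split_by_installability(manifest_paths, package_dirs):
    """Split MANIFEST.in paths into (inside a package, outside any package).

    Hypothetical helper: only files inside a package directory are candidates
    for installation via include_package_data; the rest stay in the sdist only.
    """
    packages = [PurePosixPath(p) for p in package_dirs]
    inside, outside = [], []
    for raw in manifest_paths:
        path = PurePosixPath(raw)
        # A file is "inside" when one of its parent folders is a package directory.
        if any(pkg in path.parents for pkg in packages):
            inside.append(raw)
        else:
            outside.append(raw)
    return inside, outside
```

For the three paths in the example above, with "funniest" as the only package, just funniest/data.json would fall into the first group.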

Darciedarcy answered 26/10, 2018 at 21:43 Comment(0)
K
10

I wanted to post a comment to one of the questions but I don't have enough reputation to do that >.>

Here's what worked for me (I came up with it after referring to the docs):

package_data={
    'mypkg': ['../*.txt']
},

include_package_data=False,

The last line was, strangely enough, also crucial for me (you can also omit this keyword argument - it works the same).

What this does is it copies all text files in your top-level or root directory (one level up from the package mypkg you want to distribute).

Kennith answered 28/9, 2018 at 19:18 Comment(1)
I was looking for a way to not have to create a MANIFEST.in, this worked for me. The last line was also crucial for me. My lines were include_package_data=False, package_data={ "": ["../CHANGELOG.md"] },Enamor
H
9

None of the above really worked for me. What saved me was this answer.
Apparently, in order for these data files to be extracted during installation, I had to do a couple of things:

  1. Like already mentioned - Add a MANIFEST.in to the project and specify the folder/files you want to be included. In my case: recursive-include folder_with_extra_stuff *
  2. Again, like already mentioned - Add include_package_data=True to your setup.py. This is crucial, because without it only the files that match *.py will be brought.
  3. This is what was missing! - Add an empty __init__.py to your data folder. For me I had to add this file to my folder-with-extra-stuff.
  4. Extra - Not sure if this is a requirement, but with my own python modules I saw that they're zipped inside the .egg file in site-packages. So I had to add zip_safe=False to my setup.py file.

Final Directory Structure

my-app/
├─ app/
│  ├─ __init__.py
│  ├─ __main__.py
├─ folder-with-extra-stuff/
│  ├─ __init__.py
│  ├─ data_file.json
├─ setup.py
├─ MANIFEST.in
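
Because a missing __init__.py produces no warning, a quick scan for folders that lack the marker can save some time. This is a minimal sketch using only the standard library; the function name dirs_missing_init and the default skip list are assumptions chosen for illustration.

```python
import os


def dirs_missing_init(project_root, skip=(".git", "build", "dist", "__pycache__")):
    """Return subdirectories that hold files but no __init__.py package marker."""
    missing = []
    for current, dirnames, filenames in os.walk(project_root):
        # Prune folders that are never meant to be packages.
        dirnames[:] = [d for d in dirnames if d not in skip and not d.endswith(".egg-info")]
        if current == project_root:
            continue  # the project root itself is not expected to be a package
        if filenames and "__init__.py" not in filenames:
            missing.append(os.path.relpath(current, project_root))
    return sorted(missing)
```

Run against the structure above before step 3 was applied, it would be expected to report the data folder as the only one without a marker.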
Hydrobomb answered 26/3, 2021 at 0:44 Comment(4)
Neat overview – for me the issue was how I was declaring files in MANIFEST.in – thanks!Nela
can you update this with directory structure? here is a good site for that ascii-tree-generator.comRowlock
@Rowlock sure, no problem.Hydrobomb
thank you very much, this solution worked for me. The extra init.py did the trickRowlock
T
8

This works in 2020!

As others said, create "MANIFEST.in" where your setup.py is located.

Next, in the manifest, include/exclude all the necessary things. Be careful with the syntax here. For example, let's say we have a template folder to be included in the source package.

In the manifest file, do this:

recursive-include template *

Make sure you leave a space between the directory name and the file pattern, as above. Don't write it the way you would in .gitignore:

recursive-include template/* [this won't work]

Another option is to use include. There are a bunch of options; look them up in the docs for MANIFEST.in.
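
The reason the second form fails is that recursive-include takes a directory and then one or more patterns as separate words. A simplified check along those lines might look like the sketch below; check_manifest_line is a hypothetical helper that only covers include and recursive-include, and it is not how setuptools itself reports the problem.

```python
def check_manifest_line(line):
    """Return an error message for a malformed MANIFEST.in line, or None if it looks fine.

    Simplified sketch: only 'include' and 'recursive-include' are checked.
    """
    words = line.split()
    if not words or words[0].startswith("#"):
        return None  # blank lines and comments are ignored
    command, args = words[0], words[1:]
    if command == "include" and len(args) < 1:
        return "'include' expects at least one pattern"
    if command == "recursive-include" and len(args) < 2:
        return "'recursive-include' expects a directory followed by at least one pattern"
    return None
```

With this check, "recursive-include template *" passes, while "recursive-include template/*" is flagged because the directory and the pattern were written as a single word.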

And the final important step, include this param in your setup.py and you are good to go!

setup(
    ...
    include_package_data=True,
    ...
)

Hope that helps! Happy Coding!

Taverner answered 14/7, 2020 at 17:59 Comment(0)
I
5

In setup.py, inside the setup() call:

setup(
    name='foo library',
    # ...
    package_data={
        'foolibrary.folderA': ['*'],      # All files from folder A
        'foolibrary.folderB': ['*.txt'],  # All text files from folder B
    },
)
Idolater answered 27/12, 2014 at 4:49 Comment(1)
This actually does nothing towards accomplishing the OP's goal. Whatever you write in package_data will have no influence on what setup.py install does, unless you modify the install command itself. Unless those files are under package directory, which is usually something you'd want to avoid.Redan
C
3

Here is a simpler answer that worked for me.

First, per a Python Dev's comment above, setuptools is not required:

package_data is also available to pure distutils setup scripts 
since 2.3. – Éric Araujo

That's great because putting a setuptools requirement on your package means you will have to install it also. In short:

from distutils.core import setup

setup(
    # ...snip...
    packages          = ['pkgname'],
    package_data      = {'pkgname': ['license.txt']},
)
Ceaseless answered 5/8, 2013 at 23:8 Comment(1)
It will complain that the directory pkgname does not exist.Killjoy
R
2

I just wanted to follow up on something I found working with Python 2.7 on CentOS 6. Adding package_data or data_files as mentioned above did not work for me. I added a MANIFEST.in with the files I wanted, which put the non-Python files into the tarball but did not install them on the target machine via RPM.

In the end, I was able to get the files into my solution using the "options" in the setup/setuptools. The option files let you modify various sections of the spec file from setup.py. As follows.

from setuptools import setup


setup(
    name='theProjectName',
    version='1',
    packages=['thePackage'],
    url='',
    license='',
    author='me',
    author_email='[email protected]',
    description='',
    options={'bdist_rpm': {'install_script': 'filewithinstallcommands'}},
)

file - MANIFEST.in:

include license.txt

file - filewithinstallcommands:

mkdir -p $RPM_BUILD_ROOT/pathtoinstall/
#this line installs your python files
python setup.py install -O1 --root=$RPM_BUILD_ROOT --record=INSTALLED_FILES
#install license.txt into /pathtoinstall folder
install -m 700 license.txt $RPM_BUILD_ROOT/pathtoinstall/
echo /pathtoinstall/license.txt >> INSTALLED_FILES
Ray answered 1/6, 2015 at 17:22 Comment(0)
C
2

None of the answers worked for me because my files were at the top level, outside the package. I used a custom build command instead.

import os
import setuptools
from setuptools.command.build_py import build_py
from shutil import copyfile

HERE = os.path.abspath(os.path.dirname(__file__))
NAME = "thepackage"

class BuildCommand(build_py):
    def run(self):
        build_py.run(self)

        if not self.dry_run:
            target_dir = os.path.join(self.build_lib, NAME)
            for fn in ["VERSION", "LICENSE.txt"]:
                copyfile(os.path.join(HERE, fn), os.path.join(target_dir, fn))


setuptools.setup(
    name=NAME,
    cmdclass={"build_py": BuildCommand},
    description=DESCRIPTION,  # DESCRIPTION is assumed to be defined elsewhere in setup.py
    # ... remaining setup() arguments
)
Coachandfour answered 19/9, 2020 at 7:54 Comment(2)
This does not work for develop/editable installs, such as pip install -e or python setup.py developShore
@Shore I'm not a fan of -e. I always build during development because the build step can do many things, such as transform the file structure, resolve templates, generate icons, and so on. I prefer pip install <pkg> --target ~/testing/ because then you get what the user gets.Coachandfour
S
2

For non-python files to be included in an installation, they must be within one of the installed package directories. If you specify non-python files outside of your package directories in MANIFEST.in, they will be included in your distribution, but they will not be installed. The "documented" ways of installing arbitrary files outside of your package directories do not work reliably (as everyone has noticed by now).

The above answer from Julian Mann copies the files to your package directory in the build directory, so it does work, but not if you are installing in editable/develop mode (pip install -e or python setup.py develop). Based on this answer to a related question (and Julian's answer), below is an example that copies files to your installed package location either way after all the other install/develop tasks are done. The assumption here is that files file1 and file2 in your root-level data directory will be copied to your installed package directory (my_package), and that they will be accessible from python modules in your package using os.path.join(os.path.dirname(__file__), 'file1'), etc.

Remember to also do the MANIFEST.in stuff described above so that these files are also included in your distribution. Why setuptools would include files in your distribution and then silently never install them, is beyond my ken. Though installing them outside of your package directory is probably even more dubious.

import os
from setuptools import setup
from setuptools.command.develop import develop
from setuptools.command.install import install
from shutil import copyfile

HERE = os.path.abspath(os.path.dirname(__file__))
NAME = 'my_package'

def copy_files(target_path):
    source_path = os.path.join(HERE, 'data')
    for fn in ["file1", "file2"]:
        copyfile(os.path.join(source_path, fn), os.path.join(target_path, fn))

class PostDevelopCommand(develop):
    """Post-installation for development mode."""
    def run(self):
        develop.run(self)
        copy_files(os.path.abspath(NAME))

class PostInstallCommand(install):
    """Post-installation for installation mode."""
    def run(self):
        install.run(self)
        copy_files(os.path.abspath(os.path.join(self.install_lib, NAME)))

setup(
    name=NAME,
    cmdclass={
        'develop': PostDevelopCommand,
        'install': PostInstallCommand,
    },
    version='0.1.0',
    packages=[NAME],
    include_package_data=True,
    setup_requires=['setuptools_scm'],
)
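
Once the files sit inside the installed package, an alternative to building paths from __file__ is the standard importlib.resources module. This is a minimal sketch assuming Python 3.9 or newer; read_package_text is a hypothetical helper name, and the package and file names in the usage note are the ones used in this answer.

```python
from importlib import resources


def read_package_text(package, filename, encoding="utf-8"):
    """Read a text data file that was installed inside an importable package."""
    # files() locates the package's resource root; joinpath() selects one file in it.
    return resources.files(package).joinpath(filename).read_text(encoding=encoding)
```

With the setup above, read_package_text('my_package', 'file1') would be expected to return the contents of file1 from the installed location.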

Shore answered 9/1, 2022 at 3:52 Comment(0)
F
1

Note that distutils is deprecated and was removed in Python 3.12.

The most common packaging tool is setuptools, which has good support for data files. They can be specified using include_package_data, package_data, or a dedicated subdirectory for data files, and configured in any one of setup.cfg, setup.py, or pyproject.toml.

Fotina answered 23/7, 2023 at 21:18 Comment(0)
C
-15

Figured out a workaround: I renamed my lgpl2.1_license.txt to lgpl2.1_license.txt.py, and put some triple quotes around the text. Now I don't need to use the data_files option nor to specify any absolute paths. Making it a Python module is ugly, I know, but I consider it less ugly than specifying absolute paths.

Copperhead answered 23/10, 2009 at 12:22 Comment(1)
See my post. It doesn't have to be ugly. It's just hard to find a good example on the net because good documentation to setup packages is hard to find.Forethought
