How to add package data recursively in Python setup.py?
T

12

63

I have a new library that has to include a lot of subfolders of small data files, and I'm trying to add them as package data. Imagine my library is laid out like so:

 library
    - foo.py
    - bar.py
 data
   subfolderA
      subfolderA1
      subfolderA2
   subfolderB
      subfolderB1 
      ...

I want to add all of the data in all of the subfolders through setup.py, but it seems I have to go into every single subfolder manually (there are 100 or so) and add an __init__.py file. Furthermore, will setup.py find these files recursively, or do I need to list every subfolder in setup.py myself, like:

package_data={
  'mypackage.data.folderA': ['*'],
  'mypackage.data.folderA.subfolderA1': ['*'],
  'mypackage.data.folderA.subfolderA2': ['*']
   },

I can do this with a script, but seems like a super pain. How can I achieve this in setup.py?

PS, the hierarchy of these folders is important because this is a database of material files and we want the file tree to be preserved when we present them in a GUI to the user, so it would be to our advantage to keep this file structure intact.

Travistravus answered 27/12, 2014 at 4:58 Comment(2)
You want to compile all of those files as .exe?Comp
No, I just want them accessible in the program. If they are installed this way, I can access them through a path called data_dir: pkg_dir = op.abspath(op.dirname(__file__)); data_dir = op.join(pkg_dir, 'data'). Then in my program I could do open(op.join(data_dir, 'somedatafile'), 'r'), where data_dir refers to wherever they were installed.Travistravus
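
The access pattern described in the comment above can be sketched as follows. The helper name data_path is hypothetical, and the sketch assumes the data folder ends up installed next to the module that calls it.

```python
import os.path as op


def data_path(module_file, *parts):
    """Return the path of a file under the 'data' folder next to module_file."""
    pkg_dir = op.abspath(op.dirname(module_file))
    return op.join(pkg_dir, "data", *parts)


# Inside the package you would typically pass __file__:
#     with open(data_path(__file__, "subfolderA", "somedatafile")) as fh:
#         ...
```

Because the path is built from the module's own location, the folder hierarchy under data is preserved as long as the installer copies it intact.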
H
31
  1. Use Setuptools instead of distutils.
  2. Use data files instead of package data. These do not require __init__.py.
  3. Generate the lists of files and directories using standard Python code, instead of writing it literally:

    import glob

    data_files = []
    # match every second-level data directory (fixed depth, not recursive)
    directories = glob.glob('data/subfolder?/subfolder??/')
    for directory in directories:
        files = glob.glob(directory+'*')
        data_files.append((directory, files))
    # then pass data_files to setup()
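
To make step 3 concrete, here is a minimal sketch of the same idea wrapped in a function. The name collect_data_files is hypothetical. Unlike the snippet above, it skips nested directories, on the assumption that data_files entries should be regular files.

```python
import glob
import os


def collect_data_files(dir_pattern):
    """Build a data_files list of (directory, [files]) from a glob of directories."""
    data_files = []
    for directory in sorted(glob.glob(dir_pattern)):
        if not os.path.isdir(directory):
            continue
        # keep regular files only; subdirectories need their own entry
        files = [f for f in sorted(glob.glob(os.path.join(directory, "*")))
                 if os.path.isfile(f)]
        if files:
            data_files.append((directory, files))
    return data_files


# e.g. data_files = collect_data_files("data/subfolder?/subfolder??")
# then pass data_files=data_files to setup()
```

The pattern still fixes the directory depth, so deeper levels need additional patterns.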
    
Hulton answered 27/12, 2014 at 6:4 Comment(1)
This answer clarifies the difference between data files and package files: stackoverflow.com/questions/4519127/…Bulletin
C
59

The problem with the glob answer is that it only matches a fixed directory depth, i.e. it's not fully recursive. The problem with the copy_tree answer is that the copied files are left behind on uninstall.

The proper solution is a recursive one which will let you set the package_data parameter in the setup call.

I've written this small function to do this:

import os

def package_files(directory):
    paths = []
    for (path, directories, filenames) in os.walk(directory):
        for filename in filenames:
            paths.append(os.path.join('..', path, filename))
    return paths

extra_files = package_files('path_to/extra_files_dir')

setup(
    ...
    packages = ['package_name'],
    package_data={'': extra_files},
    ...
)

When you run pip uninstall package_name, you'll see your additional files listed (they are tracked with the package).

Crescendo answered 18/4, 2016 at 11:54 Comment(4)
Instead of paths.append('../' + os.path.join(path, filename)) do paths.append(os.path.join('..', path, filename))Reinhardt
@MadPhysicist Thanks. Edited my answer. I've been using os.path.join wrong all this time. I didn't realize it took a variable number of parameters.Crescendo
Thanks, super useful. Worth noting that to get this working I had to include the path to the directory that the setup.py file was in by adding directory = str(pathlib.Path(__file__).parent.absolute()) + str(pathlib.Path(directory)) as the first line in the package_files(directory) method.Edp
Docs say to always use forward slash, not os.path.join; see setuptools.readthedocs.io/en/latest/…Disini
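
Following the last comment, here is a sketch of the same walk that emits forward-slash paths relative to the package directory, so no '..' prefix is needed. It assumes the data directory lives inside the package folder; the two-argument signature is an illustration, not the answer's original function.

```python
import os
import posixpath


def package_files(package_dir, data_subdir):
    """List every file under package_dir/data_subdir, relative to package_dir,
    using '/' separators regardless of platform."""
    paths = []
    start = os.path.join(package_dir, data_subdir)
    for dirpath, _dirnames, filenames in os.walk(start):
        rel_dir = os.path.relpath(dirpath, package_dir)
        for filename in filenames:
            # re-join the relative directory with '/' instead of os.sep
            paths.append(posixpath.join(*rel_dir.split(os.sep), filename))
    return sorted(paths)
```

It could then be passed as package_data={'package_name': package_files('package_name', 'data')}.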
A
16

To add all the subfolders using package_data in setup.py, add one * pattern per level of your subdirectory structure:

package_data={
  'mypackage.data.folderA': ['*','*/*','*/*/*'],
}
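
Rather than typing the patterns by hand, they can be generated for a chosen depth. A minimal sketch; depth_patterns is a hypothetical helper, and you still need to know the maximum nesting depth of your tree.

```python
def depth_patterns(max_depth):
    """Return ['*', '*/*', ...] with one pattern per directory level."""
    return ["/".join(["*"] * level) for level in range(1, max_depth + 1)]


# e.g. package_data={'mypackage.data.folderA': depth_patterns(3)}
```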
Anglicanism answered 22/10, 2019 at 18:45 Comment(0)
H
10

Use glob to select all subfolders in your setup.py:

...
packages=['your_package'],
package_data={'your_package': ['data/**/*']},
...
Hyetography answered 30/3, 2019 at 14:15 Comment(4)
Recursive globs are not supported by setuptools according to pypa/setuptools#1806.Bootle
Interestingly, package_data={'': ['**/*.yml']} works to pick up all .yml files recursively across my whole project folder. Not an ideal solution, of course. But worth noting.Saeger
Seems like **/* is supported now as of Python 3.11 :)Atmospherics
Seems like this is even better than my approach of doing package_data={"your_package.data": ["**/*"]} because it doesn't require an __init__.py at all!Atmospherics
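
Since the comments disagree on whether ** works, it can help to preview what a pattern would select before building. The sketch below uses pathlib's own glob, so it only approximates what setuptools matches; preview_matches is a hypothetical name.

```python
from pathlib import Path


def preview_matches(package_dir, pattern):
    """Return the files under package_dir matched by pattern, as '/'-separated
    paths relative to package_dir. Directories are skipped."""
    root = Path(package_dir)
    matches = {p.relative_to(root).as_posix()
               for p in root.glob(pattern) if p.is_file()}
    return sorted(matches)


# e.g. print(preview_matches('your_package', 'data/**/*'))
```

Comparing this list with the contents of the built wheel or sdist shows whether your setuptools version honoured the pattern.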
R
8

Update

According to the change log, setuptools now supports recursive globs, using **, in package_data (as of v62.3.0, released May 2022).
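
If setup.py has to run under older setuptools too, the version can be checked before relying on **. This is only a sketch: the function name is hypothetical, the 62.3.0 threshold is taken from the sentence above, and the parsing is deliberately simplistic.

```python
import re


def supports_recursive_glob(setuptools_version):
    """Return True if the version string is 62.3.0 or newer.

    Only the leading numeric release components are compared; suffixes
    such as 'rc1' or '.post1' are ignored.
    """
    release = []
    for piece in setuptools_version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        release.append(int(match.group()))
    release += [0] * (3 - len(release))  # pad '62.3' to (62, 3, 0)
    return tuple(release) >= (62, 3, 0)


# In setup.py you might call it as:
#     import setuptools
#     supports_recursive_glob(setuptools.__version__)
```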

Original answer

@gbonetti's answer, using a recursive glob pattern, i.e. **, would be perfect.

However, as commented by @daniel-himmelstein, that does not work yet in setuptools package_data.

So, for the time being, I like to use the following workaround, based on pathlib's Path.glob():

from pathlib import Path

def glob_fix(package_name, glob):
    # this assumes setup.py lives in the folder that contains the package
    package_path = Path(f'./{package_name}').resolve()
    # return paths relative to the package directory, as package_data expects
    return [str(path.relative_to(package_path))
            for path in package_path.glob(glob)]

This returns a list of path strings relative to the package path, as required.

Here's one way to use this:

setuptools.setup(
    ...
    package_data={'my_package': [*glob_fix('my_package', 'my_data_dir/**/*'), 
                                 'my_other_dir/some.file', ...], ...},
    ...
)

The glob_fix() can be removed as soon as setuptools supports ** in package_data.

Ringe answered 11/11, 2020 at 15:38 Comment(1)
Seems like **/* is supported now as of Python 3.11 :)Atmospherics
S
4

If you don't mind getting your setup.py code dirty, use distutils.dir_util.copy_tree.
The main difficulty is how to exclude files from it.
Here's the code:

import os.path
from distutils import dir_util
from distutils import sysconfig
from distutils.core import setup

__packagename__ = 'x' 
setup(
    name = __packagename__,
    packages = [__packagename__],
)

destination_path = sysconfig.get_python_lib()
package_path = os.path.join(destination_path, __packagename__)

dir_util.copy_tree(__packagename__, package_path, update=1, preserve_mode=0)

Some Notes:

  • This code recursively copies the source code into the destination path.
  • You can keep the same setup(...) call and use copy_tree() to copy the directory you want into the installation path.
  • The default installation paths of distutils can be found in its API.
  • More information about the copy_tree() function of distutils can be found here.

Somnambulate answered 9/2, 2016 at 11:11 Comment(0)
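
Note that distutils was removed from the standard library in Python 3.12. On the exclusion problem mentioned above, one option is a different function, shutil.copytree, whose ignore parameter accepts shutil.ignore_patterns. This is a swapped-in technique rather than the distutils call used in the answer, and it likewise copies files that the installer does not track. The function name and the patterns are examples only.

```python
import shutil


def copy_tree_excluding(src, dst, *patterns):
    """Recursively copy src into dst, skipping names that match patterns."""
    return shutil.copytree(
        src, dst,
        ignore=shutil.ignore_patterns(*patterns),
        dirs_exist_ok=True,  # requires Python 3.8+
    )


# e.g. copy_tree_excluding('x', package_path, '*.pyc', '__pycache__')
```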
    P
    2

    I can suggest a little code to add data_files in setup():

    import os

    data_files = []

    # walk each resource folder and record (directory, [files]) pairs
    for folder in ('static', 'templates'):
        start_point = os.path.join(__pkgname__, folder)
        for root, dirs, files in os.walk(start_point):
            root_files = [os.path.join(root, i) for i in files]
            data_files.append((root, root_files))
    
    setup(
        name = __pkgname__,
        description = __description__,
        version = __version__,
        long_description = README,
        ...
        data_files = data_files,
    )
    
    Pollard answered 19/7, 2017 at 15:40 Comment(0)
    S
    0

    I can do this with a script, but seems like a super pain. How can I achieve this in setup.py?

    Here is a reusable, simple way:

    Add the following function to your setup.py, and call it as per the Usage instructions. This is essentially the generic version of the accepted answer.

    from glob import glob
    from itertools import chain

    def find_package_data(specs):
        """Recursively find package data in the given folders.

        Usage:
            # in setup.py
            setup(...
                  include_package_data=True,
                  package_data=find_package_data({
                     'package': ('resources', 'static')
                  }))

        Args:
            specs (dict): package => list of folder names to include files from

        Returns:
            dict of list of file names
        """
        # chain.from_iterable flattens the per-folder glob results;
        # the split strips the leading 'package/' from each match
        return {
            package: list(''.join(n.split('/', 1)[1:]) for n in
                          chain.from_iterable(glob('{}/{}/**/*'.format(package, f), recursive=True) for f in folders))
            for package, folders in specs.items()}
    
    
    Shiest answered 22/6, 2020 at 9:46 Comment(0)
    I
    0

    I'm going to throw my solution in here in case anyone is looking for a clean way to include their compiled sphinx docs as data_files.

    setup.py

    from setuptools import setup
    import pathlib
    import os
    
    here = pathlib.Path(__file__).parent.resolve()
    
    # Get documentation files from the docs/build/html directory
    documentation = [doc.relative_to(here) for doc in here.glob("docs/build/html/**/*") if doc.is_file()]
    data_docs = {}
    for doc in documentation:
        doc_path = os.path.join("your_top_data_dir", "docs")
        path_parts = doc.parts[3:-1]  # remove "docs/build/html", ignore filename
        if path_parts:
            doc_path = os.path.join(doc_path, *path_parts)
        # create all appropriate subfolders and append relative doc path
        data_docs.setdefault(doc_path, []).append(str(doc))
    
    setup(
        ...
        include_package_data=True,
        # <sys.prefix>/your_top_data_dir
        data_files=[("your_top_data_dir", ["data/test-credentials.json"]), *list(data_docs.items())]
    )
    

    With the above solution, once you install your package you'll have all the compiled documentation available at os.path.join(sys.prefix, "your_top_data_dir", "docs"). So, if you wanted to serve the now-static docs using nginx you could add the following to your nginx file:

    location /docs {
        # handle static files directly, without forwarding to the application
        alias /www/your_app_name/venv/your_top_data_dir/docs;
        expires 30d;
    }
    

    Once you've done that, you should be able to visit {your-domain.com}/docs and see your Sphinx documentation.

    Individual answered 9/9, 2021 at 16:34 Comment(0)
    C
    0

    If you don't want to add custom code to iterate through the directory contents, you can use the pbr library, which extends setuptools. See here for documentation on how to use it to copy an entire directory, preserving the directory structure:

    https://docs.openstack.org/pbr/latest/user/using.html#files

    Comeuppance answered 8/1, 2022 at 18:46 Comment(1)
    While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From ReviewConsumable
    O
    0

    You need to write a function that returns all files and their paths. You can use the following:

    import os

    def sherinfind():
        # List every folder that contains files or other subdirectories
        pathlist = ['templates/', 'scripts/']
        data = {}
        for path in pathlist:
            for root, d_names, f_names in os.walk(path, topdown=True, onerror=None, followlinks=False):
                data[root] = list()
                for f in f_names:
                    data[root].append(os.path.join(root, f))

        # convert {directory: [files]} into the (directory, [files]) pairs data_files expects
        fn = [(k, v) for k, v in data.items()]
        return fn
    

    Now change data_files in setup() as follows:

    data_files=sherinfind()
    
    Oceania answered 18/1, 2022 at 13:39 Comment(0)
    Z
    -1

    find_packages discovers the packages recursively:

    setup(
        # [...]
        packages=find_packages(),
        # [...]
    )
    

    But this requires a __init__.py.

    Zolnay answered 28/7, 2023 at 11:57 Comment(0)
