py2app picking up .git subdir of a package during build

We use py2app extensively at our facility to produce self-contained .app packages for easy internal deployment without dependency issues. Something I noticed recently, and have no idea how it began, is that when building an .app, py2app started including the .git directory of our main library.

commonLib, for instance, is our root python library package, which is a git repo. Under this package are the various subpackages such as database, utility, etc.

commonLib/
    |- .git/ # because commonLib is a git repo
    |- __init__.py
    |- database/
        |- __init__.py
    |- utility/
        |- __init__.py
    # ... etc

In a given project, say Foo, we will do imports like from commonLib import xyz to use our common packages. Building via py2app looks something like: python setup.py py2app

So the recent issue I am seeing is that when building an app for project Foo, py2app copies everything in commonLib/.git/ into the app, which is extra bloat. py2app has an excludes option, but that only seems to apply to Python modules. I can't quite figure out what it would take to exclude the .git subdir, or in fact, what is causing it to be included in the first place.
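
To confirm the bloat described above, a built bundle can be scanned for version-control directories. This is only a minimal sketch: the helper names find_vcs_dirs and _tree_size are hypothetical, and the bundle path dist/MyApp.app in the usage comment is an assumption about where the build output lives.

```python
import os


def _tree_size(path):
    """Sum the sizes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def find_vcs_dirs(bundle_path, names=('.git', '.svn', '.hg')):
    """Return (path, size_in_bytes) for each VCS directory under bundle_path."""
    found = []
    for root, dirs, _files in os.walk(bundle_path):
        for d in list(dirs):
            if d in names:
                vcs_dir = os.path.join(root, d)
                found.append((vcs_dir, _tree_size(vcs_dir)))
                dirs.remove(d)  # already measured, so do not walk into it
    return found


# Hypothetical usage after a build (path is an assumption):
# for path, size in find_vcs_dirs('dist/MyApp.app'):
#     print(path, size)
```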

Has anyone experienced this when using a python package import that is a git repo? Nothing has changed in our setup.py files for each project, and commonLib has always been a git repo. So the only thing I can think of being a variable is the version of py2app and its deps which have obviously been upgraded over time.

Edit

I'm using the latest py2app, 0.6.4, as of right now. Also, my setup.py was first generated from py2applet a while back, but has been hand configured since and copied over as a template for every new project. I am using PyQt4/sip for every single one of these projects, so it also makes me wonder if it's an issue with one of the recipes?

Update

From the first answer, I tried to fix this using various combinations of exclude_package_data settings. Nothing seems to force the .git directory to become excluded. Here is a sample of what my setup.py files generally look like:

from setuptools import setup
from myApp import VERSION

appname = 'MyApp'
APP = ['myApp.py']
DATA_FILES = []
OPTIONS = {
    'includes': 'atexit, sip, PyQt4.QtCore, PyQt4.QtGui',
    'strip': True, 
    'iconfile':'ui/myApp.icns', 
    'resources':['src/myApp.png'], 
    'plist':{
        'CFBundleIconFile':'ui/myApp.icns',
        'CFBundleIdentifier':'com.company.myApp',
        'CFBundleGetInfoString': appname,
        'CFBundleVersion' : VERSION,
        'CFBundleShortVersionString' : VERSION
        }
    }

setup(
    app=APP,
    data_files=DATA_FILES,
    options={'py2app': OPTIONS},
    setup_requires=['py2app'],
)

I have tried things like:

setup(
    ...
    exclude_package_data = { 'commonLib': ['.git'] },
    #exclude_package_data = { '': ['.git'] },
    #exclude_package_data = { 'commonLib/.git/': ['*'] },
    #exclude_package_data = { '.git': ['*'] },
    ...
)

Update #2

I have posted my own answer, which monkeypatches distutils. It's ugly and not preferred, but until someone can offer me a better solution, I guess this is what I have.

Messner answered 23/3, 2012 at 19:45 Comment(0)

I am adding an answer to my own question to document the only thing I have found to work thus far. My approach was to monkeypatch distutils to ignore certain patterns when creating a directory or copying a file. This is really not what I wanted to do, but like I said, it's the only thing that works so far.

## setup.py ##

import re

# file_util has to come first because dir_util uses it
from distutils import file_util, dir_util

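def wrapper(fn):
    # Skip any call whose path is the .git directory or lies inside it;
    # otherwise call through and keep the wrapped function's return value.
    def wrapped(src, *args, **kwargs):
        if not re.search(r'/\.git(/|$)', src):
            return fn(src, *args, **kwargs)
    return wrapped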

file_util.copy_file = wrapper(file_util.copy_file)
dir_util.mkpath = wrapper(dir_util.mkpath)

# now import setuptools so it uses the monkeypatched methods
from setuptools import setup

Hopefully someone will comment on this and tell me a higher level approach to avoid doing this. But as of now, I will probably wrap this into a utility method like exclude_data_patterns(re_pattern) to be reused in my projects.
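
The utility method mentioned above might look something like the following. This is only a sketch under assumptions: skip_matching is a hypothetical helper name, and the values returned for skipped paths (None for copy_file, an empty list for mkpath) are guesses at what callers tolerate, not something the distutils documentation promises.

```python
import re


def skip_matching(fn, pattern, skipped=None):
    """Wrap fn so that calls whose first argument matches pattern are skipped."""
    regex = re.compile(pattern)

    def wrapped(path, *args, **kwargs):
        if regex.search(path):
            return skipped  # pretend nothing was done for this path
        return fn(path, *args, **kwargs)

    return wrapped


def exclude_data_patterns(re_pattern):
    """Patch the distutils copy helpers to ignore paths matching re_pattern."""
    # Imported lazily: distutils was removed from the standard library
    # in Python 3.12, so this only works where it is still importable.
    from distutils import file_util, dir_util

    file_util.copy_file = skip_matching(file_util.copy_file, re_pattern)
    # mkpath normally returns the list of directories it created.
    dir_util.mkpath = skip_matching(dir_util.mkpath, re_pattern, skipped=[])
```

In a setup.py this would be called as exclude_data_patterns(r'/\.git(/|$)') before the line that imports setuptools, for the same reason given in the comments of the snippet above.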

Messner answered 30/3, 2012 at 19:37 Comment(5)
I think that's the only option here, short of having your script make a clean (.git-less) copy of the repo before packaging it up. - Customs
@EtiennePerot: Thanks for the confirmation on that. I was hoping I wasn't just missing some other, more elegant solution. Because I am referencing more than one repo (no submodules) in different projects for my imports, it would be a huge pain in the ass to have to export them all into one place and set up an env on the fly for building. - Messner
Thanks, do you know if this is still the case (no better way to exclude certain patterns)? I basically want to exclude what I have in my .gitignore. - Tuneful
@Tuneful It probably still applies to your problem. But instead of providing a hard-coded pattern like I did, you would have to read your .gitignore file and apply each pattern. - Messner
The latest py2app automatically filters '.git', '.svn', ... but this solution helped me build a white-list. - Tuneful
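
The idea from the comments of reading a .gitignore and applying each pattern could be sketched roughly as below. It is a deliberate simplification, not real gitignore semantics: negated patterns (starting with "!") are dropped, patterns containing a slash in the middle will never match, and each pattern is only compared against single path components. The names parse_ignore_lines and is_ignored are hypothetical.

```python
import fnmatch


def parse_ignore_lines(lines):
    """Keep plain patterns; skip blanks, comments and negations."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#') or line.startswith('!'):
            continue
        patterns.append(line.strip('/'))  # 'build/' and '/build' become 'build'
    return patterns


def is_ignored(path, patterns):
    """True if any component of path matches any pattern."""
    parts = [p for p in path.split('/') if p]
    return any(fnmatch.fnmatchcase(part, pat)
               for part in parts for pat in patterns)


# Hypothetical usage inside a copy wrapper:
# with open('.gitignore') as fh:
#     patterns = parse_ignore_lines(fh)
# if not is_ignored(src, patterns):
#     ...call the original copy function...
```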

I can see two options for excluding the .git directory.

  1. Build the application from a 'clean' checkout of the code. When deploying a new version, we always build from a fresh svn export based on a tag to ensure we don't pick up spurious changes/files. You could try the equivalent here - although the git equivalent seems somewhat more involved.

  2. Modify the setup.py file to massage the files included in the application. This might be done using the exclude_package_data functionality as described in the docs, or build the list of data_files and pass it to setup.

As for why it has suddenly started happening, knowing the version of py2app you are using might help, as will knowing the contents of your setup.py and perhaps how this was made (by hand or using py2applet).
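
Option 1 can be approximated without any version-control tooling by copying the package to a scratch location and leaving the metadata behind. The sketch below uses shutil.copytree with shutil.ignore_patterns; the function name clean_copy, the default ignore list and the temp-directory prefix are assumptions made for illustration.

```python
import os
import shutil
import tempfile


def clean_copy(repo_dir, dest_parent=None, ignore=('.git', '.svn', '*.pyc')):
    """Copy repo_dir into dest_parent, skipping entries that match ignore.

    Returns the path of the copied package directory.
    """
    if dest_parent is None:
        dest_parent = tempfile.mkdtemp(prefix='py2app-clean-')
    name = os.path.basename(os.path.normpath(repo_dir))
    dest = os.path.join(dest_parent, name)
    shutil.copytree(repo_dir, dest, ignore=shutil.ignore_patterns(*ignore))
    return dest
```

The build would then need to import the package from the parent directory of the returned path (for example by putting it first on PYTHONPATH) rather than from the original repo, which is the extra layer of process that option 1 implies.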

Helluva answered 29/3, 2012 at 5:8 Comment(2)
Thanks for the reply. The first option seems like a cumbersome layer that I would have to add to all of our py2app procedures. The project will be one repo, and it will tend to reference our standard library in the facility release location. There can be multiple repos in the mix, and they aren't submodules, so I would have to export and consolidate all of them. For the second suggestion, I will look into this tomorrow and report any updates! - Messner
I've updated my question with info after testing approach #2. Still can't get it to exclude :-( - Messner

I have a similar experience with Pyinstaller, so I'm not sure it applies directly.

Pyinstaller creates a "manifest" of all files to be included in the distribution before running the export process. You could "massage" this manifest, as per Mark's second suggestion, to exclude any files you want, including anything within .git or .git itself.

In the end, I stuck with checking out my code before producing a binary as there was more than just .git being bloat (such as UML documents and raw resource files for Qt). A checkout guaranteed a clean result and I experienced no issues automating that process along with the process of creating the installer for the binary.
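
Massaging the manifest could be as small as a list filter. The sketch below assumes the manifest entries are (dest_name, source_path, typecode) tuples, which is how the lists in a Pyinstaller spec file (such as a.datas) are usually laid out; the helper name drop_matching is hypothetical, and the exact structure may differ between Pyinstaller versions.

```python
def drop_matching(toc, fragment='/.git/'):
    """Return the entries whose source path does not contain fragment."""
    kept = []
    for entry in toc:
        source_path = entry[1].replace('\\', '/')  # normalise Windows paths
        if fragment not in source_path:
            kept.append(entry)
    return kept


# Hypothetical usage inside a .spec file, after Analysis(...) has run:
# a.datas = drop_matching(a.datas)
```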

Ness answered 9/1, 2014 at 12:9 Comment(1)
Thanks Marcus. Pyinstaller doesn't use a distutils approach like py2app/py2exe, so the solution would be different. I'm sure it would work, similar to what you said: instead of doing a git checkout, do a git export from your current checkout to a temp location and then build from there. The git export would allow you to control the exact patterns, but this was an approach I was also trying to avoid if it could be solved simply from the distutils setup script. - Messner

There is already a good answer to this, but I have a more elaborate one that solves the problem mentioned here with a white-list approach. To have the monkey patch also work for packages outside site-packages.zip, I also had to monkey patch copy_tree (because it imports copy_file inside its function); this helps in making a standalone application.

In addition, I create a white-list recipe to mark certain packages zip-unsafe. The approach makes it easy to add filters other than white-list.

import pkgutil
from os.path import join, dirname, realpath
from distutils import log

# file_util has to come first because dir_util uses it
from distutils import file_util, dir_util
# noinspection PyUnresolvedReferences
from py2app import util


def keep_only_filter(base_mod, sub_mods):
    prefix = join(realpath(dirname(base_mod.filename)), '')
    all_prefix = [join(prefix, sm) for sm in sub_mods]
    log.info("Set filter for prefix %s" % prefix)

    def wrapped(mod):
        name = getattr(mod, 'filename', None)
        if name is None:
            # ignore anything that does not have file name
            return True
        name = join(realpath(dirname(name)), '')
        if not name.startswith(prefix):
            # ignore those that are not in this prefix
            return True
        for p in all_prefix:
            if name.startswith(p):
                return True
        # log.info('ignoring %s' % name)
        return False
    return wrapped

# define all the filters we need
all_filts = {
    'mypackage': (keep_only_filter, [
        'subpackage1', 'subpackage2',
    ]),
}


def keep_only_wrapper(fn, is_dir=False):
    filts = [(f, k[1]) for (f, k) in all_filts.items()
             if k[0] == keep_only_filter]
    prefixes = {}
    for f, sms in filts:
        pkg = pkgutil.get_loader(f)
        assert pkg, '{f} package not found'.format(f=f)
        p = join(pkg.filename, '')
        sp = [join(p, sm, '') for sm in sms]
        prefixes[p] = sp

    def wrapped(src, *args, **kwargs):
        name = src
        if not is_dir:
            name = dirname(src)
        name = join(realpath(name), '')
        keep = True
        for prefix, sub_prefixes in prefixes.items():
            if name == prefix:
                # let the root pass
                continue
            # if it is a package we have a filter for
            if name.startswith(prefix):
                keep = False
                for sub_prefix in sub_prefixes:
                    if name.startswith(sub_prefix):
                        keep = True
                        break
        if keep:
            return fn(src, *args, **kwargs)
        return []

    return wrapped

file_util.copy_file = keep_only_wrapper(file_util.copy_file)
dir_util.mkpath = keep_only_wrapper(dir_util.mkpath, is_dir=True)
util.copy_tree = keep_only_wrapper(util.copy_tree, is_dir=True)


class ZipUnsafe(object):
    def __init__(self, _module, _filt):
        self.module = _module
        self.filt = _filt

    def check(self, dist, mf):
        m = mf.findNode(self.module)
        if m is None:
            return None

        # Do not put this package in site-packages.zip
        if self.filt:
            return dict(
                packages=[self.module],
                filters=[self.filt[0](m, self.filt[1])],
            )
        return dict(
            packages=[self.module]
        )

# Any package that is zip-unsafe (uses __file__ ,... ) should be added here 
# noinspection PyUnresolvedReferences
import py2app.recipes
for module in [
        'sklearn', 'mypackage',
]:
    filt = all_filts.get(module)
    setattr(py2app.recipes, module, ZipUnsafe(module, filt))

Tuneful answered 28/9, 2015 at 18:49 Comment(0)
