Python: Good way to get function and all dependencies in a single file?

I'm working on a big Python code base that grows and grows and grows. It's not a single application - more of a bunch of experiments that share some common code.

Every so often, I want to make a public release of a given experiment. I don't want to release my entire awful codebase, just the parts required to run a given experiment. So basically I'd like something to crawl through all the imports and copy whatever functions are called (or at least all the modules imported) into a single file, which I can release as a demo. I'd of course like to only do this for files defined in the current project (not a dependent package like numpy).

I'm using PyCharm now, and haven't been able to find that functionality. Is there any tool that does this?

Edit: I created the public-release package to solve this problem. Given a main module, it crawls through dependent modules and copies them into a new repo.

Gershon answered 7/11, 2016 at 16:10 Comment(2)
Are all your custom modules in a common directory (or subdirectories of a common directory)? And would you be happy to just have all the depended-upon modules packed into a single zip file, along with your script, or do you need to extract just the relevant code from those modules? It would be fairly easy to go through sys.modules and find the modules you use that are under a particular directory or directories, but it would be harder to extract just the subsections you need.Cove
Another option would be to run your experiment under a profiling tool that logs every function call. Then you could probably use a script to find the code for each function in each module file. If you usually import your functions using from mymodule import func, and you don't use any global variables or duplicate any function names, then you could probably safely gather all the functions into a single script. If you normally use import mymodule and then mymodule.func() then you could have the script create shell versions of each module with just the relevant functions.Cove
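The call-logging idea in the comment above can be sketched with `sys.setprofile` from the standard library. This is only a sketch; the names `run_and_collect`, `_profiler`, and the `called` set are hypothetical, and a real run would also record stdlib calls you'd want to filter out:

```python
import sys

called = set()  # (filename, function name) of everything invoked

def _profiler(frame, event, arg):
    # Record every pure-Python call; calls into C code arrive as
    # "c_call" events and are ignored here.
    if event == "call":
        code = frame.f_code
        called.add((code.co_filename, code.co_name))

def run_and_collect(func, *args, **kwargs):
    """Run func while logging every Python function that gets called."""
    sys.setprofile(_profiler)
    try:
        return func(*args, **kwargs)
    finally:
        sys.setprofile(None)
```

Afterwards you can filter `called` down to filenames inside your project directory to see which functions an experiment actually touched.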

So in the end, to solve this problem, I made a tool called public-release. It collects all the dependencies of the module you want to release and puts them into a separate repo, complete with setup scripts, so that your code can easily be run later.

Gershon answered 13/6, 2017 at 16:54 Comment(0)

Jamming all your code into a single module isn't a good idea. Consider, for example, an experiment that depends on two modules which define different functions with the same name. With separate modules, your code can easily distinguish between them; to stuff them into the same module, the editor would have to do some kind of hacky function renaming (e.g., prepend the old module name), and the situation gets even worse if some other function in the merged module calls the one with the conflicting name. You would effectively have to replace Python's entire module scoping mechanism to do this.
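As a minimal illustration (the module and function names are hypothetical), flattening two modules that both define a `load` function makes the second definition silently shadow the first:

```python
# Suppose utils_a.load and utils_b.load originally lived in separate
# modules. Pasted into one file, Python keeps only the last binding:

def load():               # came from utils_a.py
    return "utils_a result"

def load():               # came from utils_b.py; rebinds the name above
    return "utils_b result"

result = load()           # utils_a's version is no longer reachable
```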

Building a list of module dependencies is also a non-trivial task. Consider an experiment that depends on a module that in turn depends on numpy. You almost certainly want your end users to install the numpy package rather than have it bundled, so the editor would need some way of distinguishing which modules to include and which ones you expect to be installed separately. On top of this, you have to handle cases like a function importing a module inside its body rather than at the top of the module, and other out-of-the-ordinary import patterns.

You're asking too much of your editor. You really have two problems:

  1. Separate your experimental code from your release ready code.
  2. Package your stable code.

Separating your experimental code

Source control is the answer to your first problem. This will allow you to create whatever experimental code you wish on your local machine, and as long as you don't commit it, you won't pollute your code base with experimental code. If you do want to commit this code for backup, tracking, or sharing purposes, you can use branching here. Identify a branch as your stable branch (typically trunk in SVN and master in git), and only commit experimental code to other branches. You can then merge experimental feature branches into the stable branch as they become mature enough to publish. Such a branching setup has the added benefit of allowing you to segregate your experiments from each other, if you choose.

A server hosted source control system will generally make things simpler and safer, but if you're the sole developer, you could still use git locally without a server. A server hosted repository also makes it easier to coordinate with others if you're not the sole developer.

Packaging your stable code

One very simple option to consider is to just tell your users to check out the stable branch from the repository. Distributing this way is far from unheard of. This is still a little better than your current situation, since you no longer need to manually gather all your files; you may need to do a little documentation, though. Alternatively, if you don't want to make your repository publicly available, you could use your source control system's built-in feature to export an entire commit as a zip file or similar (export in SVN, archive in git); the resulting archives can be uploaded anywhere.

If that doesn't seem like enough and you can spare the time right now, setuptools is probably a good answer to this problem. This will allow you to generate a wheel containing your stable code. You can have a setup.py script for each package of code you want to release; the setup.py script will identify which packages and modules to include. You do have to manage this script manually, but if you configure it to include whole packages and directories and then establish good project conventions for organizing your code, you shouldn't have to change it very often. This also has the benefit of giving your end users a standard install mechanism for your code. You could even publish it on PyPI if you wish to share it broadly.
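A minimal `setup.py` along those lines might look like this. It is a sketch, not a drop-in file: the project name, package list, and dependency are placeholders for your own layout:

```python
# setup.py -- a hypothetical sketch; adjust names to your project.
from setuptools import setup, find_packages

setup(
    name="my-experiment",                 # placeholder release name
    version="0.1.0",
    # Include only the stable packages; experiments stay out of the wheel.
    packages=find_packages(include=["common", "common.*", "experiment1"]),
    # External dependencies are declared, not bundled -- pip installs them.
    install_requires=["numpy"],
)
```

With the `wheel` package installed, `python setup.py bdist_wheel` then produces a wheel your users can install with pip.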

If you go so far as to use setuptools, you may also want to consider a build server, which can pick up on new commits and can run scripts to repackage and potentially publish your code.

Valueless answered 17/4, 2017 at 22:26 Comment(5)
Thanks for your detailed response. (1) Source control: I already use git, branching, etc. I think the problem is that I need to distinguish between "stable" code (which is big and has lots of unrelated stuff, but passes tests) and "release" code, which only includes the necessary modules required to run some function. I'm looking for an automated way to generate this "release" branch from the stable code. (2): Packaging. I will look into whether setuptools can help here, I hadn't thought of that.Gershon
@Gershon It's a worthwhile question to ask how much trouble you're willing to go. How hard would it be for your users to check out the whole branch and just delete the stuff they don't need? Do you have wide enough usage that the extra effort on their part would be too much? It might be enough for now to document what's required and what can be thrown away. If you do determine that's not enough, I think setuptools is going to be your best bet. As a widely used tool, it's very good at what it does and you're going to find the most info online, and it sounds to me that it suits your problem well.Valueless
@Gershon That said, if you go the setuptools route, you may need to reorganize your packages and modules to make the packaging simpler. This isn't a bad thing, but you should be aware it might involve some effort and a little trial and error to find an organization that works smoothly. Regardless, good luck with your project. =)Valueless
One issue is that the codebase is private and things have to be approved before public release. I will look more into how setuptools could help with this. Thanks.Gershon
@Gershon As I mentioned, you can distribute releases using git archive if you don't wish to make the repository itself publicly accessible. Using setuptools to generate packages bypasses the need to make the repository directly available, of course.Valueless

If you just want the modules, you could run the code in a new session and go through sys.modules, looking for any module that belongs to your package.
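That sys.modules scan can be sketched like this; `project_modules` and its directory argument are hypothetical names, not an existing API:

```python
import os
import sys

def project_modules(project_dir):
    """Yield (name, path) for every currently imported module whose
    source file lives under project_dir -- i.e. your own code, as
    opposed to numpy, the stdlib, or other installed packages."""
    root = os.path.abspath(project_dir) + os.sep
    for name, module in list(sys.modules.items()):
        path = getattr(module, "__file__", None)  # builtins have no file
        if path and os.path.abspath(path).startswith(root):
            yield name, path
```

Run your experiment first so that everything gets imported, then call `project_modules(...)` and copy the files it reports.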

To move all the dependencies with PyCharm, you could make a macro that moves a highlighted object to a predefined file, attach the macro to a keyboard shortcut and then quickly move any in-project imports recursively. For instance, I made a macro called export_func that moves a function to to_export.py and added a shortcut to F10:

Macro Actions

Given a function that I want to move in a file like

from utils import factorize


def my_func():
    print(factorize(100))

and utils.py looking something like

import numpy as np
from collections import Counter
import sys
if sys.version_info.major >= 3:
    from functools import lru_cache
else:
    from functools32 import lru_cache


PREPROC_CAP = int(1e6)


@lru_cache(10)
def get_primes(n):
    n = int(n)
    sieve = np.ones(n // 3 + (n % 6 == 2), dtype=bool)
    for i in range(1, int(n ** 0.5) // 3 + 1):
        if sieve[i]:
            k = 3 * i + 1 | 1
            sieve[k * k // 3::2 * k] = False
            sieve[k * (k - 2 * (i & 1) + 4) // 3::2 * k] = False
    return list(map(int, np.r_[2, 3, ((3 * np.nonzero(sieve)[0][1:] + 1) | 1)]))


@lru_cache(10)
def _get_primes_set(n):
    return set(get_primes(n))


@lru_cache(int(1e6))
def factorize(value):
    if value == 1:
        return Counter()
    if value < PREPROC_CAP and value in _get_primes_set(PREPROC_CAP):
        return Counter([value])
    for p in get_primes(PREPROC_CAP):
        if p ** 2 > value:
            break
        if value % p == 0:
            factors = factorize(value // p).copy()
            factors[p] += 1
            return factors
    for p in range(PREPROC_CAP + 1, int(value ** .5) + 1, 2):
        if value % p == 0:
            factors = factorize(value // p).copy()
            factors[p] += 1
            return factors
    return Counter([value])

I can highlight my_func and press F10 to create to_export.py:

from utils import factorize


def my_func():
    print(factorize(100))

Highlighting factorize in to_export.py and hitting F10 gets

from collections import Counter
from functools import lru_cache

from utils import PREPROC_CAP, _get_primes_set, get_primes


def my_func():
    print(factorize(100))


@lru_cache(int(1e6))
def factorize(value):
    if value == 1:
        return Counter()
    if value < PREPROC_CAP and value in _get_primes_set(PREPROC_CAP):
        return Counter([value])
    for p in get_primes(PREPROC_CAP):
        if p ** 2 > value:
            break
        if value % p == 0:
            factors = factorize(value // p).copy()
            factors[p] += 1
            return factors
    for p in range(PREPROC_CAP + 1, int(value ** .5) + 1, 2):
        if value % p == 0:
            factors = factorize(value // p).copy()
            factors[p] += 1
            return factors
    return Counter([value])

Then highlighting each of PREPROC_CAP, _get_primes_set, and get_primes and then pressing F10 gets

from collections import Counter
from functools import lru_cache

import numpy as np


def my_func():
    print(factorize(100))


@lru_cache(int(1e6))
def factorize(value):
    if value == 1:
        return Counter()
    if value < PREPROC_CAP and value in _get_primes_set(PREPROC_CAP):
        return Counter([value])
    for p in get_primes(PREPROC_CAP):
        if p ** 2 > value:
            break
        if value % p == 0:
            factors = factorize(value // p).copy()
            factors[p] += 1
            return factors
    for p in range(PREPROC_CAP + 1, int(value ** .5) + 1, 2):
        if value % p == 0:
            factors = factorize(value // p).copy()
            factors[p] += 1
            return factors
    return Counter([value])


PREPROC_CAP = int(1e6)


@lru_cache(10)
def _get_primes_set(n):
    return set(get_primes(n))


@lru_cache(10)
def get_primes(n):
    n = int(n)
    sieve = np.ones(n // 3 + (n % 6 == 2), dtype=bool)
    for i in range(1, int(n ** 0.5) // 3 + 1):
        if sieve[i]:
            k = 3 * i + 1 | 1
            sieve[k * k // 3::2 * k] = False
            sieve[k * (k - 2 * (i & 1) + 4) // 3::2 * k] = False
    return list(map(int, np.r_[2, 3, ((3 * np.nonzero(sieve)[0][1:] + 1) | 1)]))

It goes pretty fast even if you have a lot of code that you're copying over.

Neurath answered 18/4, 2017 at 19:16 Comment(3)
Great, this is very helpful, and the sys.modules thing is a good tip. Much thanks. Will award bounty unless some miracle answer comes alongGershon
Am I understanding this right if it means you're just manually clicking each function and telling PyCharm to copy it?Valueless
Yep, looks like it. Though a fully automated "crawl through dependencies and copy" would be best, it looks like this is the quickest alternative.Gershon

Unfortunately, the dynamic features of Python make this impossible in general. (For example, you can call functions by names that come from an arbitrary source.)
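A tiny example of why static crawling can't be complete (the function names here are hypothetical):

```python
def experiment_a():
    return "ran experiment a"

def experiment_b():
    return "ran experiment b"

def run(name):
    # The callee is chosen at runtime (e.g. from a config file or a
    # command-line argument), so no static import/call crawler can
    # prove which functions are actually needed.
    return globals()["experiment_" + name]()
```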

You could instead approach the problem from the opposite direction: remove all unused parts of the code.

According to this question, PyCharm does not support this. The vulture package, however, provides dead-code detection.

Therefore, I propose making a copy of the project in which you collect the required functions into a module, then detecting all unused parts of the demo code and removing them.

Loony answered 17/4, 2017 at 20:58 Comment(0)

In PyCharm you can select the code you wish to move into a new module and, from the main menu, choose Refactor -> Copy (F6 on mine, but I can't remember if that's a customized shortcut). This gives you the option to copy the code to a new (or existing) file in a directory of your choosing. It will also add all the relevant imports.

Greaser answered 7/11, 2016 at 16:18 Comment(1)
This is not what I'm looking for. I want to copy a function and all the functions it uses into a single file, to release as a separate project.Gershon

© 2022 - 2024 — McMap. All rights reserved.