Best way to have a python script copy itself?
S

3

8

I am using Python for scientific applications. I run simulations with various parameters, and my script outputs the data to an appropriate directory for each parameter set. Later I use that data. However, I sometimes edit my script, so in order to be able to reproduce my results I would like a copy of whatever version of the script generated the data to live right in the directory with the data. So basically I would like my Python script to copy itself to the data directory. What's the best way to do this?

Thanks!

Stinger answered 27/4, 2014 at 8:27 Comment(9)
Ah I see. You want to make a virus? :) - Transmogrify
Why not use a VCS and store the revision identifier of the script with the data? - Hustler
This sounds like an XY problem. Nothing in your problem description yells "script that copies itself" to me. Perhaps have a base script that takes in some configuration parameters / config files, and another script that launches it and takes care of copying it and its configuration somewhere else for posterity. The VCS suggestion by @MartijnPieters can supplement this: it is common to store, say, a git repo and tag with the output of a production job, so you can go back and replicate it any time in the future. - Arevalo
That would work I guess. But I'm not super familiar with VCS. I guess maybe it's time to get acquainted, but I would like to just solve the problem this way in the meantime. - Stinger
But if you insist: Python scripts have a __file__ global; you can use that with shutil.copy() to create a copy of the file somewhere. - Hustler
I had exactly the same problem as you. The solution I came up with was to store the parameters in a JSON file that communicates with a parameter class inside the Python script. Each time I run a simulation, the parameter class creates a JSON file which is stored together with the simulation data. - Nichol
@Arevalo okay, maybe it's time to learn about git. I'm just usually more interested in science than coding, so I just learn enough to get by, which is why I put it off for this long. ;^) - Stinger
Git was just an example (because it is what I am using day to day now), but there are alternatives. Try and see what is used/supported in your workplace. I used to do science, and I can tell you it is well worth figuring these things out as early as possible. It will really make your life easier in the long run, giving you more time to do good science. - Arevalo
And if you end up using git and start using matplotlib someday, you may find this useful: github.com/dfm/savefig - Inna
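
Two of the suggestions in the comments above (recording a VCS revision identifier, and storing the parameters in a JSON file next to the data) can be combined. The following is only a minimal sketch under some assumptions: the script lives in a git repository, and the names git_revision, save_run_info and run_info.json are made up for this illustration and do not come from any of the commenters.

```python
import json
import subprocess
from pathlib import Path


def git_revision(repo_dir="."):
    """Return the short hash of HEAD, or 'unknown' if it cannot be determined."""
    try:
        out = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"],
            cwd=repo_dir, stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or repo_dir is not inside a repository.
        return "unknown"
    return out.decode("ascii").strip()


def save_run_info(data_dir, params, repo_dir="."):
    """Write the parameters and the script revision next to the data.

    The file name 'run_info.json' is a hypothetical choice for this sketch.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    info = {"revision": git_revision(repo_dir), "parameters": params}
    target = data_dir / "run_info.json"
    target.write_text(json.dumps(info, indent=2, sort_keys=True))
    return target
```

Note that the hash only identifies the last commit; edits that have not been committed yet are not captured, so this works best if you commit before each production run.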
B
5

Copying the script can be done with shutil.copy().

But you should consider keeping your script under revision control. That enables you to retain a revision history.

E.g. I keep my scripts under revision control with git. In Python files I tend to keep a version string like this;

__version__ = '$Revision: a42ef58 $'[11:-2]

This version string is updated with the short git hash every time the file in question is changed. (This is done by running a script called update-modified-keywords.py from git's post-commit hook.)

If you have a version string like this, you can embed that in the output, so you always know which version has produced the output.

Edit:

The update-modified-keywords script is shown below;

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
#
# Author: R.F. Smith <[email protected]>
# $Date: 2013-11-24 22:20:54 +0100 $
# $Revision: 3d4f750 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to update-modified-keywords.py. This work is
# published from the Netherlands.
# See http://creativecommons.org/publicdomain/zero/1.0/

"""Remove and check out those files that contain keywords and have
changed in the last commit in the current working directory."""

from __future__ import print_function, division
import os
import mmap
import sys
import subprocess


def checkfor(args):
    """Make sure that a program necessary for using this script is
    available.

    Arguments:
    args -- string or list of strings of commands. A single string may
            not contain spaces.
    """
    if isinstance(args, str):
        if ' ' in args:
            raise ValueError('No spaces in single command allowed.')
        args = [args]
    try:
        with open(os.devnull, 'w') as bb:
            subprocess.check_call(args, stdout=bb, stderr=bb)
    except subprocess.CalledProcessError:
        print("Required program '{}' not found! exiting.".format(args[0]))
        sys.exit(1)


def modifiedfiles():
    """Find files that have been modified in the last commit.

    :returns: A list of filenames.
    """
    fnl = []
    try:
        args = ['git', 'diff-tree', 'HEAD~1', 'HEAD', '--name-only', '-r',
                '--diff-filter=ACMRT']
        with open(os.devnull, 'w') as bb:
            fnl = subprocess.check_output(args, stderr=bb).splitlines()
            # Deal with unmodified repositories
            if len(fnl) == 1 and fnl[0] == 'clean':
                return []
    except subprocess.CalledProcessError as e:
        if e.returncode == 128:  # new repository
            args = ['git', 'ls-files']
            with open(os.devnull, 'w') as bb:
                fnl = subprocess.check_output(args, stderr=bb).splitlines()
    # Only return regular files.
    fnl = [i for i in fnl if os.path.isfile(i)]
    return fnl


def keywordfiles(fns):
    """Filter those files that have keywords in them

    :fns: A list of filenames
    :returns: A list of filenames for files that contain keywords.
    """
    # These lines are encoded otherwise they would be mangled if this file
    # is checked in my git repo!
    datekw = 'JERhdGU='.decode('base64')
    revkw = 'JFJldmlzaW9u'.decode('base64')
    rv = []
    for fn in fns:
        with open(fn, 'rb') as f:
            try:
                mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
                if mm.find(datekw) > -1 or mm.find(revkw) > -1:
                    rv.append(fn)
                mm.close()
            except ValueError:
                pass
    return rv


def main(args):
    """Main program.

    :args: command line arguments
    """
    # Check if git is available.
    checkfor(['git', '--version'])
    # Check if .git exists
    if not os.access('.git', os.F_OK):
        print('No .git directory found!')
        sys.exit(1)
    print('{}: Updating modified files.'.format(args[0]))
    # Get modified files
    files = modifiedfiles()
    if not files:
        print('{}: Nothing to do.'.format(args[0]))
        sys.exit(0)
    files.sort()
    # Find files that have keywords in them
    kwfn = keywordfiles(files)
    for fn in kwfn:
        os.remove(fn)
    args = ['git', 'checkout', '-f'] + kwfn
    subprocess.call(args)


if __name__ == '__main__':
    main(sys.argv)

If you don't want keyword expansion to clutter up your git history, you can use the smudge and clean filters. I have the following set in my ~/.gitconfig;

[filter "kw"]
    clean = kwclean
    smudge = kwset

Both kwclean and kwset are Python scripts.

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
#
# Author: R.F. Smith <[email protected]>
# $Date: 2013-11-24 22:20:54 +0100 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to kwset.py. This work is published from
# the Netherlands. See http://creativecommons.org/publicdomain/zero/1.0/

"""Fill the Date and Revision keywords from the latest git commit and tag and
   substitute them in the standard input."""

import os
import sys
import subprocess
import re


def gitdate():
    """Get the date from the latest commit in ISO8601 format.
    """
    args = ['git', 'log',  '-1', '--date=iso']
    dline = [l for l in subprocess.check_output(args).splitlines()
             if l.startswith('Date')]
    try:
        dat = dline[0][5:].strip()
        return ''.join(['$', 'Date: ', dat, ' $'])
    except IndexError:
        raise ValueError('Date not found in git output')


def gitrev():
    """Get the latest tag and use it as the revision number. This presumes the
    habit of using numerical tags. Use the short hash if no tag available.
    """
    args = ['git', 'describe',  '--tags', '--always']
    try:
        with open(os.devnull, 'w') as bb:
            r = subprocess.check_output(args, stderr=bb)[:-1]
    except subprocess.CalledProcessError:
        return ''.join(['$', 'Revision', '$'])
    return ''.join(['$', 'Revision: ', r, ' $'])


def main():
    """Main program.
    """
    dre = re.compile(''.join([r'\$', r'Date:?\$']))
    rre = re.compile(''.join([r'\$', r'Revision:?\$']))
    currp = os.getcwd()
    if not os.path.exists(currp+'/.git'):
        print >> sys.stderr, 'This directory is not controlled by git!'
        sys.exit(1)
    date = gitdate()
    rev = gitrev()
    for line in sys.stdin:
        line = dre.sub(date, line)
        print rre.sub(rev, line),


if __name__ == '__main__':
    main()

and

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
#
# Author: R.F. Smith <[email protected]>
# $Date: 2013-11-24 22:20:54 +0100 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to kwclean.py. This work is published from the
# Netherlands. See http://creativecommons.org/publicdomain/zero/1.0/

"""Remove the Date and Revision keyword contents from the standard input."""

import sys
import re

## This is the main program ##
if __name__ == '__main__':
    dre = re.compile(''.join([r'\$', r'Date.*\$']))
    drep = ''.join(['$', 'Date', '$'])
    rre = re.compile(''.join([r'\$', r'Revision.*\$']))
    rrep = ''.join(['$', 'Revision', '$'])
    for line in sys.stdin:
        line = dre.sub(drep, line)
        print rre.sub(rrep, line),

Both of these scripts are installed (without an extension at the end of the filename, as usual for executables) in a directory that is in my $PATH, and have their executable bit set.

In the .gitattributes file of my repository I choose for which files I want keyword expansion. So for e.g. Python files;

*.py filter=kw
Bignoniaceous answered 27/4, 2014 at 8:32 Comment(2)
I implemented your suggestion. So I guess you just have to live with the fact that your script will look modified if you, for example, run git status? - Stinger
@KaiSikorski Not necessarily. If you employ the kwset and kwclean filters as the smudge and clean filters as shown in the updated answer, you can have up-to-date keywords in your working directory without screwing up your commit history. - Bignoniaceous
G
12

I stumbled across this question as I wanted to do the same thing. Although I agree with the comments that git/VCS with revision and everything would be the cleanest solution, sometimes you just want something quick and dirty that does the job. So if anyone is still interested:

With __file__ you can access the running script's filename (with path), and as already suggested you can use a high-level file manipulation library like shutil to copy it to some place. In one line:

shutil.copy(__file__, 'experiment_folder_path/copied_script_name.py') 

With the corresponding imports and some bells and whistles:

import shutil
import os     # optional: for extracting basename / creating new filepath
import time   # optional: for appending time string to copied script

# generate filename with timestring
copied_script_name = time.strftime("%Y-%m-%d_%H%M") + '_' + os.path.basename(__file__)

# copy script
shutil.copy(__file__, os.path.join('my_experiment_folder_path', copied_script_name))
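
A slightly more defensive variant of the snippet above, offered as a sketch rather than a definitive implementation: it creates the target directory if it does not exist yet and returns the path of the copy. The function name archive_script is made up for this example.

```python
import shutil
import time
from pathlib import Path


def archive_script(script_path, data_dir):
    """Copy script_path into data_dir under a timestamped name.

    Returns the path of the copy.
    """
    script_path = Path(script_path).resolve()
    data_dir = Path(data_dir)
    # Create the data directory (and any parents) if it is missing.
    data_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y-%m-%d_%H%M%S")
    target = data_dir / "{}_{}".format(stamp, script_path.name)
    # copy2 copies the contents and also tries to preserve file metadata.
    shutil.copy2(str(script_path), str(target))
    return target


# Usage inside a simulation script (the folder name is a placeholder):
#     archive_script(__file__, "my_experiment_folder_path")
```

Keep in mind that __file__ is not defined in an interactive session, so this only works when the code is run from a script file.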
Galluses answered 10/3, 2018 at 15:29 Comment(0)
M
2

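If you are using Linux, you could use the following.

import os
# 'scriptname' and 'data_directory' are placeholders. The destination must
# differ from the source directory; cp refuses to copy a file onto itself.
os.system("cp ./scriptname ./data_directory/")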
Mosra answered 27/4, 2014 at 8:34 Comment(2)
Simple and sweet! - Supercharge
@Cs20 Looking back on this, it seems suspiciously like a fork bomb. Oh well. - Mosra
