Mercurial hook to disallow committing large binary files

I want a Mercurial hook that runs before a transaction is committed and aborts it if a binary file being committed is larger than 1 megabyte. I found the following code, which works fine except for one problem: if my changeset removes a file, the hook throws an exception.

The hook (I'm using pretxncommit = python:checksize.newbinsize):

from mercurial import context, util
from mercurial.i18n import _
import mercurial.node as dpynode

'''hooks to forbid adding binary file over a given size

Ensure the PYTHONPATH is pointing where hg_checksize.py is and setup your
repo .hg/hgrc like this:

[hooks]
pretxncommit = python:checksize.newbinsize
pretxnchangegroup = python:checksize.newbinsize
preoutgoing = python:checksize.nopull

[limits]
maxnewbinsize = 10240
'''

def newbinsize(ui, repo, node=None, **kwargs):
    '''forbid to add binary files over a given size'''
    forbid = False
    # default limit is 10 MB
    limit = int(ui.config('limits', 'maxnewbinsize', 10000000))
    tip = context.changectx(repo, 'tip').rev()
    ctx = context.changectx(repo, node)
    for rev in range(ctx.rev(), tip+1):
        ctx = context.changectx(repo, rev)
        print ctx.files()
        for f in ctx.files():
            fctx = ctx.filectx(f)
            filecontent = fctx.data()
            # check only for new files
            if not fctx.parents():
                if len(filecontent) > limit and util.binary(filecontent):
                    msg = 'new binary file %s of %s is too large: %ld > %ld\n'
                    hname = dpynode.short(ctx.node())
                    ui.write(_(msg) % (f, hname, len(filecontent), limit))
                    forbid = True
    return forbid

The exception:

$  hg commit -m 'commit message'
error: pretxncommit hook raised an exception: apps/helpers/templatetags/include_extends.py@bced6272d8f4: not found in manifest
transaction abort!
rollback completed
abort: apps/helpers/templatetags/include_extends.py@bced6272d8f4: not found in manifest!

I'm not familiar with writing Mercurial hooks, so I'm pretty confused about what's going on. Why does the hook care that a file was removed if hg already knows about it? Is there a way to fix this hook so that it works all the time?

Update (solved): I modified the hook to filter out files that were removed in the changeset.

import itertools  # needed for ifilterfalse below, in addition to the imports above

def newbinsize(ui, repo, node=None, **kwargs):
    '''forbid to add binary files over a given size'''
    forbid = False
    # default limit is 10 MB
    limit = int(ui.config('limits', 'maxnewbinsize', 10000000))
    ctx = repo[node]
    for rev in xrange(ctx.rev(), len(repo)):
        ctx = repo[rev]

        # Do not check the size of files that have been removed.
        # ctx.files() lists removed files too, but they are absent from the
        # changeset's manifest; iterating over ctx yields the manifest's names.
        present = set(ctx)
        def file_was_removed(f):
            """Returns True if the file was removed"""
            return f not in present

        for f in itertools.ifilterfalse(file_was_removed, ctx.files()):
            fctx = ctx.filectx(f)
            filecontent = fctx.data()
            # check only for new files
            if not fctx.parents():
                if len(filecontent) > limit and util.binary(filecontent):
                    msg = 'new binary file %s of %s is too large: %ld > %ld\n'
                    hname = dpynode.short(ctx.node())
                    ui.write(_(msg) % (f, hname, len(filecontent), limit))
                    forbid = True
    return forbid
Germanophile answered 31/3, 2010 at 9:24 Comment(0)
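
The size-and-binary test at the core of the hook can be tried on its own, without a repository. This is a minimal sketch, not Mercurial's code: is_binary and is_oversized_binary are hypothetical names, and the sketch assumes that util.binary treats data as binary when it contains a NUL byte. The 1000000-byte limit is just an illustrative value for the 1 megabyte mentioned in the question.

```python
def is_binary(data):
    """Assumed heuristic: non-empty data containing a NUL byte is binary."""
    return bool(data) and b"\0" in data


def is_oversized_binary(data, limit):
    """Return True only if data is both larger than limit and binary."""
    return len(data) > limit and is_binary(data)


limit = 1000000  # illustrative limit in bytes

# Large and binary: this is the case the hook should reject.
print(is_oversized_binary(b"\0" * (limit + 1), limit))
# Large but plain text: allowed.
print(is_oversized_binary(b"a" * (limit + 1), limit))
# Binary but small: allowed.
print(is_oversized_binary(b"\0" * 10, limit))
```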

for f in ctx.files() includes removed files, so you need to filter those out.

You can also replace "for rev in range(ctx.rev(), tip+1):" with "for rev in xrange(ctx.rev(), len(repo)):" and drop the "tip = ..." line.

If you're using a modern hg, you don't do ctx = context.changectx(repo, node) but ctx = repo[node] instead.
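
One way to do that filtering, sketched against a stand-in object rather than a real changectx: keep only the names from ctx.files() that are still present in the changeset. StubChangeCtx and files_still_present are hypothetical names for illustration, and the sketch assumes that "name in ctx" tests membership in the changeset's manifest, as it does for a Mercurial changectx.

```python
class StubChangeCtx(object):
    """Hypothetical stand-in for a Mercurial changectx, for illustration only."""

    def __init__(self, touched, manifest):
        self._touched = list(touched)   # every file the changeset touched
        self._manifest = set(manifest)  # files that exist after the changeset

    def files(self):
        # Like ctx.files(): includes added, modified and removed files.
        return list(self._touched)

    def __contains__(self, name):
        # Assumed to mirror changectx: membership is tested against the manifest.
        return name in self._manifest


def files_still_present(ctx):
    """Return the touched files that were not removed by the changeset."""
    return [f for f in ctx.files() if f in ctx]


ctx = StubChangeCtx(
    touched=["big.bin", "apps/old.py"],  # apps/old.py was removed
    manifest=["big.bin", "README"],
)
print(files_still_present(ctx))
```

Only the files returned by files_still_present would then be passed to ctx.filectx(f), so a removed file never triggers the "not found in manifest" lookup.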

Leupold answered 31/3, 2010 at 10:37 Comment(2)
How do I filter the removed files out of ctx.files()? - Germanophile
Catching the exception is sufficient ;) - Leupold
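
The exception-catching approach from the comment might look roughly like the sketch below. It runs against stand-ins so it can be tried without a repository: StubLookupError, StubCtx and its filedata method are hypothetical. In a real hook the read would be ctx.filectx(f).data(), and the exception to catch would presumably be mercurial.error.LookupError; that is an assumption to check against your Mercurial version.

```python
class StubLookupError(LookupError):
    """Hypothetical stand-in for the 'not found in manifest' error."""


class StubCtx(object):
    """Hypothetical stand-in for a changeset context, for illustration only."""

    def __init__(self, contents, removed):
        self._contents = dict(contents)  # name -> data for files still present
        self._removed = list(removed)    # names of files the changeset removed

    def files(self):
        # Like ctx.files(): removed files are listed too.
        return sorted(list(self._contents) + self._removed)

    def filedata(self, name):
        # Reading a removed file fails, as in the traceback from the question.
        if name not in self._contents:
            raise StubLookupError("%s: not found in manifest" % name)
        return self._contents[name]


def readable_files(ctx):
    """Yield (name, data) for touched files, skipping those that were removed."""
    for name in ctx.files():
        try:
            data = ctx.filedata(name)
        except StubLookupError:
            continue  # removed in this changeset, nothing to measure
        yield name, data


ctx = StubCtx({"big.bin": b"\0\0\0"}, removed=["apps/old.py"])
for name, data in readable_files(ctx):
    print(name, len(data))
```

Catching only the lookup error, rather than every exception, keeps unrelated failures visible.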

This is really easy to do in a shell hook in recent Mercurial:

if hg locate -r tip "set:(added() or modified()) and binary() and size('>100k')"; then
  echo "bad files!"
  exit 1
else
  exit 0
fi

What's going on here? First we have a fileset to find all the changed files that are problematic (see 'hg help filesets' in hg 1.9). The 'locate' command is like status, except it just lists files and returns 0 if it finds anything. And we specify '-r tip' to look at the pending commit.
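
For anyone who would rather drive the same check from Python, here is a minimal sketch that shells out to the same hg locate command. The function names are hypothetical, and it assumes hg is on the PATH, that the script runs inside the repository, and Python 3.5 or later for subprocess.run. Nothing here touches Mercurial's internal API.

```python
import subprocess


def build_fileset(max_size="100k"):
    """Build the fileset expression used in the shell hook above."""
    return "set:(added() or modified()) and binary() and size('>%s')" % max_size


def build_locate_command(rev="tip", max_size="100k"):
    """Return the hg locate invocation as an argument list (no shell quoting needed)."""
    return ["hg", "locate", "-r", rev, build_fileset(max_size)]


def has_oversized_binaries(rev="tip", max_size="100k"):
    """Run hg locate; it exits with 0 when at least one file matches."""
    result = subprocess.run(
        build_locate_command(rev, max_size),
        stdout=subprocess.PIPE,
    )
    return result.returncode == 0


# Only show the command here; calling has_oversized_binaries() needs a repository.
print(build_locate_command())
```

A wrapper script would call has_oversized_binaries() and exit with a non-zero status when it returns True, mirroring the "exit 1" branch of the shell version.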

Impeach answered 14/10, 2011 at 18:52 Comment(0)