Is the Mercurial .hgignore my only option for handling hundreds of temp files generated when compiling?

Asked 15/9, 2009 at 21:36 Answered 17/9, 2009 at 2:29

version-control mercurial build-process hgignore

I've been all over Google and SO looking for someone who has asked this question, but am coming up completely empty. I'll apologize in advance for the lengthy, roundabout way of asking the question. (If I were able to figure out how to encapsulate the problem, maybe I would have been successful in finding an answer.)

How are large projects managed in Mercurial, when the act of building / compiling generates hundreds of temporary files in order to create the end result? Is .hgignore the only answer?

Example Scenario:

You have a project that wants to use some open source package for some feature, and needs to compile it from source. So you go get the package, un-.tgz it, and then slap it into its own Mercurial repository so you can start tracking changes. Then you make all your changes, and run a build.

You test your end result, are happy with the results and are ready to commit back to your local clone of the repository. So you do an hg status to check your changes prior to committing. The hg status results cause you to immediately start using all those words that would make your mother ashamed, because you now have screens and screens of "build cruft".

For the sake of argument say this package is MySQL or Apache: something that

you don't control and will be changing regularly,
leaves a whole lot of cruft behind in a whole lot of places, and
there is no guarantee the cruft won't change each time you get a new version from the external source.

Now what? The particular project causing this angst is going to be worked on by multiple developers in multiple physical locations, and so needs to be as straightforward as possible. If there is too much involved they're not going to do it, and we'll have a bigger problem on our hands. (Sadly, some old dogs are not keen on learning new tricks...)

One proposed solution was that they would just have to commit everything locally before doing a make, so they have a "clean slate" they would then have to clone from to actually do the build in. That got shot down as (a) too many steps, and (b) not wanting to cruft up the history with a bunch of "time to build now" changesets.

Someone else has proposed that all the cruft just be committed into the Mercurial repository. I am strongly against that because then the next time around those files will turn up as "modified" and therefore be included in the changeset's file list.

We can't possibly be the only people who have run into this problem. So what is the "right" solution? Is our only recourse to try to create a massively intelligent .hgignore file? This makes me uneasy, because if I tell Mercurial to "ignore everything in this directory I haven't already told you about", then what happens if the next applied patch adds files into that ignored directory? (Mercurial will never see that new file, right?)

Hopefully this is not a completely stupid question with an obvious answer. I've compiled things from source many times before, but have never needed to apply version control on top of that. Plus we're new to Mercurial.

Pusher answered 15/9, 2009 at 21:36 Comment(8)

You mention Apache and PHP causing a lot of cruft to appear - are there changes you are making to those packages that do need to be checked in? – Munday 15/9, 2009 at 22:51

I have no idea, that is outside my realm. Something about some custom library or some such to make the app talk to a lower level of the system or some magic voodoo like that. I've been tasked with figuring out how to "make things work in Mercurial", and I have run into this "build cruft" snafu while pursuing that. Thx. – Pusher 15/9, 2009 at 23:4

What bothers you so much about .hgignore? It exists for this reason – Aer 16/9, 2009 at 13:48

Oh I have no problem about .hgignore at all -- I'm just extremely cautious and like to know all my options and so wanted to know how other people deal with this problem. I don't want to accidentally have an "over-reaching" RegExp in the .hgignore that could make important things (that might get added later in a package update) not be noticed, and thus not get added, so that when someone else tries to check it out, it won't build / run properly. ("Well it works on MY machine!" is not an easy problem to track down... At least in my experience.) – Pusher 16/9, 2009 at 15:55

JNeefer, I am just curious? Why did you have to put the source of the open source project into your project? Could you just compile the binaries and put them into your source? Just trying to learn something new here. Seth – Eusporangiate 19/7, 2010 at 15:14

@Pusher did Martin's solution (or any other one) end up working for you on this? If so, you should pop in and mark the best answer as accepted. – Plast 19/7, 2010 at 15:21

@Seth - "They" don't like just having the compiled binaries around here, they want the source, so they know what the binary was built from and have it available should they need to recompile with different options. – Pusher 6/10, 2010 at 21:15

@Chris - The solution that f-i-n-a-l-l-y ended up being implemented was a two-tier Mercurial structure (3 tier if you count the remote master), wherein there is a primary local clone where the work is done, and then that one is "subcloned" and the build is done in that second copy. It does require the work be committed locally so that it can be "picked up" by the subclone, but it gets the job done. (I personally also use the "collapse" extension for Mercurial, so when I'm done I can smoosh all my local commits into one changeset to push back up to the master server.) – Pusher 6/10, 2010 at 21:20

Two options:

1. The best option is to do an out-of-tree build, if you can. This is a build where you place the object files outside of the source tree. Some build systems, such as CMake, support this directly. For other systems, you need to be lucky since the upstream project must have added support for this in their Makefile or similar.
2. A more general option is to tell Mercurial to ignore specific types of files, not entire directories. This works well in my experience.

To test the second option, I wanted to compile Apache. However, it requires APR, so I tested with that instead. After checking in a clean apr-1.3.8.tar.bz2 I did ./configure; make and looked at the output of hg status. The first few patterns were easy:

syntax: glob

*~
*.o
*.lo
*.la
*.so
.libs/*

The remaining new files look like they are specific files generated by the build process. It's easy to add them too:

% hg status --unknown --no-status >> .hgignore

That also added .hgignore since I hadn't yet scheduled it for addition. Removing that I ended up with this .hgignore file:

syntax: glob

*~
*.o
*.lo
*.la
*.so
.libs/*
.make.dirs
Makefile
apr-1-config
apr-config.out
apr.exp
apr.pc
build/apr_rules.mk
build/apr_rules.out
build/pkg/pkginfo
config.log
config.nice
config.status
export_vars.c
exports.c
include/apr.h
include/arch/unix/apr_private.h
libtool
test/Makefile
test/internal/Makefile

I consider this a quite robust way to go about this in Mercurial or any other revision control system for that matter.
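The procedure above can also be sketched as a small Python helper. This is only an illustration under assumptions, not the method actually used for the APR test: the function name build_hgignore and the default suffix list are invented for this example, and it assumes none of the paths contain glob metacharacters such as * or [. It takes the list of untracked paths (for instance the lines printed by hg status --unknown --no-status), folds common object-file suffixes into one glob each, and lists everything else by its full path.

```python
from pathlib import PurePosixPath

# Suffixes treated as build output anywhere in the tree. This list is an
# assumption for the example; adjust it to what your build really produces.
GENERIC_SUFFIXES = (".o", ".lo", ".la", ".so")


def build_hgignore(unknown_files, suffixes=GENERIC_SUFFIXES):
    """Return .hgignore text for a list of untracked paths.

    Paths whose suffix is in `suffixes` collapse into a single `*<suffix>`
    glob; every other path is listed verbatim. Paths are assumed to be free
    of glob metacharacters (no escaping is done here).
    """
    globs = []
    explicit = []
    for raw in unknown_files:
        path = raw.strip()
        if not path or path == ".hgignore":
            continue  # skip blank lines and the ignore file itself
        suffix = PurePosixPath(path).suffix
        if suffix in suffixes:
            pattern = "*" + suffix
            if pattern not in globs:
                globs.append(pattern)
        elif path not in explicit:
            explicit.append(path)
    lines = ["syntax: glob", ""] + globs + sorted(explicit)
    return "\n".join(lines) + "\n"


# Example with a few of the file names from the listing above.
sample = ["exports.c", "strings/apr_snprintf.o", "config.log", "test/Makefile"]
print(build_hgignore(sample), end="")
```

Review the generated text before saving it as .hgignore, since the explicit paths are exactly where an upstream update is most likely to add something new that you do want tracked.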

Flexion answered 16/9, 2009 at 10:40 Comment(3)

Martin, this is exactly the thing. I didn't think that full path specs work (and the .hgignore doc is a bit vague on the subject) but it clearly does at least as of Hg 1.3.1. JNeefer, this is your solution! – Plast 16/9, 2009 at 13:31

Chris R: yeah, the .hgignore documentation is a bit thin. I think the crucial part is this sentence: "For example, say we have an untracked file, file.c, at a/b/file.c inside our repository. Mercurial will ignore file.c if any pattern in .hgignore matches a/b/file.c, a/b or a." (from selenic.com/mercurial/hgignore.5.html) – Flexion 17/9, 2009 at 6:55

Thanks for doing the detailed work and showing the results here - that's an excellent answer that deserves a check mark, imho. – Hinojosa 2/10, 2010 at 13:10

The best solution would be to fix the build process so that it behaves in a 'nice' manner, namely by allowing you to specify some separate directory to store intermediate files in. That directory could then be completely ignored via a very simple .hgignore entry, or not even live within the version-controlled directory structure at all.

Henriettahenriette answered 15/9, 2009 at 21:41 Comment(1)

See my comment below to Will. We don't maintain the build process of the open source software causing the problem. So while we technically could "fix" the problem, it's really not ideal to have to be making (and maintaining) such modifications to things like apache or php... :-/ – Pusher 15/9, 2009 at 22:36

For what it's worth, I've found that in this situation a smart .hgignore is the only solution that has worked for me so far. With the inclusion of regular expression support, it's very powerful, but tricky, too, since a pattern that is cruft in one directory may well be source in another.

At least you can check in the .hgignore and share it with your developers. That way the work is only done once.

[Edit] At least, however, it's possible -- as noted above by Martin Geisler -- to have full path specifications in your .hgignore file; you can, therefore, have test/Makefile in the .hgignore and still have Mercurial notice a new test2/Makefile.

His process for creating the file should give you almost what you want, and you can tune it from there.

Plast answered 15/9, 2009 at 21:42 Comment(1)

That's exactly the problem (patterns to match cruft in one dir would match things that are source in another dir). But on top of that "simple" problem, is the sheer quantity of the cruft. Running "hg status | wc -l" (on the dir that has made no changes and done one build) shows over 7500 individual pieces of cruft spread as far as 4 levels deep. I need to find some way to keep this cruft from getting checked-in, while not losing new things / changes to come with package updates. Arg! – Pusher 15/9, 2009 at 22:44

One option you have is to clean your working directory after verifying a build.

make clean
hg status

Of course you may not want to clean your project if it takes more than a few minutes to build.

Munday answered 15/9, 2009 at 22:8 Comment(1)

The problem isn't with our own software. It builds nicely and isolates any cruft. The problem is with packages originating outside of us. As soon as I get the maintainers of things like apache and php to make their 'make clean' actually 'clean' then 75% of my problem will be solved. But since they are generating the cruft, I'm stuck. (And yes, the project takes more than 'a few' minutes to build.) – Pusher 15/9, 2009 at 22:29

If the files you want to track are already known to hg, you can hgignore everything. Then you need to use hg import to apply a patch, rather than the plain patch command, since hg needs to be told when new files should be tracked.

Sakhuja answered 15/9, 2009 at 22:43 Comment(2)

Hrm... That sounds promising! I will go read up on Mercurial's 'import' versus 'patch' and see if that is a workable solution for this situation. I will post a follow up when I know more. Thank you! – Pusher 15/9, 2009 at 22:49

OK I am now on the track of Mercurial Queues, is that what you had in mind? The first page of Ch. 12 of the O'Reilly book on Mercurial describes part of my issue: "You have an 'upstream' source tree that you can't change; you need to make some local changes on top of the upstream tree; and you'd like to be able to keep those changes separate, so that you can apply them to newer versions of the upstream source." So is the thought that by using this "different" way to manage the files/changes (hg import?), the cruft can stay in the working dir by having the WHOLE working dir .hgignore'd? – Pusher 15/9, 2009 at 23:25

How about a shell (or whatever) script that walks your build directory recursively, finds every file created after your build process started running, and moves all these files (of course, you can specify the exceptions) into a cruft_dir subdirectory? Then you can just put cruft_dir/* in .hgignore.

EDIT: I forgot to add, but this is fairly obvious, that this shell script runs automatically as soon as your build finishes. Maybe it's even called as the last command in your Makefile/ant/whatever file.
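A minimal sketch of such a script in Python, under stated assumptions: the function name sweep_cruft and its parameters are hypothetical, and modification time stands in for "created after the build started" because creation time is not portably available. That means a tracked file the build rewrites would be swept too unless you list it in keep.

```python
import os
import shutil


def sweep_cruft(root, build_started, cruft_dir="cruft_dir", keep=()):
    """Move files under `root` modified at or after `build_started`.

    `build_started` is a timestamp such as the value of time.time() taken
    just before the build. `keep` lists paths relative to `root` that must
    stay where they are (the exceptions). Returns the moved relative paths.
    """
    cruft_root = os.path.join(root, cruft_dir)
    moved = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Mercurial's metadata or the cruft directory.
        dirnames[:] = [d for d in dirnames
                       if d != ".hg" and os.path.join(dirpath, d) != cruft_root]
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, root)
            if rel in keep or os.path.getmtime(src) < build_started:
                continue  # an exception, or older than the build
            dest = os.path.join(cruft_root, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.move(src, dest)
            moved.append(rel)
    return sorted(moved)
```

To use it, record the timestamp before running the build, run the build, then call sweep_cruft on the working directory with the files you want to keep (such as the final binary) listed in keep. Note that moving object files away will likely force a full rebuild next time.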

Martinson answered 17/9, 2009 at 2:29 Comment(0)
