Install data directory tree with massive number of files using automake

I have a data directory for which I would like automake to generate install and uninstall targets. Essentially, I just want to copy this directory verbatim to the data directory. Normally, I would list all the files individually, like

dist_whatever_DATA=dir/subdir/filea ...

But the problem arises when my directory structure looks like this

*root
 *subdir
  *~10 files
 *subdir
  *~10 files
 *subdir
  *~700 files
 *subdir
 ...
 ~20 subdirs

I just cannot list all 1000+ files included as part of my Makefile.am. That would be ridiculous.

I need to preserve the directory structure as well. I should note that this data is not generated at all by the build process; it is mostly short audio recordings. So I don't need automake to "check" that every file I want to install has actually been created: each file is either there or not, every file that is there should be installed, and any file that is not there should not be. I know this is the justification used elsewhere for not supporting wildcard installs, but none of those reasons apply here.

Murdoch answered 18/6, 2011 at 9:41

I've found that installing hundreds of files separately makes for a tormentingly long invocation of make install. I had a similar case where I wanted to install hundreds of files, preserving the directory structure, and I did not want to change my Makefile.am every time a file was added to or removed from the collection.

I included an LZMA archive of the files in my distribution, and made automake rules like so:

GIANTARCHIVE = My_big_archive.tar.lz

dist_pkgdata_DATA = $(GIANTARCHIVE)

install-data-hook:
	cd $(DESTDIR)$(pkgdatadir) || exit 1; \
	cat $(GIANTARCHIVE) | unlzma | $(TAR) --list > uninstall_manifest.txt; \
	cat $(GIANTARCHIVE) | unlzma | $(TAR) --no-same-owner --extract; \
	rm --force $(GIANTARCHIVE); \
	cat uninstall_manifest.txt | sed --expression='s/^\|$$/"/g' | xargs chmod a=rX,u+w

uninstall-local:
	cd $(DESTDIR)$(pkgdatadir) || exit 1; \
	cat uninstall_manifest.txt | sed --expression='s/ /\\ /g' | xargs rm --force; \
	rm --force uninstall_manifest.txt

This way, automake installs My_big_archive.tar.lz in $(pkgdatadir); the hook then extracts it there, records a list of every file it contained so they can be uninstalled later, and deletes the archive. This also runs much faster than listing each file as an install target, even if you were to autogenerate that list.
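To see the install/uninstall logic of these rules in isolation, here is a minimal shell sketch that runs outside make. It is an illustration under assumptions, not the answer's exact recipe: gzip stands in for unlzma so it runs where LZMA tools are missing, install_tree and uninstall_tree are hypothetical helper names, and the manifest is read with while/read instead of sed and xargs so that names containing spaces survive. Unlike the xargs rm step in the rules, it also removes the extracted directories, deepest first.

```shell
#!/bin/sh
# Sketch only: gzip replaces unlzma, and install_tree/uninstall_tree are
# hypothetical names, not part of automake.
set -eu

# install_tree ARCHIVE DEST: unpack ARCHIVE (a .tar.gz) into DEST and record
# every entry it contains in DEST/uninstall_manifest.txt.
install_tree() {
    archive=$1 dest=$2
    mkdir -p "$dest"
    gzip -dc <"$archive" | tar -tf - >"$dest/uninstall_manifest.txt"
    # The redirection is opened before the subshell changes directory.
    (cd "$dest" && gzip -dc | tar -xf -) <"$archive"
    # Same permissions as the rule's "chmod a=rX,u+w" step; read -r keeps
    # names with spaces intact without any sed quoting.
    (
        cd "$dest"
        while IFS= read -r f; do
            chmod a=rX,u+w "$f"
        done <uninstall_manifest.txt
    )
}

# uninstall_tree DEST: remove what the manifest lists, files first, then
# directories deepest-first; rmdir leaves any directory that is not empty.
uninstall_tree() {
    dest=$1
    (
        cd "$dest"
        while IFS= read -r f; do
            [ -d "$f" ] || rm -f "$f"
        done <uninstall_manifest.txt
        # Reverse byte order puts children before their parent directories.
        LC_ALL=C sort -r uninstall_manifest.txt | while IFS= read -r f; do
            if [ -d "$f" ]; then rmdir "$f" 2>/dev/null || :; fi
        done
        rm -f uninstall_manifest.txt
    )
}
```

For example, install_tree data.tar.gz /tmp/stage/share/mypkg followed later by uninstall_tree /tmp/stage/share/mypkg; the archive name and paths are made up for illustration.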

Diacritical answered 20/6, 2011 at 13:23
Why do you install this into $(pkgdatadir)? Just extract it from $(srcdir) in install-data-hook. Use something like EXTRA_DIST = giant_archive.tar.lzma so it is distributed. - Drape
Three minor suggestions, unrelated to the actual question. I would replace cat xxx | yyy | zzz by yyy <xxx | zzz to simplify the work of the OS (it can also be written <xxx yyy | zzz if you want to keep xxx at the beginning of the line). Also, you can probably extract in verbose mode to create your manifest at the same time (avoiding two decompressions of the big archive). Finally, all those long options (e.g. rm --force instead of rm -f) make your Makefile non-portable outside of the GNU world, while it could easily be portable. - Highlander
@adl, thanks! I always thought that using long options in scripts made them easier to read, but portability is better. - Diacritical

I would use a script to generate a Makefile fragment that lists all the files:

printf 'subdir_files = \\\n' > subfiles.mk
find subdir -type f -print | sed 's/^/  /;$q;s/$/ \\/' >> subfiles.mk

and then include this subfiles.mk from your main Makefile.am:

include $(srcdir)/subfiles.mk
nobase_dist_pkgdata_DATA = $(subdir_files)

A second option is to set EXTRA_DIST = subdir and then write custom install-data-local and uninstall-local rules.

The problem here is that EXTRA_DIST = subdir will distribute all files in subdir/, including backup files, configuration files (e.g. from your VCS), and other things you would not want to distribute.

Using a script as above lets you filter the files you really want to distribute.
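A slightly fuller version of that generator might look like this minimal sketch. The function name gen_subfiles, the variable name subdir_files, and the exclusion patterns (VCS directories, *~ and *.bak backups) are assumptions to adapt. Sorting with LC_ALL=C keeps the generated fragment stable between runs, so regenerating it only changes the file when files were actually added or removed.

```shell
#!/bin/sh
# Sketch only: gen_subfiles, "subdir_files" and the exclusion patterns are
# illustrative assumptions.
set -eu

# gen_subfiles DIR OUT: write a Makefile fragment listing the files under DIR.
# VCS metadata is pruned, editor backups are dropped, and the list is sorted;
# sed indents each name and appends " \" to every line except the last.
gen_subfiles() {
    dir=$1 out=$2
    {
        # The trailing backslash continues the assignment onto the file lines.
        printf 'subdir_files = \\\n'
        find "$dir" \( -name .git -o -name .svn -o -name CVS \) -prune -o \
            -type f ! -name '*~' ! -name '*.bak' -print |
            LC_ALL=C sort |
            sed 's/^/  /;$q;s/$/ \\/'
    } >"$out"
}
```

Running gen_subfiles subdir subfiles.mk from the top of the source tree (for instance from a bootstrap script, before running automake) produces the fragment that include $(srcdir)/subfiles.mk picks up.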

Highlander answered 18/6, 2011 at 15:53
Just a note: the first one doesn't work, as it fails with "make[2]: execvp: /bin/bash: Argument list too long". - Murdoch
Wow! I am generating some Makefiles with pretty large lists of files, but had not yet realized I would hit the command-line length limit some day! - Highlander
Could you give an example of the second option? If I just "cp -Rf", I get the wrong permissions and need to chmod after cp (or maybe set umask before cp). I looked into the automatically generated rules; they use $(INSTALL_DATA) instead of cp, but then all files need to be specified, which would require "find", etc. What's the best, most portable way to install a whole directory in install-data-local? - Ardeb
@Ardeb Something like this google.com/codesearch#EIS5BlthS9Y/… - Highlander
The find command has a -printf output option, which might be easier to understand than piping to sed. - Intimist
@Ishmael: the GNU implementation of find has a -printf option, but this is not a standard option. If you want your Makefiles to be portable, you should avoid it. - Highlander

I would write a script (either as a separate shell script or in the Makefile.am) that is run as part of the install-data-hook target.
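A script of that kind might look like the following minimal sketch. It installs a tree with INSTALL_DATA one file at a time, so every file gets mode 644 and no single command line has to carry all 1000+ names. install_tree and uninstall_tree are hypothetical names, the fallback values for INSTALL_DATA and MKDIR_P are assumptions (from make you would pass the values configure provides), and DEST must be an absolute path such as $(DESTDIR)$(pkgdatadir).

```shell
#!/bin/sh
# Sketch only: install_tree/uninstall_tree are hypothetical helpers; the
# INSTALL_DATA and MKDIR_P defaults are assumed stand-ins for configure's.
set -eu
: "${INSTALL_DATA:=install -c -m 644}"
: "${MKDIR_P:=mkdir -p}"

# install_tree SRC DEST: mirror every directory and regular file of SRC in
# DEST (DEST must be absolute, since we cd into SRC first).
install_tree() {
    src=$1 dest=$2
    (
        cd "$src"
        find . -type d | while IFS= read -r d; do
            $MKDIR_P "$dest/$d"
        done
        find . -type f | while IFS= read -r f; do
            $INSTALL_DATA "$f" "$dest/$f"
        done
    )
}

# uninstall_tree SRC DEST: delete the files SRC provides from DEST, then prune
# directories deepest-first (find -depth), leaving DEST itself in place.
uninstall_tree() {
    src=$1 dest=$2
    (
        cd "$src"
        find . -type f | while IFS= read -r f; do
            rm -f "$dest/$f"
        done
        find . -depth -type d | while IFS= read -r d; do
            if [ "$d" != . ]; then rmdir "$dest/$d" 2>/dev/null || :; fi
        done
    )
}
```

From Makefile.am, install-data-hook (or install-data-local) would call install_tree with $(srcdir)/subdir and $(DESTDIR)$(pkgdatadir), and uninstall-local would call uninstall_tree with the same arguments; subdir itself still has to be distributed, for example through EXTRA_DIST as described in the earlier answer.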

Drape answered 19/6, 2011 at 0:53