Using Rsync include and exclude options to include directory and file by pattern

I'm having problems getting my rsync syntax right and I'm wondering if my scenario can actually be handled with rsync. First, I've confirmed that rsync is working just fine between my local host and my remote host. Doing a straight sync on a directory is successful.

Here's what my filesystem looks like:

uploads/
  1260000000/
    file_11_00.jpg
    file_11_01.jpg
    file_12_00.jpg
  1270000000/
    file_11_00.jpg
    file_11_01.jpg
    file_12_00.jpg
  1280000000/
    file_11_00.jpg
    file_11_01.jpg
    file_12_00.jpg

What I want to do is run rsync only on files that begin with "file_11_" in the subdirectories and I want to be able to run just one rsync job to sync all of these files in the subdirectories.

Here's the command that I'm trying:

rsync -nrv --include="**/file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/

This results in 0 files being marked for transfer in my dry run. I've tried various other combinations of --include and --exclude statements, but either continued to get no results or got everything as if no include or exclude options were set.

Anyone have any idea how to do this?

Pica answered 31/3, 2012 at 0:40 Comment(0)

The problem is that --exclude="*" also excludes the directories themselves (for example, 1260000000/), so rsync never descends into them and never sees the files inside that your --include would have matched.

I think the closest thing to what you want is this:

rsync -nrv --include="*/" --include="file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/

(which will include all directories, and all files matching file_11*.jpg, but no other files), or maybe this:

rsync -nrv --include="/[0-9][0-9][0-9]0000000/" --include="file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/

(same concept, but much pickier about the directories it will include).

In either case, note that the --include=... option needs to come before the --exclude=... option, because we need the former to take precedence over the latter when a file matches both patterns.
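To try this without touching real data, here is a minimal sketch that rebuilds the question's layout in a throwaway directory and runs the first command for real (without -n). The temporary paths and the variable names work, src, and dst are assumptions made for illustration; only the rsync options come from the answer above.

```shell
#!/bin/sh
# Sandbox: recreate the uploads/ layout from the question in a temp dir.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT     # remove the sandbox when the script exits
src="$work/Storage/uploads"    # hypothetical stand-in for /Storage/uploads
dst="$work/website/uploads"    # hypothetical stand-in for /website/uploads
mkdir -p "$dst"
for d in 1260000000 1270000000 1280000000; do
  mkdir -p "$src/$d"
  for f in file_11_00.jpg file_11_01.jpg file_12_00.jpg; do
    : > "$src/$d/$f"           # empty placeholder file
  done
done

# Rule order matters: include directories, include the wanted files,
# then exclude everything else.
rsync -r --include="*/" --include="file_11*.jpg" --exclude="*" "$src/" "$dst/"

# List what arrived, relative to the destination root.
(cd "$dst" && find . -type f | sort)
```

Because --include="*/" matches every directory, directories that contain no matching files are still created at the destination. If that is unwanted, rsync's -m (--prune-empty-dirs) option may be worth trying alongside these rules.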

Kagera answered 31/3, 2012 at 1:2 Comment(7)
Thanks! That was exactly what I needed. My scenario was actually more or less what you described in your second example, but I kept my question simplified to make it more straightforward. - Pica
Note the importance of (e.g.) --include="*/" in including the parent directories of the files you actually want to include. - Thedrick
Note the order of arguments: --include has to come before --exclude. - Tetany
@Tetany Yeah, I feel that this should be mentioned in the actual answer. I was trying to do something similar before finding this page, and knew that I needed the --include="*/", yet it still didn't work. Looking at this answer my first thought was "that's exactly what I'm doing!". Then I noticed the order was different. - Pleione
Another key concept is that "when using the --recursive (-r) option (which is implied by -a), every subcomponent of every path is visited from the top down, so include/exclude patterns get applied recursively to each subcomponent's full name". - Disembody
For some odd reason this doesn't work in my case. I only want to copy some directories and all files in them, including their subdirectories. I have the directory list in a file and use --include-from=name-of-file. If I put --include="*/" at the beginning, it copies everything. If I don't, it copies only the empty "included" directories. I'm puzzled. - Bracci
rsync's implementation of include and exclude is so incredibly unintuitive. Why should ordering of the command switches matter? Why wouldn't every file be considered? If include is specified alone, why wouldn't it assume that everything else is excluded? It's not even remotely close to how I would have approached this problem. - Kidderminster

rsync include exclude pattern examples:

"*"         means everything
"dir1"      transfers empty directory [dir1]
"dir*"      transfers empty directories like: "dir1", "dir2", "dir3", etc...
"file*"     transfers files whose names start with [file]
"dir**"     transfers every path that starts with [dir] like "dir1/file.txt", "dir2/bar/ffaa.html", etc...
"dir***"    same as above
"dir1/*"    does nothing
"dir1/**"   does nothing
"dir1/***"  transfers [dir1] directory and all its contents like "dir1/file.txt", "dir1/fooo.sh", "dir1/fold/baar.py", etc...

A final note: don't rely on leading asterisks when matching paths, as in "**dir" (they are fine for single directory or file names, but not for paths), and note that more than two asterisks don't work for file names.
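The descriptions in the table appear to assume that each pattern is passed as --include and followed by --exclude="*". Under that assumption, a small dry-run harness like the sketch below lets you check any pattern against a disposable tree. The tree layout, the work variable, and the try_pattern helper are hypothetical names made up for this example; --out-format='%n' asks rsync to print just the name of each item it would send.

```shell
#!/bin/sh
# Hypothetical sandbox tree; names chosen to mirror the examples in the table.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src/dir1/fold" "$work/src/dir2/bar" "$work/dst"
: > "$work/src/dir1/file.txt"
: > "$work/src/dir1/fold/baar.py"
: > "$work/src/dir2/bar/ffaa.html"
: > "$work/src/file1.txt"

# try_pattern PATTERN: dry-run with PATTERN as the only include rule,
# followed by a catch-all exclude, printing the paths rsync would send.
try_pattern() {
  printf '== %s\n' "$1"
  rsync -rn --out-format='%n' --include="$1" --exclude='*' \
    "$work/src/" "$work/dst/"
}

for p in 'dir1' 'dir*' 'file*' 'dir1/*' 'dir1/***'; do
  try_pattern "$p"
done
```

Comparing the output for 'dir1/*' and 'dir1/***' shows why the former "does nothing" here: the pattern never matches dir1 itself, so the catch-all exclude hides the directory before its contents are examined.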

Fortuneteller answered 28/12, 2017 at 16:3 Comment(2)
Your answer is the only usable one because you explain the general behavior. The other answers are too specific to the OP's case, and each situation needs a different solution! It helped me a lot! - Aphorize
A thorough, though less intuitive, explanation can be found in the rsync man page in the section INCLUDE/EXCLUDE PATTERN RULES. - Heroism

Here's my "teach a person to fish" answer:

Rsync's syntax is definitely non-intuitive, but it is worth understanding.

  1. First, use -vvv to see the debug info for rsync.
$ rsync -nr -vvv --include="**/file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/

[sender] hiding directory 1280000000 because of pattern *
[sender] hiding directory 1260000000 because of pattern *
[sender] hiding directory 1270000000 because of pattern *

The key concept here is that rsync applies the include/exclude patterns to each directory recursively, from the top down. For every entry, the rules are checked in order, and the first rule that matches decides; the remaining rules are not consulted.

The first directory it evaluates is /Storage/uploads, which contains the entries 1260000000/, 1270000000/, and 1280000000/. None of them match the include pattern, and all of them match the exclude pattern *. So they are excluded without being opened, and rsync ends.

  2. The solution is to include all dirs (*/) first. At the top level, 1260000000/, 1270000000/, and 1280000000/ are then included because they match */. rsync next descends into 1260000000/, where file_11_00.jpg matches --include="file_11*.jpg" and is included, while file_12_00.jpg falls through to --exclude="*". And so forth for the other directories.
$ rsync -nrv --include='*/' --include="file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/

./
1260000000/
1260000000/file_11_00.jpg
1260000000/file_11_01.jpg
1270000000/
1270000000/file_11_00.jpg
1270000000/file_11_01.jpg
1280000000/
1280000000/file_11_00.jpg
1280000000/file_11_01.jpg
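
The walk described in the two steps above can be modeled in a few lines of plain shell. This is a deliberately simplified sketch, not rsync's real matcher: it only handles patterns without inner slashes (so it uses file_11*.jpg rather than the question's **/file_11*.jpg), it treats a trailing / as "directories only", and the RULES variable and the decide and walk functions are names invented for this example.

```shell
#!/bin/sh
# Simplified model of "first matching rule wins, top down".
# RULES holds one rule per line: "+ PATTERN" includes, "- PATTERN" excludes.

# decide NAME KIND: print the sign of the first rule matching NAME.
# KIND is "dir" or "file". The body runs in a subshell, hence ( ... ).
decide() (
  name=$1 kind=$2
  while IFS= read -r rule; do
    sign=${rule%% *}           # text before the first space: + or -
    pat=${rule#* }             # text after the first space: the pattern
    case $pat in
      */) [ "$kind" = dir ] || continue   # trailing slash: dirs only
          pat=${pat%/} ;;
    esac
    case $name in
      $pat) echo "$sign"; exit 0 ;;       # first match decides
    esac
  done <<EOF
$RULES
EOF
  echo "+"                     # no rule matched: included by default
)

# walk DIR PREFIX: print included paths; excluded dirs are never opened.
walk() (
  dir=$1 prefix=$2
  for entry in "$dir"/*; do
    [ -e "$entry" ] || continue
    name=${entry##*/}
    if [ -d "$entry" ]; then
      if [ "$(decide "$name" dir)" = "+" ]; then
        printf '%s\n' "$prefix$name/"
        walk "$entry" "$prefix$name/"
      fi
    elif [ "$(decide "$name" file)" = "+" ]; then
      printf '%s\n' "$prefix$name"
    fi
  done
)

# Rebuild the question's layout in a throwaway directory.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
for d in 1260000000 1270000000 1280000000; do
  mkdir -p "$work/uploads/$d"
  for f in file_11_00.jpg file_11_01.jpg file_12_00.jpg; do
    : > "$work/uploads/$d/$f"
  done
done

RULES='+ file_11*.jpg
- *'
broken=$(walk "$work/uploads" "")
echo "without the */ rule: [$broken]"

RULES='+ */
+ file_11*.jpg
- *'
fixed=$(walk "$work/uploads" "")
echo "with the */ rule:"
printf '%s\n' "$fixed"
```

In the first run every top-level directory falls through to "- *" and is never opened, which mirrors the "hiding directory" lines in the debug output. In the second run "+ */" matches first, so the walk descends and the file rule gets a chance to match.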

https://download.samba.org/pub/rsync/rsync.1

Disembody answered 24/11, 2020 at 8:18 Comment(1)
If you're always typing --include='*/', that probably indicates room for improvement in the interface. - Kidderminster
