gsutil cp: copy files with -I option to matching subdirectories

I would like to copy a list of files to a bucket while keeping the directory-structure.

test.txt:

a/b/1.jpg
a/c/23.jpg
a/d/145.jpg

gsutil command:

cat test.txt | gsutil -m cp -I 'gs://my-bucket/'

This copies the files but flattens the directory structure, so everything lands at the bucket root. Is there a way to solve my problem? Thanks a lot!

Dulin answered 2/12, 2015 at 15:15 Comment(0)

I came across this question because I had a very similar case. There still isn't a great way to do this, but I recently found a tip that lets you use gsutil rsync and bend the -x flag into an inclusion filter rather than an exclusion filter by using a negative lookahead.

For example, the command below would copy all .json files found in any subdirectory of the current directory, while preserving their paths in the bucket:

gsutil -m rsync -r -x '^(?!.*\.json$).*' . gs://mybucket

This can be extended to multiple patterns. For example, this command would copy all .json, .yaml, and .yml files it finds:

gsutil -m rsync -r -x '^(?!.*\.(json|yaml|yml)$).*' . gs://mybucket

By itself this is not very useful when you have a specific list of files, but let's build on it. Let's use the youtube-dl repo (https://github.com/ytdl-org/youtube-dl.git) as an example.

Let's take all the .md files from the repo and pretend they are our specified file list. The last one is in a subdirectory:

find * -name "*.md"
CONTRIBUTING.md
README.md
docs/supportedsites.md

We use * rather than . so that find prints the names without a leading ./, which means less processing later:

# Read the file paths into a variable.
# If you already have a file with the path list, use:
# flist=$(cat file)
# (Command substitution is used because `read` at the end of a pipeline
# runs in a subshell in bash, leaving the variable empty outside it.)
flist=$(find * -name "*.md")

# Concatenate the paths into a regex that gsutil accepts in the -x parameter.
# (In an interactive shell you may need to escape ! as \! to stop history expansion.)
rx="^(?!($(echo "$flist" | tr '\n' '|')$)).*"

# Preview rx variable (just for clarity)
echo "$rx"
^(?!(CONTRIBUTING.md|README.md|docs/supportedsites.md|$)).*

# Run the sync in dry-run mode
gsutil -m rsync -n -r -x "$rx" . gs://mybucket
...
Would copy file://./CONTRIBUTING.md to gs://mybucket/CONTRIBUTING.md
Would copy file://./README.md to gs://mybucket/README.md
Would copy file://./docs/supportedsites.md to gs://mybucket/docs/supportedsites.md

While a little involved, it does allow use of the -m flag for speed while preserving paths.

With a bit more processing it should also be possible to (a sketch follows this list):

  • remove the stray entry caused by the trailing newline
  • handle paths beginning with ./
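
A minimal sketch of both fixes, assuming bash; this is untested and still does not escape regex metacharacters in file names:

# Let find print ./-prefixed paths, then strip the prefix
flist=$(find . -name "*.md" | sed 's|^\./||')

# printf '%s' avoids the trailing newline that produced the stray | entry,
# and moving $ outside the group anchors every listed name exactly
rx="^(?!($(printf '%s' "$flist" | tr '\n' '|'))$).*"

gsutil -m rsync -n -r -x "$rx" . gs://mybucket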
Opulence answered 31/5, 2022 at 12:47 Comment(0)

I just had the same issue, and after some thought I figured that this is by design, just like the Unix cp command. I came up with a solution using the xargs utility:

cat test.txt | xargs -I '{}' gsutil cp '{}' gs://my-bucket-name/'{}'

With the -I option, xargs executes the following command once for each input line. One downside of this method is that you can't use -m for gsutil cp, which can significantly slow down the task.
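
If your xargs supports the -P flag (GNU and BSD versions do), you can claw back some parallelism by running several gsutil processes at once; a sketch with an arbitrary 8 parallel jobs:

cat test.txt | xargs -I '{}' -P 8 gsutil cp '{}' gs://my-bucket-name/'{}'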

Bloodmobile answered 18/6, 2017 at 15:42 Comment(0)

I believe you have to use the -R option to get a recursive copy: gsutil -m cp -R dir gs://my-bucket

From the documentation: https://cloud.google.com/storage/docs/gsutil/commands/cp

If you want to copy an entire directory tree you need to use the -r option: gsutil cp -r dir gs://my-bucket
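
For the question's layout, a sketch of what that does:

gsutil cp -r a gs://my-bucket
# copies a/b/1.jpg, a/c/23.jpg, a/d/145.jpg (and anything else under a/),
# preserving the subdirectory structure, but with no way to limit it to a file list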

Hope that helps!

Syconium answered 1/10, 2016 at 12:14 Comment(1)
alas, what -r does is copy the tree of a specified directory to a target, not the intervening directories for a specified file. Effectively, gsutil cp has no idea of a "root"/"working" directory for the source. – Pris

I think you can use the find command instead. For example, the following command helped me copy all .json files under the [your path] folder and its subfolders into a GCS bucket, with the -m option working. Hope it helps:

sudo find [your path] -print | grep -i '.*[.]json' | sudo gsutil -m cp -I gs://[your bucket]
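
The grep step can also be folded into find itself; a sketch (note that, as in the question, cp -I still flattens the paths in the bucket):

sudo find [your path] -name '*.json' -print | sudo gsutil -m cp -I gs://[your bucket]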
Bereave answered 13/7, 2017 at 17:49 Comment(0)
