AWS CLI S3 sync only over selected files?

I need to synchronize two AWS S3 buckets, but only for the files in a given list. This is the scenario:

BucketA:

File1.jpg Deleted  
File2.jpg Modified
File3.jpg Deleted
File4.jpg Modified
File5.jpg Modified
File6.jpg New

BucketB:

File1.jpg 
File2.jpg 
File3.jpg 
File4.jpg 
File5.jpg 

I'm looking for a command like this:

aws s3 sync s3://BucketA s3://BucketB --delete --exclude "*" --include "File1.jpg;File2.jpg;File4.jpg"

The resulting BucketB must look like this:

File1.jpg Deleted
File2.jpg Modified
File3.jpg Unchanged
File4.jpg Modified
File5.jpg Unchanged

Any idea?

Matchmark answered 21/7, 2015 at 10:22 Comment(0)

It looks like this is achievable, except for the deletion part.

This command will sync only the specified files:

aws s3 sync s3://bucketA s3://bucketB --exclude "*" --include "File1.jpg" --include "File2.jpg" --include "File4.jpg"
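
The order of the filters matters here: all files are included by default, and filters that appear later on the command line take precedence over earlier ones. That is why --exclude "*" comes first and each --include follows it. For a longer list, the flags could be generated from a text file. The following is only a minimal sketch, assuming a list file with one key per line; sync_listed and files.txt are hypothetical names, not part of the AWS CLI.

```shell
# Sketch: turn a list of keys (one per line) into repeated --include flags.
# sync_listed is a hypothetical helper, not an AWS CLI feature.
sync_listed() {
    src=$1 dst=$2 list=$3
    # Exclude everything first; later filters override earlier ones,
    # so each --include re-adds the keys matching one entry.
    set -- s3 sync "$src" "$dst" --exclude "*"
    while IFS= read -r key || [ -n "$key" ]; do
        [ -n "$key" ] || continue          # skip blank lines
        set -- "$@" --include "$key"       # note: the CLI treats $key as a pattern
    done < "$list"
    aws "$@"
}

# Usage: sync_listed s3://bucketA s3://bucketB files.txt
```

Because each entry is passed as an --include pattern, keys containing characters such as * or ? would be interpreted as wildcards rather than literal names.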

However, the --delete parameter seems to consider only the files in BucketA that match the --include parameters, so all other files become 'invisible' and are therefore deleted from BucketB.

This command:

aws s3 sync s3://bucketA s3://bucketB --delete --exclude "*" --include "File1.jpg" --include "File2.jpg" --include "File4.jpg"

actually deletes all files except File2.jpg and File4.jpg. So, it doesn't look like you can do a selective delete in the expected manner.
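
If --delete behaves as described above, one possible workaround is to run the sync without --delete and handle deletions in a separate step: for every key in the list, check whether it still exists in the source bucket and, if not, remove it from the destination. The sketch below assumes a list file with one key per line and bucket names given without the s3:// prefix; delete_listed is a hypothetical helper. It may also be worth re-testing --delete on a current CLI version first, since the reference documentation for sync states that files excluded by filters are excluded from deletion.

```shell
# Sketch: remove from the destination only the listed keys that no longer
# exist in the source. delete_listed is a hypothetical helper.
delete_listed() {
    src_bucket=$1 dst_bucket=$2 list=$3
    while IFS= read -r key || [ -n "$key" ]; do
        [ -n "$key" ] || continue          # skip blank lines
        # head-object exits non-zero when the key is missing. It also fails
        # on permission or network errors, which this sketch does not
        # distinguish, so test it carefully before relying on it.
        if ! aws s3api head-object --bucket "$src_bucket" --key "$key" \
                >/dev/null 2>&1 </dev/null; then
            aws s3 rm "s3://$dst_bucket/$key" </dev/null
        fi
    done < "$list"
}

# Usage: delete_listed bucketa bucketb files.txt
```

In the scenario from the question, File1.jpg is in the list and missing from the source, so it would be removed, while File3.jpg is not in the list and would be left alone.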

Here's a script to test all of the above:

aws s3 cp foo s3://bucketa/File1.jpg
aws s3 cp foo s3://bucketa/File2.jpg
aws s3 cp foo s3://bucketa/File3.jpg
aws s3 cp foo s3://bucketa/File4.jpg
aws s3 cp foo s3://bucketa/File5.jpg
aws s3 sync s3://bucketa s3://bucketb
aws s3 rm s3://bucketa/File1.jpg
aws s3 rm s3://bucketa/File3.jpg
aws s3 cp foo s3://bucketa/File6.jpg
aws s3 cp bar s3://bucketa/File2.jpg
aws s3 cp bar s3://bucketa/File4.jpg
aws s3 cp bar s3://bucketa/File5.jpg

aws s3 ls s3://bucketa
2015-07-23 08:50:44         49 File2.jpg
2015-07-23 08:50:49         49 File4.jpg
2015-07-23 08:50:53         49 File5.jpg
2015-07-23 08:50:20         24 File6.jpg

aws s3 ls s3://bucketb
2015-07-23 08:49:35         24 File1.jpg
2015-07-23 08:49:35         24 File2.jpg
2015-07-23 08:49:36         24 File3.jpg
2015-07-23 08:49:36         24 File4.jpg
2015-07-23 08:49:36         24 File5.jpg 

aws s3 sync s3://bucketa s3://bucketb --exclude "*" --include "File1.jpg" --include "File2.jpg" --include "File4.jpg"
Columnist answered 23/7, 2015 at 9:43 Comment(6)
Thank you very much, John. I will try doing the delete in a separate command. - Matchmark
Maybe the AWS architects should consider improving the --include parameter so that this kind of process can be done more compactly. I don't think it would be difficult to apply the sync only to the files in a list and skip files that are not in it. - Matchmark
I have tested this solution on a real case and it is not viable. An AWS CLI command with 50 --include parameters freezes the console. - Matchmark
Do we need the exclude if we already include specific files? --exclude "*" --include "File1.jpg" --include "File2.jpg" --include "File4.jpg" - Boarfish
@AlwaysSunny Give it a try and let us know! - Columnist
@JohnRotenstein Sir, I am actually already using it like this: --include "*" --exclude "*.html" --acl public-read. That's why I'm curious why we need the exclude if we have the include flag; I thought include means only include those. - Boarfish

There is no built-in way to sync only specific files, but there are a few poor workarounds.

  1. As @John Rotenstein mentioned, you could use --exclude="*" --include="FILEPATH"

    • this solution works for one or two files
    • if you want to sync more files, it will take longer than simply syncing all files
    • the reason it takes longer is that the 'sync' command recursively iterates over all files in your target directory
    • for each argument passed with --include, there is one iteration over all files
    • sync checks whether the pattern passed via --include matches a file path
    • you can also pass wildcards instead of paths via --include
    • you can verify this yourself with the --debug option
  2. You could use the 'cp' command instead of 'sync' and append your file path to the path of your target directory

    • this method does not check whether a sync is needed; it just copies the file
    • the whole copy command is executed for each file path, which is very time-consuming
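
The second workaround could be sketched as a loop over a list file, as below. This assumes one key per line; copy_listed and files.txt are hypothetical names. As noted above, it copies unconditionally and starts a separate CLI process per file, so it is only reasonable for short lists.

```shell
# Sketch: copy each listed key individually with 'aws s3 cp'.
# copy_listed is a hypothetical helper; it does not check whether the
# destination copy is already up to date.
copy_listed() {
    src=$1 dst=$2 list=$3
    while IFS= read -r key || [ -n "$key" ]; do
        [ -n "$key" ] || continue          # skip blank lines
        aws s3 cp "$src/$key" "$dst/$key" </dev/null
    done < "$list"
}

# Usage: copy_listed s3://bucketA s3://bucketB files.txt
```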

So for a handful of files you could use these workarounds, but with several hundred files, as in my case, neither is practical.

If you want to push for this, I've already opened a ticket on GitHub: https://github.com/aws/aws-cli/issues/5167

Emulation answered 7/5, 2020 at 17:2 Comment(0)
