`aws s3 cp` vs `aws s3 sync` behavior and cost [closed]

1

41

I have a static site that I deploy to S3 and serve to users through CloudFront. After I build the site, I want to push the new build to S3. I found two ways to do that:

  • aws s3 cp --recursive ./public/ s3://bucket-name --cache-control 'public, max-age=300, s-maxage=31536000'

  • aws s3 sync --delete ./public/ s3://bucket-name --cache-control 'public, max-age=300, s-maxage=31536000'

I am planning to deploy once or twice every week.

I want to know which of these will cost me less money in the long run.

I tried reading the docs, but I was not able to figure out the differences. Please help me with this.

Mephistopheles answered 7/11, 2020 at 13:18 Comment(2)
@JD why are you setting cache params in S3? Can't the cache params be set in CloudFront, where they're supposed to be?Pneumonoultramicroscopicsilicovolcanoconiosis
The benefit is that if the bucket is consumed by multiple CloudFront distributions or servers, they can all use the caching headers from the source itself instead of managing them everywhere.Sylvanus
A
82

One thing to note is that aws s3 cp --recursive and aws s3 sync --delete have different behaviors.

aws s3 cp will copy all files, even if they already exist in the destination area. It also will not delete files from your destination if they are deleted from the source.

aws s3 sync looks at the destination before copying files over and only copies files that are new or updated. The --delete flag also deletes files at the destination if they were removed from the source.

The sync command is what you want, as it is designed to keep two folders in sync while copying the minimum amount of data. Sync should result in less data being pushed into the S3 bucket, so it should cost less overall.
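As a rough illustration of what "new or updated" means here, the sketch below mimics the default rule described in the AWS CLI docs (upload when the object is missing at the destination, the sizes differ, or the local file is newer). The function name needs_upload and the sample sizes and timestamps are made up for this example; this is not the CLI's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of the AWS CLI: a sketch of the documented
# default comparison that `aws s3 sync` applies to each local file.
# Usage: needs_upload SRC_SIZE SRC_MTIME DST_SIZE DST_MTIME
# Sizes are in bytes, mtimes in epoch seconds. Pass empty strings for
# DST_SIZE and DST_MTIME when the object does not exist in the bucket.
needs_upload() {
  local src_size=$1 src_mtime=$2 dst_size=$3 dst_mtime=$4
  # Object missing at the destination: upload.
  if [ -z "$dst_size" ]; then echo yes; return; fi
  # Sizes differ: upload.
  if [ "$src_size" -ne "$dst_size" ]; then echo yes; return; fi
  # Local file modified more recently than the object: upload.
  if [ "$src_mtime" -gt "$dst_mtime" ]; then echo yes; return; fi
  # Same size and not newer: skip. File contents are never compared.
  echo no
}

needs_upload 120 1700000500 120 1700000000   # same size, newer local file
needs_upload 120 1700000000 120 1700000500   # same size, older local file
needs_upload 120 1700000000 "" ""            # not in the bucket yet
```

One consequence for static sites: if the build step rewrites every file, the local timestamps will all be newer and sync may re-upload everything anyway. The --size-only option changes the rule to compare sizes only.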

To give a counterexample, aws s3 cp is faster and cheaper than sync when you just need to transfer files and you know they are all new to the destination. It is faster and cheaper because it does not check whether anything already exists at the destination before starting the transfer.
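To put rough numbers on the trade-off, the sketch below counts billable requests per deploy under simplifying assumptions: one PUT per uploaded file (no multipart uploads), one LIST request per 1000 objects already in the bucket, and deletes ignored. The function names and file counts are invented for illustration; check the current S3 pricing page for what each request type actually costs.

```shell
#!/usr/bin/env bash
# Hypothetical estimators, not an AWS tool. Assumptions: every upload is a
# single PUT (no multipart), listing returns up to 1000 keys per request,
# and requests issued by --delete are left out of the count.

# cp --recursive uploads every local file, so: one PUT per file.
# Usage: cp_requests LOCAL_FILES
cp_requests() {
  echo "$1"
}

# sync lists the destination first, then uploads only the changed files.
# Usage: sync_requests BUCKET_OBJECTS CHANGED_FILES
sync_requests() {
  local bucket_objects=$1 changed=$2 page_size=1000
  # Round up to whole LIST pages; even an empty bucket needs one LIST call.
  local lists=$(( (bucket_objects + page_size - 1) / page_size ))
  if [ "$lists" -lt 1 ]; then lists=1; fi
  echo $(( lists + changed ))
}

cp_requests 500        # 500-file site, full re-upload      -> 500
sync_requests 500 40   # same site, 40 files changed        -> 41
sync_requests 0 500    # empty bucket, all 500 files new    -> 501
```

Under these assumptions sync wins whenever most files are unchanged, and cp is marginally ahead only when everything is new, which matches the counterexample above.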

Anhwei answered 7/11, 2020 at 13:33 Comment(8)
This makes sense. But then when would one choose cp over sync ever?Mephistopheles
cp is a simpler operation. You would use it when you just want to copy a few files/folders and aren’t trying to replicate a structure from one location to anotherAnhwei
cp will also be more performant since it doesn't do any checking on the destination. Use it if you know the files are new and you don't need to compare source and destination.Anhwei
Please be aware that the comparison method that aws s3 sync uses is very simple: it only checks for file size changes, not the contents of the files. This makes the sync option very unreliable in my view and I try to avoid using it. But maybe it was my setup (syncing from a server to an S3 bucket) that was triggering this simple comparison method. Would be nice to read about the experience of others.Alfonzoalford
From the docs it seems the default behaviour is to transfer files where the size is different OR the source timestamp is newer than the destination. You can optionally make it also transfer files where the source timestamp is older. But it doesn't compare file contents.Zurek
@JDD would you be willing to update your answer to describe how aws s3 sync determines which files are "new and updated"? I believe it's by the 'modified date' or the size of the file. I forget the exact nuance, but I tripped over it many times.... Ah, here it is: https://mcmap.net/q/218566/-how-does-aws-s3-sync-determine-if-a-file-has-been-updated (size change, modified date newer, or doesn't exist in destination; and that can be adjusted with the --size-only option).Loidaloin
But what about speed, even when we are copying to an empty location? The aws s3 sync command uses multiple threads to perform the upload, which can speed it up by uploading multiple files in parallel. Does that make sync the first choice for copying larger folders?Catania
Just adding here that docs.aws.amazon.com/datasync/latest/userguide/… might be what you are looking for.Martins
