Copying files from AWS S3 to Glacier Flexible Retrieval in another S3 bucket

I'm looking to change some Python 3 code which currently copies an SSE-C encrypted file from one S3 bucket to another. On the target bucket, the file is stored in the Glacier Flexible Retrieval (GFR) storage class, which is currently achieved by a lifecycle transition set on the target bucket. However, this causes a number of problems because the transition is asynchronous.

Ideally, I would like to copy the file so that the target is written directly to the GFR storage class. However, when I looked at the boto3 documentation for the S3 client's copy method, the allowed keys for ExtraArgs, as documented here, appear to be only:

ALLOWED_DOWNLOAD_ARGS = ['ChecksumMode', 'VersionId', 'SSECustomerAlgorithm', 'SSECustomerKey', 'SSECustomerKeyMD5', 'RequestPayer', 'ExpectedBucketOwner']

StorageClass is not in that list, which implies that it's not possible. I looked at copy_object as well, but it doesn't support objects larger than 5 GB, which this system occasionally has to deal with.

However, I then saw this answer to a related question, which seems to imply that it can be done. It refers to a seemingly contradictory AWS link, which says:

You can also change the storage class of an object that is already stored in Amazon S3 to any other storage class by making a copy of the object by using the PUT Object - Copy API operation. However, you can't use PUT Object - Copy to copy objects that are stored in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes. You also can't transition from S3 One Zone-IA to S3 Glacier Instant Retrieval.

You copy the object in the same bucket by using the same key name and specifying the request headers as follows:

Set the x-amz-metadata-directive header to COPY.

Set the x-amz-storage-class header to the storage class that you want to use.
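
For reference, my understanding is that those two headers correspond to the MetadataDirective and StorageClass parameters of boto3's copy_object. A minimal sketch of the in-place change the AWS text describes, assuming an unencrypted object under the copy_object size limit; the function name and its arguments are hypothetical, and an SSE-C object would additionally need the SSE parameters:

```python
def change_storage_class_in_place(client, bucket, key, storage_class):
    """Copy an object onto itself so that only its storage class changes.

    `client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    return client.copy_object(
        CopySource={"Bucket": bucket, "Key": key},
        Bucket=bucket,               # same bucket as the source
        Key=key,                     # same key name as the source
        MetadataDirective="COPY",    # sent as x-amz-metadata-directive
        StorageClass=storage_class,  # sent as x-amz-storage-class
    )
```

It would be called as change_storage_class_in_place(boto3.client("s3"), "mybucket", "mykey", "STANDARD_IA"). Because it goes through copy_object, it does not help with the larger files mentioned above.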

The code given (with acknowledgement to the author Frederic Henri) is as follows:

import boto3

s3 = boto3.client('s3')

copy_source = {
    'Bucket': 'mybucket',
    'Key': 'mykey'
}

s3.copy(
  copy_source, 'mybucket', 'mykey',
  ExtraArgs = {
    'StorageClass': 'STANDARD_IA',
    'MetadataDirective': 'COPY'
  }
)

However, judging by the first reference above, neither key in ExtraArgs would be valid for the S3 client copy operation, so I'm unsure whether this can be relied upon. Also, it seems to change the object in situ rather than copying it to another bucket.
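
One way to check this without relying on the documentation page might be to inspect the installed library. A minimal sketch, assuming the managed copy validates ExtraArgs against TransferManager.ALLOWED_COPY_ARGS in the s3transfer package (I believe it does, but I have not confirmed this for every version); allowed_copy_args and rejected_keys are hypothetical helper names:

```python
def allowed_copy_args():
    """Return the ExtraArgs keys the installed s3transfer accepts for copies."""
    # Imported lazily so rejected_keys() can be used without s3transfer installed.
    from s3transfer.manager import TransferManager
    return list(TransferManager.ALLOWED_COPY_ARGS)


def rejected_keys(extra_args, allowed):
    """Return the ExtraArgs keys that are not in the allowed list, sorted."""
    return sorted(set(extra_args) - set(allowed))
```

Calling rejected_keys({'StorageClass': 'GLACIER', 'MetadataDirective': 'COPY'}, allowed_copy_args()) should return an empty list if both keys are accepted. Note that the list quoted earlier is named ALLOWED_DOWNLOAD_ARGS, so it may describe downloads rather than copies.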

Ideally, I would like to alter my code to:

    extra_args = {
        'CopySourceSSECustomerAlgorithm': <algorithm-string>,
        'CopySourceSSECustomerKey': <plaintext-key>,
        'SSECustomerAlgorithm': <algorithm-string>,
        'SSECustomerKey': <plaintext-key>,
        'StorageClass': 'GLACIER',  # Adding these two directives to action an immediate
        'MetadataDirective': 'COPY' # transition to Glacier Flexible Retrieval in the Target
    }
...
    response = client.copy(source, target_bucket, target_key, ExtraArgs=extra_args)
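
Spelled out as a self-contained function, the change I have in mind would look roughly like this. It is a sketch only: the function name, its parameters, the AES256 default, and the use of one SSE-C key for both source and target are my assumptions, and I have not yet confirmed that the managed copy honours StorageClass:

```python
def copy_to_glacier(client, source_bucket, source_key,
                    target_bucket, target_key, sse_key, algorithm="AES256"):
    """Copy an SSE-C object to another bucket, requesting the GLACIER class.

    `client` is expected to be a boto3 S3 client; `sse_key` is the
    plaintext SSE-C key, assumed to be the same for source and target.
    """
    extra_args = {
        # Needed to read the SSE-C encrypted source object.
        "CopySourceSSECustomerAlgorithm": algorithm,
        "CopySourceSSECustomerKey": sse_key,
        # Needed to encrypt the target object with SSE-C.
        "SSECustomerAlgorithm": algorithm,
        "SSECustomerKey": sse_key,
        # "GLACIER" is the API value for Glacier Flexible Retrieval.
        "StorageClass": "GLACIER",
        "MetadataDirective": "COPY",
    }
    source = {"Bucket": source_bucket, "Key": source_key}
    # client.copy is the managed transfer, which can switch to multipart copy.
    client.copy(source, target_bucket, target_key, ExtraArgs=extra_args)
    return extra_args
```

The function returns the ExtraArgs it used so that a caller or a test can inspect them.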

I am planning to try this out, but I would be grateful if anyone can confirm whether it's supported or whether I've misunderstood how this works.

Charmian answered 13/4, 2024 at 11:45
