How do I update a batch of S3 objects' metadata using ruby?

I need to change some metadata (Content-Type) on hundreds or thousands of objects on S3. What's a good way to do this with ruby? As far as I can tell there is no way to save only metadata with fog.io, the entire object must be re-saved. Seems like using the official sdk library would require me rolling a wrapper environment just for this one task.

Scalenus answered 14/2, 2012 at 16:53 Comment(0)

You're right that fog can't do this, but the official SDK does let you modify an object's metadata without uploading it again. Under the hood it copies the object, but the copy happens server-side on S3, so you don't need to download and re-upload the file.

A wrapper would be easy to implement, something like

bucket.objects.each do |object|
  object.metadata['content-type'] = 'application/json'
end
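Note that the metadata hash above writes user-defined metadata, which S3 stores under the x-amz-meta- prefix. To change the real Content-Type header you need an in-place server-side copy instead; a minimal sketch against the v1 SDK interface (the helper name is mine, and this is untested against live S3):

```ruby
# Sketch: object.metadata[...] only writes x-amz-meta-* user metadata.
# Changing the real Content-Type header needs an in-place server-side
# copy; nothing is downloaded or re-uploaded. Helper name is hypothetical.
def replace_content_type(object, content_type)
  # Copying from the object's own key with a new content_type rewrites
  # the header on S3's side.
  object.copy_from(object.key, :content_type => content_type)
end

# bucket.objects.each { |object| replace_content_type(object, 'application/json') }
```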
Hervey answered 21/2, 2012 at 10:13 Comment(2)
More discussion of this here: groups.google.com/group/ruby-fog/browse_thread/thread/…Scalenus
this adds only metadata with x-amz-meta- prefix. is the any way of adding just a normal Content-Type metadata?Pacifa

In the v2 API, you can use Object#copy_from or Object#copy_to with the :metadata and :metadata_directive => 'REPLACE' options to update an object's metadata without downloading it from S3.

The code in Joost's gist throws this error:

Aws::S3::Errors::InvalidRequest: This copy request is illegal because it is trying to copy an object to itself without changing the object's metadata, storage class, website redirect location or encryption attributes.

This happens because, by default, S3 ignores any :metadata supplied with a copy operation and copies the source object's metadata instead. We must set the :metadata_directive => 'REPLACE' option if we want to update the metadata in place.

See http://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Object.html#copy_from-instance_method

Here's a full, working code snippet that I recently used to perform metadata update operations:

require 'aws-sdk'

# S3 setup boilerplate
client = Aws::S3::Client.new(
  :region => 'us-east-1',
  :access_key_id => ENV['AWS_ACCESS_KEY'],
  :secret_access_key => ENV['AWS_SECRET_KEY'], 
)
s3 = Aws::S3::Resource.new(:client => client)

# Get an object reference
object = s3.bucket('my-bucket-name').object('my-object/key')

# Create our new metadata hash. This can be any hash; in this example we update
# existing metadata with a new key-value pair.
new_metadata = object.metadata.merge('MY_NEW_KEY' => 'MY_NEW_VALUE')

# Use the copy operation to replace our metadata
object.copy_to(object,
  :metadata => new_metadata,

  # IMPORTANT: normally S3 copies the metadata along with the object.
  # we must supply this directive to replace the existing metadata with
  # the values we supply
  :metadata_directive => "REPLACE",
)

For easy re-use:

def update_metadata(s3_object, new_metadata = {})
  s3_object.copy_to(s3_object,
    :metadata => new_metadata,
    :metadata_directive => "REPLACE"
  )
end
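A sketch of applying this helper over every object under a prefix (the helper is restated so the snippet stands alone; bucket, prefix, and metadata key are placeholders):

```ruby
# Replace an object's metadata in place via a server-side copy
# (no download/re-upload). Restated from the helper above.
def update_metadata(s3_object, new_metadata = {})
  s3_object.copy_to(s3_object,
    :metadata => new_metadata,
    :metadata_directive => "REPLACE"
  )
end

# Walk a prefix and merge one extra key-value pair into each object's
# existing metadata. Note object.metadata issues a HEAD request per object.
def update_prefix_metadata(bucket, prefix, extra_metadata)
  bucket.objects(:prefix => prefix).each do |summary|
    object = summary.object # ObjectSummary -> Object
    update_metadata(object, object.metadata.merge(extra_metadata))
  end
end

# update_prefix_metadata(s3.bucket('my-bucket-name'), 'images/', 'reviewed' => 'true')
```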
Caber answered 26/8, 2016 at 1:13 Comment(2)
To add cache control use: object.copy_to(object, cache_control: 'public,max-age=333333', metadata_directive: 'REPLACE')Prosser
The V3 API does not seem to support this anymore, I think the copy_from in v3 maybe only does metadata copy, does not allow replace? docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/…Spray

For future readers, here's a complete sample of changing metadata on a set of objects using the Ruby aws-sdk v1 (also see this Gist for an aws-sdk v2 sample):

# Using v1 of Ruby aws-sdk as currently v2 seems not able to do this (broken?).
require 'aws-sdk-v1'

key = YOUR_AWS_KEY
secret = YOUR_AWS_SECRET
region = YOUR_AWS_REGION

AWS.config(access_key_id: key, secret_access_key: secret, region: region)
s3 = AWS::S3.new
bucket = s3.buckets[bucket_name]
bucket.objects.with_prefix('images/').each do |obj|
  puts obj.key
  # Add metadata: {} to the next line to set more metadata.
  obj.copy_from(obj.key, content_type: obj.content_type, cache_control: 'max-age=1576800000', acl: :public_read)
end
Nikos answered 23/2, 2015 at 8:30 Comment(3)
Your Gist says the v2 sample doesn't seem to work, and suggests it might be a bug in the SDK... I take it you haven't yet resolved it?Blader
Nope. Just try the gist with the latest version of v2 :)Nikos
v2 version doesn't work for me either. I've commented the gist with my solution (re-up-loading each file).Astrix

After some searching, this seems to work for me:

obj.copy_to(obj, :metadata_directive=>"REPLACE", :acl=>"public-read",:content_type=>"text/plain")
Alleged answered 11/6, 2017 at 10:38 Comment(0)

Using the SDK's metadata hash to change the content type results in an x-amz-meta- prefixed header. My solution was to use Ruby plus the AWS CLI, which writes the real Content-Type header instead of x-amz-meta-content-type.

ids_to_copy = all_object_ids
ids_to_copy.each do |id|
    object_key = "#{id}.pdf"
    command = "aws s3 cp s3://{bucket-name}/#{object_key} s3://{bucket-name}/#{object_key} --no-guess-mime-type --content-type='application/pdf' --metadata-directive='REPLACE'"
    system(command)
end
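The same CLI approach can be made a bit more robust with Shellwords (so keys with spaces are quoted safely) and Open3 (so failures surface per object); a sketch, with the bucket name and key list as placeholders:

```ruby
require 'open3'
require 'shellwords'

# Build the in-place copy command. --metadata-directive REPLACE makes S3
# rewrite the Content-Type header server-side; cp with the same source
# and destination copies the object onto itself.
def rewrite_content_type_command(bucket, key, content_type)
  src = "s3://#{bucket}/#{key}"
  ['aws', 's3', 'cp', src, src,
   '--no-guess-mime-type',
   '--content-type', content_type,
   '--metadata-directive', 'REPLACE'].shelljoin
end

# Placeholder loop; guarded so it only runs where the AWS CLI is installed.
if system('command -v aws > /dev/null 2>&1')
  %w[1.pdf 2.pdf].each do |key|
    _out, err, status = Open3.capture3(rewrite_content_type_command('my-bucket', key, 'application/pdf'))
    warn "failed for #{key}: #{err}" unless status.success?
  end
end
```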
Fluorosis answered 6/7, 2020 at 23:57 Comment(0)

This API appears to be available now (note that put_object_tagging sets object tags, which S3 stores separately from object metadata):

Fog::Storage.new({
  :provider                 => 'AWS',
  :aws_access_key_id        => 'foo',
  :aws_secret_access_key    => 'bar',
  :endpoint => 'https://s3.amazonaws.com/',
  :path_style => true
}).put_object_tagging(
  'bucket_name',
  's3_key',
  {foo: 'bar'}
)
Audrey answered 11/5, 2022 at 19:11 Comment(0)
