How to sync new ActiveStorage mirrors?
U

4

8

With ActiveStorage you can now define mirrors for storing your files.

local:
  service: Disk
  root: <%= Rails.root.join("storage") %>

amazon:
  service: S3
  access_key_id: <%= Rails.application.credentials.dig(:aws, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:aws, :secret_access_key) %>
  region: us-east-1
  bucket: mybucket

mirror:
  service: Mirror
  primary: local
  mirrors:
    - amazon
    - another_mirror

If you add a mirror later on, you have to take care of copying all existing files yourself, e.g. from "local" to "amazon" or "another_mirror".

  1. Is there a convenient method to keep the files in sync?
  2. Or a method to run a validation that checks whether all files are available on each service?
Undertook answered 7/10, 2018 at 0:23 Comment(0)
W
19

I have a couple of solutions that might work for you, one for Rails <= 6.0 and one for Rails >= 6.1:

Firstly, you need to iterate through your ActiveStorage blobs:

ActiveStorage::Blob.all.each do |blob|
  # work with blob
end

then...

  1. Rails <= 6.0

    You will need the blob's key, checksum, and the local file on disk.

    local_file = ActiveStorage::Blob.service.primary.path_for blob.key
    
    # I'm picking the first mirror as an example,
    # but you can select a specific mirror if you want
    mirror = blob.service.mirrors.first
    
    mirror.upload blob.key, File.open(local_file), checksum: blob.checksum
    

    You may also want to avoid uploading a file if it already exists on the mirror. You can check for that like this:

    mirror = blob.service.mirrors.first
    
    # If the file doesn't exist on the mirror, upload it
    unless mirror.exist? blob.key
      # Upload file to mirror
    end
    

    Putting it together, a rake task might look like:

    # lib/tasks/active_storage.rake
    
    namespace :active_storage do
    
      desc 'Ensures all files are mirrored'
      task mirror_all: [:environment] do
    
        # Iterate through each blob
        ActiveStorage::Blob.all.each do |blob|
    
          # We assume the primary storage is local
          local_file = ActiveStorage::Blob.service.primary.path_for blob.key
    
          # Iterate through each mirror
          blob.service.mirrors.each do |mirror|
    
            # If the file doesn't exist on the mirror, upload it
            mirror.upload(blob.key, File.open(local_file), checksum: blob.checksum) unless mirror.exist? blob.key
    
          end
        end
      end
    end
    

    You may run into a situation like @Rystraum mentioned where you might need to mirror from somewhere other than the local disk. In this case, the rake task could look like this:

    # lib/tasks/active_storage.rake
    
    namespace :active_storage do
    
      desc 'Ensures all files are mirrored'
      task mirror_all: [:environment] do
    
        # All services in our rails configuration
        all_services = [ActiveStorage::Blob.service.primary, *ActiveStorage::Blob.service.mirrors]
    
        # Iterate through each blob
        ActiveStorage::Blob.all.each do |blob|
    
          # Select services where file exists
          services = all_services.select { |service| service.exist? blob.key }
    
          # Skip blob if file doesn't exist anywhere
          next unless services.present?
    
          # Select services where file doesn't exist
          mirrors = all_services - services
    
          # Open the local file (if one exists)
          disk_service = services.find { |service| service.is_a? ActiveStorage::Service::DiskService }
          local_file = File.open(disk_service.path_for(blob.key)) if disk_service
    
          # Upload local file to mirrors (if one exists)
          # Rewind before each upload, since the previous upload leaves the IO at EOF
          mirrors.each do |mirror|
            mirror.upload blob.key, local_file.tap(&:rewind), checksum: blob.checksum
          end if local_file.present?
    
          # If no local file exists then download a remote file and upload it to the mirrors (thanks @Rystraum)
          services.first.open blob.key, checksum: blob.checksum do |temp_file|
            mirrors.each do |mirror|
              mirror.upload blob.key, temp_file.tap(&:rewind), checksum: blob.checksum
            end
          end unless local_file.present?
    
        end
      end
    end
    

    While the first rake task answers the OP's question, the latter is much more versatile:

    • It can be used with any combination of services
    • A DiskService is not required
    • Uploading from a DiskService is prioritized
    • Avoids extra exist? calls, as we only call it once per service per blob
  2. Rails >= 6.1

    It's super easy, just call this on each blob...

    blob.mirror_later
    

    Wrapping it up as a rake task looks like:

    # lib/tasks/active_storage.rake
    
    namespace :active_storage do
    
      desc 'Ensures all files are mirrored'
      task mirror_all: [:environment] do
        ActiveStorage::Blob.all.each do |blob|
          blob.mirror_later
        end
      end
    end
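
The second part of the question (checking that every file is available on each service) can be approached with the same exist? call used above. The following is only a sketch: FakeService and missing_keys are hypothetical names introduced for illustration, and the in-memory service stands in for a real one so the snippet runs without Rails. In an application, the keys would presumably come from something like ActiveStorage::Blob.pluck(:key) and the services from the primary and mirrors of the configured service.

```ruby
# Hypothetical in-memory stand-in for an ActiveStorage service.
# It only implements exist?, which is all the check below relies on.
class FakeService
  def initialize(keys)
    @keys = keys
  end

  def exist?(key)
    @keys.include?(key)
  end
end

# Returns a hash mapping each service name to the keys it is missing.
# Services with nothing missing are left out of the report.
def missing_keys(keys, services)
  services.each_with_object({}) do |(name, service), report|
    missing = keys.reject { |key| service.exist?(key) }
    report[name] = missing unless missing.empty?
  end
end

services = {
  local: FakeService.new(%w[a b c]),
  amazon: FakeService.new(%w[a c])
}

# Print one line per service that is missing files
missing_keys(%w[a b c], services).each do |name, keys|
  puts "#{name} is missing: #{keys.join(', ')}"
end
```

An empty report means every key was found on every service. Note that this calls exist? once per key per service, which may be slow and costly against remote storage with many blobs.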
    
Whiz answered 20/8, 2019 at 18:45 Comment(5)
Thanks, worked like a charm! Just don't forget to put config.active_storage.service = :mirror in development.rb or whatever env you want. - Wedded
Thank you for the solution. I just want to elaborate, as it was not clear to me whether the 6.1 solution (point 2) actually copies the file to the mirror: yes, it does. It does so by (eventually) calling this class github.com/rails/rails/blob/… which will eventually call github.com/rails/rails/blob/… - Armistead
Unfortunately this (6.1) does not work for me. Nothing happens, not even an error. :/ - Erotogenic
I also get no response because my service doesn't respond to :mirror, so Rails skips the mirroring. apidock.com/rails/v6.1.3.1/ActiveStorage/Blob/mirror_later I can fix this by enqueuing the job directly: ActiveStorage::MirrorJob.perform_later(blob.key, checksum: blob.checksum) - Avow
mirror_later only works for Blobs which have service_name set to mirror (or whatever you called your mirror service in storage.yml). So, if all of your Blobs are actually stored on the primary storage of your mirror, you could update those Blobs to service_name = mirror and then call mirror_later on all of them. - Underfoot
J
7

(03-11-2021) On Rails > 6.1.4.1, using active_storage > 6.1.4.1, with the following setup:

Gemfile:

gem 'azure-storage-blob', github: 'Azure/azure-storage-ruby'

config/environments/production.rb

  # Store uploaded files on the local file system (see config/storage.yml for options).
  config.active_storage.service = :mirror #:microsoft or #:amazon

config/storage.yml:

amazon:
  service: S3
  access_key_id: XXX
  secret_access_key: XXX
  region: XXX
  bucket: XXX

microsoft:
  service: AzureStorage
  storage_account_name: YYY
  storage_access_key: YYY
  container: YYY

mirror:
  service: Mirror
  primary: amazon
  mirrors: [ microsoft ]

This does NOT work:

ActiveStorage::Blob.find_each do |blob|
  blob.mirror_later
end && puts("Mirroring done!")

What DID work is:

ActiveStorage::Blob.find_each do |blob|
  ActiveStorage::Blob.service.try(:mirror, blob.key, checksum: blob.checksum)
end
puts("Mirroring done!")

Not sure why that is. Maybe future versions of Rails support it, or it needs additional background job setup, or it would have happened eventually (it never did for me).
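
One possible explanation, going by the comments under the earlier answer: mirror_later is reported to do nothing unless the blob's own service responds to :mirror, which is only the case when the blob's service_name points at the Mirror service. The sketch below imitates that guard in plain Ruby. PlainService, MirrorLikeService and mirror_later_sketch are hypothetical stand-ins, not Rails classes or methods.

```ruby
# Hypothetical stand-in for a non-mirror service such as S3: no #mirror method.
class PlainService; end

# Hypothetical stand-in for the Mirror service, which does define #mirror.
class MirrorLikeService
  def mirror(key, checksum:)
    "mirrored #{key}"
  end
end

# Imitates the guard described in the comments: the call is silently
# skipped unless the service responds to :mirror.
def mirror_later_sketch(service, key, checksum)
  return :skipped unless service.respond_to?(:mirror)

  service.mirror(key, checksum: checksum)
end

puts mirror_later_sketch(PlainService.new, "abc", "sum").inspect
puts mirror_later_sketch(MirrorLikeService.new, "abc", "sum").inspect
```

If that is what happens, it would explain why calling mirror on ActiveStorage::Blob.service directly works: that object is the configured Mirror service regardless of each blob's service_name.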

TL;DR

If you need to do mirroring for your entire storage immediately, add this rake task and execute it on your given environment with bundle exec rails active_storage:mirror_all:

lib/tasks/active_storage.rake

namespace :active_storage do
  desc 'Ensures all files are mirrored'
  task mirror_all: [:environment] do
    ActiveStorage::Blob.find_each do |blob|
      ActiveStorage::Blob.service.try(:mirror, blob.key, checksum: blob.checksum)
    end
    puts("Mirroring done!")
  end
end

Optional:
Once you have mirrored all the blobs, you probably want to change their service names so that they are actually served from the right storage:

namespace :active_storage do
  desc 'Change each blob service name to microsoft'
  task switch_to_microsoft: [:environment] do
    ActiveStorage::Blob.find_each do |blob|
      blob.service_name = 'microsoft'
      blob.save
    end
    puts("All blobs will now be served from microsoft!")
  end
end

Finally, change config.active_storage.service in production.rb, or make the storage you want future uploads to go to the primary of the mirror.

Jambalaya answered 3/11, 2021 at 15:3 Comment(1)
I would just replace ActiveStorage::Blob.all.each do |blob| with ActiveStorage::Blob.find_each do |blob| if you have thousands or millions of blobs. - Kusin
M
2

I've worked on top of https://mcmap.net/q/1242497/-how-to-sync-new-activestorage-mirrors so the rake task does not assume that the file is on the local disk.

I started with S3, and due to cost concerns, I've decided to move the files to disk and use S3 and Azure as mirrors instead.

So my situation is that my primary (disk) sometimes doesn't have a file, and my complete repository is actually on my 1st mirror.

So, it's 2 things:

  1. Move files from S3 to disk
  2. Add a new mirror and keep it up to date

namespace :active_storage do
  desc "Ensures all files are mirrored"
  task mirror_all: [:environment] do
    ActiveStorage::Blob.all.each do |blob|
      source_mirror = if blob.service.primary.exist? blob.key
                        blob.service.primary
                      else
                        blob.service.mirrors.find { |m| m.exist? blob.key }
                      end

      source_mirror.open(blob.key, checksum: blob.checksum) do |file|
        # Rewind before each upload, since the previous upload leaves the IO at EOF
        blob.service.primary.upload(blob.key, file.tap(&:rewind), checksum: blob.checksum) unless blob.service.primary.exist? blob.key

        blob.service.mirrors.each do |mirror|
          next if mirror == source_mirror

          mirror.upload(blob.key, file.tap(&:rewind), checksum: blob.checksum) unless mirror.exist? blob.key
        end
      end
    rescue StandardError
      puts blob.key.to_s
    end
  end
end
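
One caveat when the same IO object is uploaded to several services in a row: each upload reads the stream to its end, so a later upload may see no data unless the IO is rewound first. Here is a minimal plain-Ruby illustration, using StringIO and a hypothetical fake_upload helper in place of a real service:

```ruby
require "stringio"

# Hypothetical helper standing in for a service upload: it simply reads
# whatever is left in the IO from its current position.
def fake_upload(io)
  io.read
end

io = StringIO.new("file contents")

first  = fake_upload(io)                  # reads the whole stream
second = fake_upload(io)                  # IO is at EOF, so this reads ""
third  = fake_upload(io.tap(&:rewind))    # rewinding restores the full content

puts first.inspect
puts second.inspect
puts third.inspect
```

With real services, an empty second read would presumably surface as a checksum mismatch rather than a silent empty file, since the uploads above pass checksum: blob.checksum.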
Marion answered 16/3, 2020 at 2:1 Comment(1)
Nice, thanks Rystraum! I recently ran into a similar situation where I transferred an app to a new server and needed to sync files between the DiskService and Mirrors. I updated my answer based on some of your code. Much appreciated! - Whiz
A
1

Everything is stored according to ActiveStorage's keys, so as long as your bucket names and file names aren't changed in the transfer, you can just copy everything over to the new service. See this post for how to copy stuff over.

Appellant answered 9/10, 2018 at 22:32 Comment(0)
