What is the best way to backup Azure Blob Storage contents

I know that Azure Storage entities (blobs, tables and queues) have built-in resiliency, meaning that they are replicated to 3 different servers in the same datacenter. On top of that, they may also be replicated to a different datacenter altogether that is physically located in a different geographical region. The chance of losing your data in this case is close to zero for all practical purposes.

However, what happens if a sloppy developer (or one under the influence of alcohol :)) accidentally deletes the storage account through the Azure Portal or the Azure Storage Explorer tool? Worse yet, what if a hacker gets hold of your account and clears the storage? Is there a way to retrieve the gigabytes of deleted blobs, or is that it? Somehow I think there has to be an elegant solution that the Azure infrastructure provides here, but I cannot find any documentation.

The only solution I can think of is to write my own process (worker role) that periodically backs up my entire storage to a different subscription/account, thus essentially doubling the cost of storage and transactions. Any thoughts?

Regards,

Archil

Bridging answered 19/7, 2012 at 13:24 Comment(0)

Depending on where you want to back up your data, there are two options available:

  1. Backing up data locally - If you wish to back up your data locally in your own infrastructure, you could: (a) write your own application using either the Storage Client Library or the REST API, or (b) use third-party tools like Cerebrata Azure Management Cmdlets (Disclosure: I work for Cerebrata).

  2. Backing up data in the cloud - Recently, the Windows Azure Storage team announced Asynchronous Copy Blob functionality, which essentially allows you to copy data from one storage account to another without downloading the data locally. The catch is that your target storage account must have been created after 7th June 2012. You can read more about this functionality on the Windows Azure Blog: http://blogs.msdn.com/b/windowsazurestorage/archive/2012/06/12/introducing-asynchronous-cross-account-copy-blob.aspx.
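For illustration, here is a minimal sketch of such a server-side, cross-account copy using the current Azure.Storage.Blobs SDK (which postdates this answer); the account names, keys, container and blob names are placeholders:

using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// Source and target blobs live in two different storage accounts.
var source = new BlobClient(
    "DefaultEndpointsProtocol=https;AccountName=sourceaccount;AccountKey=<key>;EndpointSuffix=core.windows.net",
    "data", "report.pdf");
var target = new BlobClient(
    "DefaultEndpointsProtocol=https;AccountName=backupaccount;AccountKey=<key>;EndpointSuffix=core.windows.net",
    "data-backup", "report.pdf");

// The copy runs inside Azure; nothing is downloaded to your machine.
// For a private source blob, pass a read-only SAS URI instead of source.Uri.
CopyFromUriOperation copy = await target.StartCopyFromUriAsync(source.Uri);
await copy.WaitForCompletionAsync();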

Hope this helps.

Misadventure answered 19/7, 2012 at 15:58 Comment(4)
I've faced this exact problem and backed up with the .NET storage client. If I were writing it again today I'd use the Asynchronous Copy Blob, much faster.Rickard
Cerebrata Azure Management Cmdlets appear to have been discontinued.Blub
@Gaurav Mantri - the link from the first option doesn't work anymore.Annalee
Is there a way to download the Azure Blob Storage backup locally and then export it back to Azure later? I know we can export from one storage account to another, but I want to download the backup copy locally, similar to what we do for a SQL database.Flea

The accepted answer is fine, but it took me a few hours to decipher everything.

I've put together a solution which I now use in production. I expose a Backup() method through Web API, which is then called by an Azure WebJob every day (at midnight).

Note that I've taken the original source code and modified it:

  • it wasn't up to date, so I changed a few method names
  • added a retry safeguard around the copy operation (it gives up after 4 failed attempts for the same blob; a minimal sketch of that loop follows this list)
  • added a little bit of logging - you should swap it out with your own
  • made it back up between two storage accounts (replicating containers & blobs)
  • added purging - it gets rid of old containers that are not needed (keeps 16 days' worth of data); you can always disable this, as space is cheap
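
A minimal, hypothetical sketch of that retry safeguard (the identifiers here are illustrative; the linked source below is the real implementation):

using System;
using System.Threading.Tasks;

// Attempts a single blob copy up to 4 times. The 4th failure is rethrown,
// which marks the backup of that blob as failed.
static async Task CopyWithRetryAsync(Func<Task> copyBlobAsync, string blobName)
{
    const int maxAttempts = 4;
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            await copyBlobAsync();
            return; // copy succeeded
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            // Swap this for your own logging, as noted above.
            Console.WriteLine($"Copy attempt {attempt} failed for {blobName}, retrying...");
        }
    }
}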

The source can be found at: https://github.com/ChrisEelmaa/StackOverflow/blob/master/AzureStorageAccountBackup.cs

And this is how I use it in the controller (note: your controller should only be callable by the Azure WebJob - you can check credentials in the headers, as sketched after the code):

[Route("backup")]
[HttpPost]
public async Task<IHttpActionResult> Backup()
{
    try
    {
        // Replicates all containers & blobs to the backup storage account.
        await _blobService.Backup();
        return Ok();
    }
    catch (Exception e)
    {
        // Log the full exception, but don't leak its details to the caller.
        _loggerService.Error("Failed to backup blobs " + e);
        return InternalServerError(new Exception("Failed to back up blobs!"));
    }
}
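
As an illustration of that header check (an assumption on my part, not part of the original solution): the WebJob could send a shared secret in a custom header, which the controller verifies before running the backup. The header name "X-Backup-Key" and the app-setting key "BackupSharedSecret" are hypothetical:

using System.Configuration;
using System.Linq;

// Returns true only if the caller presented the shared secret that the
// WebJob is configured with. Call this at the top of Backup() and return
// Unauthorized() when it fails.
private bool IsAuthorized()
{
    return Request.Headers.TryGetValues("X-Backup-Key", out var values)
        && values.FirstOrDefault() == ConfigurationManager.AppSettings["BackupSharedSecret"];
}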

Note: I wanted to include the full code as part of this post, but wasted 6 minutes trying to get it formatted here and failed - hence the link above.

Vieva answered 27/12, 2016 at 18:40 Comment(2)
The metadata key names cannot contain "-" anymore. If you rename them to "CreateAt" and "BackupOf", everything works fine.Snowdrift
Are you using transactions when backing up a container?Ketron

I have used Azure Data Factory to back up Azure Storage to great effect. It's really easy to use, cost-effective and works very well.

Simply create a Data Factory (v2), set up data connections to your data sources (it currently supports Azure Tables, Azure Blobs and Azure Files) and then set up a data copy pipeline.

The pipelines can merge, overwrite, etc. and you can set up custom rules/wildcards.

Once you've set up the pipeline, you should then set up a schedule trigger. This will kick off the backup at an interval to suit your needs.
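
For illustration, a daily schedule trigger in Data Factory v2 can be authored as JSON roughly like this (the trigger and pipeline names are placeholders):

{
  "name": "DailyBackupTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2019-03-18T00:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "BackupBlobPipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}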

I've been using it for months and it's perfect. No code, no VMs, no custom PowerShell scripts or third-party software. A pure Azure solution.

Setiform answered 18/3, 2019 at 3:1 Comment(0)

I have had exactly the same requirement: backing up blobs from Azure, as we have millions of customer blobs, and you are right - a sloppy developer with full access can compromise the entire system.

Thus, I wrote an entire application, "Blob To Local Backup", free and open source on GitHub under the MIT license: https://github.com/smartinmedia/BlobToLocalBackup

It solves many of your issues, namely:

  • You can give this application READ-only access, so it cannot destroy any data on Azure.
  • It backs up to a server where your sloppy developer or a hacker does not have the same access as to your Azure account.
  • The software provides versioning, so you can even protect yourself from e.g. ransomware/encryption attacks.
  • It uses a serialization method instead of a database, so you can have millions of files on Azure and still keep them in sync (we have 20 million files on Azure).

Here is how it works (for more detailed information, read the README on GitHub):

  1. Set up the appsettings.json file in the main folder. You can provide the LoginCredentials here for global access, or configure access more granularly at the storage account level:
    {
      "App": {
        "ConsoleWidth": 150,
        "ConsoleHeight": 42,
        "LoginCredentials": {
          "ClientId": "2ab11a63-2e93-2ea3-abba-aa33714a36aa",
          "ClientSecret": "ABCe3dabb7247aDUALIPAa-anc.aacx.4",
          "TenantId": "d666aacc-1234-1234-aaaa-1234abcdef38"
        },
        "DataBase": {
          "PathToDatabases": "D:/temp/azurebackup"
        },
        "General": {
          "PathToLogFiles": "D:/temp/azurebackup"
        }
      }
    }

  2. Set up a job as a JSON file like this (I have added numerous options):
    {
      "Job": {
        "Name": "Job1",
        "DestinationFolder": "D:/temp/azurebackup",
        "ResumeOnRestartedJob": true,
        "NumberOfRetries": 0, 
        "NumberCopyThreads": 1,
        "KeepNumberVersions": 5,
        "DaysToKeepVersion": 0, 
        "FilenameContains": "", 
        "FilenameWithout": "", 
        "ReplaceInvalidTargetFilenameChars": false,
        "TotalDownloadSpeedMbPerSecond": 0.5,

        "StorageAccounts": [
          {

            "Name": "abc",
            "SasConnectionString": "BlobEndpoint=https://abc.blob.core.windows.net/;QueueEndpoint=https://abc.queue.core.windows.net/;FileEndpoint=https://abc.file.core.windows.net/;TableEndpoint=https://abc.table.core.windows.net/;SharedAccessSignature=sv=2019-12-12&ss=bfqt&srt=sco&sp=rl&se=2020-12-20T04:37:08Z&st=2020-12-19T20:37:08Z&spr=https&sig=abce3e399jdkjs30fjsdlkD",
            "FilenameContains": "",
            "FilenameWithout": "",
            "Containers": [
              {
                "Name": "test",
                "FilenameContains": "",
                "FilenameWithout": "",
                "Blobs": [
                  {
                    "Filename": "2007 EasyRadiology.pdf",
                    "TargetFilename": "projects/radiology/Brochure3.pdf"
                  }
                ]
              },
              {
                "Name": "test2"
              }
            ]

          },
          {
            "Name": "martintest3",
            "SasConnectionString": "",
            "Containers": [] 
          }
        ]
      }
      
    }
  3. Run the application with your job file:
    blobtolocal job1.json
Crossman answered 19/12, 2020 at 23:33 Comment(0)

Without resorting to third-party solutions, you can achieve this using the built-in features Azure now offers. The steps below might help secure your blobs:

  1. Soft delete for Azure Storage Blobs - The best first step is to enable soft delete, which is now in GA: https://azure.microsoft.com/en-us/blog/soft-delete-for-azure-storage-blobs-ga (a minimal sketch of enabling it through the SDK follows after this list).

  2. Read-access geo-redundant storage - The second approach is to enable geo-replication with RA-GRS, so if the primary data center goes down you can still read from a secondary replica in another region. You can find more information here: https://learn.microsoft.com/en-us/azure/storage/common/storage-redundancy-grs
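
As a sketch of step 1, soft delete can also be enabled programmatically; a minimal example with the Azure.Storage.Blobs SDK (the connection string and the 14-day retention window are placeholders):

using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

var service = new BlobServiceClient("<connection-string>");

// Read the current service properties, switch on soft delete, write them back.
BlobServiceProperties properties = await service.GetPropertiesAsync();
properties.DeleteRetentionPolicy = new BlobRetentionPolicy
{
    Enabled = true,
    Days = 14 // deleted blobs remain recoverable for this many days
};
await service.SetPropertiesAsync(properties);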

Evensong answered 12/11, 2018 at 10:26 Comment(0)

You can make a snapshot of a blob container and then download the snapshot for a point-in-time backup.

https://learn.microsoft.com/en-us/azure/storage/storage-blob-snapshots

A snapshot is a read-only version of a blob that's taken at a point in time. Snapshots are useful for backing up blobs. After you create a snapshot, you can read, copy, or delete it, but you cannot modify it. A snapshot of a blob is identical to its base blob, except that the blob URI has a DateTime value appended to the blob URI to indicate the time at which the snapshot was taken. For example, if a page blob URI is http://storagesample.core.blob.windows.net/mydrives/myvhd, the snapshot URI is similar to http://storagesample.core.blob.windows.net/mydrives/myvhd?snapshot=2011-03-09T01:42:34.9360000Z.
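
For illustration, a minimal sketch of snapshotting a single blob and downloading that point-in-time version with the Azure.Storage.Blobs SDK (the connection string and names are placeholders):

using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

var blob = new BlobClient("<connection-string>", "mydrives", "myvhd");

// Create the read-only, point-in-time snapshot.
BlobSnapshotInfo snapshot = await blob.CreateSnapshotAsync();

// Address the snapshot through its DateTime qualifier and download it locally.
BlobClient snapshotBlob = blob.WithSnapshot(snapshot.Snapshot);
await snapshotBlob.DownloadToAsync("myvhd.backup");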

Blub answered 15/8, 2017 at 13:41 Comment(2)
To be clear, this is a snapshot of a blob and not the entire blob container. So you can't exactly "make a snapshot of a blob container and then download the snapshot" - if I am wrong, please correct me.Rashida
You are right, this solution is not very feasible for backing up storage containers.Sanguinary
