AWS - EBS Snapshots - Incremental Backup or Actually Full Backup

I know that in AWS, EBS "snapshots are incremental backups, which means that only the blocks on the device that have changed after your most recent snapshot are saved."

But, when an EBS snapshot is used to restore data, how is all data from that EBS snapshot restored as well as the data from the previous snapshots?

For example, say I have an empty volume. I add 10 GB of data to it and take a snapshot (Snapshot 1). Then I add another 5 GB of data and take a second snapshot (Snapshot 2).

If snapshots were purely incremental backups, then when I use Snapshot 2 to restore data, I should have only 5 GB of data. But when I test it, I get 15 GB of data.

I know incremental snapshots minimize the time required to create the snapshot and save on storage costs by not duplicating data, but how is it possible to restore the entire data set from incremental backups?

Versieversification answered 23/5, 2020 at 3:28 Comment(0)

It's complicated!

There are two elements of a snapshot:

  • The data (stored as blocks)
  • An index to the data

Let's say you have a totally empty Amazon EBS volume. It is smart enough to know that no blocks have been used.

Now, let's add your 10GB of data and then create a snapshot. This will cause that 10GB of data to be copied to Amazon S3. You can't see it in S3, but Amazon EBS uses S3 for snapshots "behind the scenes". Each block that has been modified will be copied to S3 as a separate object. In addition, an 'index' will be stored that says "Snapshot #1 contains the following blocks". Therefore, the snapshot is a combination of the index and the data that is stored.

Next, let's delete some files, modify some files and add another 5GB of files. Taking another snapshot (#2) will now copy to S3 any blocks that are different to Snapshot #1, which means any blocks that have been modified or added. An index will be created that points to these new blocks, but also points to some of the blocks created in Snapshot #1 if those blocks were still present on the disk when Snapshot #2 was created. This highlights the "incremental" nature of a snapshot -- blocks that have not changed will not be copied again.

As for the blocks that were deleted, those blocks are kept in S3 because they are part of Snapshot #1, even though they are not present in Snapshot #2. This means that a new volume can be created from either Snapshot #1 or Snapshot #2.
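The index-plus-blocks idea described so far can be sketched as a toy model. This is only an illustration under stated assumptions, not how EBS works internally: `block_store`, `take_snapshot`, and `restore` are hypothetical names, a plain dict stands in for S3, and each "block" is a short byte string.

```python
import itertools

# Toy model (assumption for illustration, not real EBS internals).
block_store = {}               # block id -> data; stands in for S3
_next_id = itertools.count(1)  # hypothetical block id generator


def take_snapshot(volume, previous=None):
    """Return (index, copied): index maps block number -> block id."""
    previous = previous or {}
    index, copied = {}, 0
    for number, data in volume.items():
        old_id = previous.get(number)
        if old_id is not None and block_store[old_id] == data:
            index[number] = old_id       # unchanged: reference the existing block
        else:
            new_id = next(_next_id)
            block_store[new_id] = data   # new or modified: copy to the store
            index[number] = new_id
            copied += 1
    return index, copied


def restore(index):
    """Rebuild a full volume by following every pointer in the index."""
    return {number: block_store[block_id] for number, block_id in index.items()}


# Snapshot #1: ten blocks on a previously empty volume.
volume = {n: b"data-%d" % n for n in range(10)}
snap1, copied1 = take_snapshot(volume)

# Modify one block, delete one, add five, then take Snapshot #2.
volume[0] = b"modified"
del volume[9]
for n in range(10, 15):
    volume[n] = b"data-%d" % n
snap2, copied2 = take_snapshot(volume, previous=snap1)

print(copied1, copied2)                          # 10 6
print(len(restore(snap1)), len(restore(snap2)))  # 10 14
```

Snapshot #2 copies only 6 blocks, yet restoring it yields all 14 current blocks, because its index also points at the 8 unchanged blocks stored for Snapshot #1. The deleted block 9 is still restorable through Snapshot #1.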

If, however, Snapshot #1 is deleted, then any blocks present only in Snapshot #1 will also be deleted. Any blocks that were part of both snapshots will be retained, since they are needed to restore Snapshot #2.

The simple rule is: Any data blocks that are part of an existing snapshot will be retained, so that the snapshot can be restored.
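A minimal sketch of that retention rule, assuming each snapshot is just an index of block ids into a shared store. The names `delete_snapshot`, `snapshots`, and `block_store` are hypothetical, and real EBS bookkeeping is not public, so treat this purely as an illustration:

```python
def delete_snapshot(name, snapshots, block_store):
    """Drop one snapshot's index, then remove blocks nothing else references."""
    removed_index = snapshots.pop(name)
    still_referenced = set()
    for index in snapshots.values():
        still_referenced.update(index.values())
    for block_id in set(removed_index.values()) - still_referenced:
        del block_store[block_id]


# Hypothetical data: block "a" is shared, "b" is only in snap1, "c" only in snap2.
block_store = {"a": b"os", "b": b"old file", "c": b"new file"}
snapshots = {
    "snap1": {0: "a", 1: "b"},
    "snap2": {0: "a", 1: "c"},
}

delete_snapshot("snap1", snapshots, block_store)
print(sorted(block_store))  # ['a', 'c']

# snap2 is still fully restorable after snap1 is gone.
restored = {n: block_store[i] for n, i in snapshots["snap2"].items()}
print(restored)  # {0: b'os', 1: b'new file'}
```

Only block "b" disappears, because it was unique to the deleted snapshot. The shared block "a" survives because Snapshot #2 still needs it.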

To make your mind spin even further, please note that AMIs are snapshots with some additional metadata. So, if you launch an EC2 instance from an AMI, then the AMI is actually Snapshot #1. When you add or modify some data on that Amazon EBS volume and take a snapshot, it will make a copy of the blocks you changed, but the snapshot will point to the AMI snapshot for most of the disk content (e.g. the operating system).

Isoniazid answered 23/5, 2020 at 4:26 Comment(1)
Thanks! It couldn't be explained better. - Versieversification

When you restore 'Snapshot 2', the result is as if AWS had restored the first snapshot and then restored 'Snapshot 2' over the top of it. This means all the data is there without having to make a full backup every time. Each incremental snapshot only has to back up what has changed since the previous snapshot.

TLDR: If I had to guess, I'd say AWS probably uses some additional logic to skip over data that would be changed multiple times before the most recent snapshot is restored. Essentially, there is probably logic to prevent unnecessary writes when the same data is changed multiple times in intermediate snapshots before the final snapshot is applied.

Let me know if you have questions.

Perilous answered 23/5, 2020 at 4:8 Comment(0)

I found a well-explained answer in the AWS documentation; posting it here for everybody's benefit: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSSnapshots.html

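[Figure: three volume states (State 1, State 2, State 3) with their snapshots Snap A, Snap B, and Snap C; dashed arrows mark data that a snapshot references rather than copies.]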

In State 1, the volume has 10 GiB of data. Because Snap A is the first snapshot taken of the volume, the entire 10 GiB of data must be copied.

In State 2, the volume still contains 10 GiB of data, but 4 GiB have changed. Snap B needs to copy and store only the 4 GiB that changed after Snap A was taken. The other 6 GiB of unchanged data, which are already copied and stored in Snap A, are referenced by Snap B rather than (again) copied. This is indicated by the dashed arrow.

In State 3, 2 GiB of data have been added to the volume, for a total of 12 GiB. Snap C needs to copy the 2 GiB that were added after Snap B was taken. As shown by the dashed arrows, Snap C also references 4 GiB of data stored in Snap B, and 6 GiB of data stored in Snap A.

The total storage required for the three snapshots is 16 GiB.
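The arithmetic behind that 16 GiB figure can be checked with a few lines. The numbers come from the three states described above; the variable names are only illustrative, and the "full copies" total is a derived comparison rather than a figure from the AWS documentation.

```python
# GiB newly copied when each snapshot was taken.
copied = {"Snap A": 10, "Snap B": 4, "Snap C": 2}

# GiB each snapshot references, broken down by where the data is stored.
referenced = {
    "Snap A": {"Snap A": 10},
    "Snap B": {"Snap A": 6, "Snap B": 4},
    "Snap C": {"Snap A": 6, "Snap B": 4, "Snap C": 2},
}

total_stored = sum(copied.values())
restorable = {name: sum(refs.values()) for name, refs in referenced.items()}
full_copies = sum(restorable.values())

print(total_stored)  # 16
print(restorable)    # {'Snap A': 10, 'Snap B': 10, 'Snap C': 12}
print(full_copies)   # 32
```

Each snapshot restores the full volume as it was at that point (10, 10, and 12 GiB), yet only 16 GiB is stored, compared with the 32 GiB that three independent full copies would need.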

Langevin answered 14/3, 2021 at 4:46 Comment(0)
