Advice for data storage on Amazon EC2 especially for databases [closed]

I've been playing around with Amazon Web Services for over a year now, but I still don't quite understand how storage works. For example, when I select an AMI of my choice from the EC2 console and continue through the wizard, I reach the "Storage Configuration" tab, which has several options.

There is the root volume tab and then there is the EBS volume tab. How do these two differ? What is the maximum size I can allocate for each? How do I configure the EBS volumes to work with my instance? Say, for example, I create 8 EBS volumes, each with 25 GB of storage. For something like a PostgreSQL database, which naturally lives on the root device, how do I configure it so that the database is stored across all 8 EBS volumes? In other words, the 8 EBS volumes would become one 200 GB drive, with the PostgreSQL data stored across that whole drive.

Any form of clarification will be appreciated.

Rahm asked 16/4, 2013 at 17:3
Sorry this was closed, I would've expected it to have been migrated to ServerFault or dba.stackexchange.com instead. – Nigh

You should read about the benefits of EBS vs instance store. I also wrote a bit about the PostgreSQL angle of this on my work blog recently. See also "What root device to use for a new EC2 instance?" and the other questions listed in the Related sidebar.

Instance store will eventually EAT YOUR DATA unless you carefully set up replication and regular backups. If an instance fails or is terminated, you cannot get back any data that was on its instance store. You need good backups anyway; they're just more important with instance store, and you need to be more careful about setting up near-real-time replication.

On the other hand, EBS is more likely to be affected by outages and faults that render it unavailable for a time; your data may still exist, but if you can't get to it for a couple of hours you can't fail over until the fault is fixed. So you really need good backups and replication anyway.
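For EBS volumes, one common form of "good backups" is EBS snapshots. The following is a minimal sketch, assuming the `boto3` library; the volume ID and description are hypothetical, and the client is passed in so the function can be tried without AWS credentials. Snapshots started one volume at a time are not consistent with each other, so for a database striped across several volumes you would need to pause writes first (for example with `fsfreeze`) or rely on PostgreSQL's own backup tooling instead.

```python
def snapshot_volumes(ec2_client, volume_ids, description):
    """Start one EBS snapshot per volume and return the new snapshot IDs.

    ec2_client is expected to behave like boto3.client("ec2").
    Snapshots complete asynchronously; this function only starts them.
    """
    snapshot_ids = []
    for volume_id in volume_ids:
        response = ec2_client.create_snapshot(VolumeId=volume_id, Description=description)
        snapshot_ids.append(response["SnapshotId"])
    return snapshot_ids


if __name__ == "__main__":
    import boto3  # assumed installed and configured with credentials

    ec2 = boto3.client("ec2")
    # Hypothetical volume ID; replace with the IDs of your attached volumes.
    print(snapshot_volumes(ec2, ["vol-0123456789abcdef0"], "nightly postgres backup"))
```

Snapshots protect against losing a volume, not against an outage that makes EBS temporarily unreachable; that is what replication to another Availability Zone is for.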

Quick answer, I'll leave the detailed explanations to the post:

  • The root volume is either EBS or instance store, depending on AMI type.

  • In the volumes tab you can add additional volumes. You can choose whether these are EBS or instance store volumes at volume creation time, irrespective of the AMI type. Different instance types have different limits on the number and size of instance store volumes, but all have the same limits on EBS volumes.

  • The maximum size of an instance store volume is defined by the instance type. See the documentation for your instance. The maximum size of an EBS volume is in the first paragraph of the EBS documentation:

    Amazon EBS volumes are created in a particular Availability Zone and can be from 1 GB to 1 TB in size.

  • The PostgreSQL database doesn't really "naturally live on the root volume". It lives where you put it. If you're using a version installed by a package manager, it'll usually be put in /var/lib/pgsql or /var/lib/postgres, but you can change the startup script options to move it elsewhere, replace that directory with a symlink to the desired location, or mount a new volume at that point. There are ample discussions of how to move PostgreSQL on Stack Overflow, dba.stackexchange.com and ServerFault, so I won't repeat all that here.

  • To combine multiple EBS volumes use Linux's software RAID (md). EBS is just like any other disk as far as Linux is concerned, so see the usual documentation for setting up Linux software RAID.
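Applied to the question's 8 × 25 GB example, the md approach can be sketched in Python as below. It only builds the shell commands (it does not run them) and computes the usable capacity for RAID 0 versus RAID 10. The device names `/dev/xvdf` through `/dev/xvdm`, the array name `/dev/md0`, ext4 as the filesystem and `/var/lib/pgsql` as the mount point are all assumptions; check how your volumes actually appear (for example with `lsblk`) before using anything like this.

```python
import shlex


def usable_capacity_gb(n_volumes: int, size_gb: int, level: int) -> int:
    """Usable space of an md array built from n identical volumes."""
    if level == 0:
        # Striping: capacities add up, but losing any one volume loses the array.
        return n_volumes * size_gb
    if level == 10:
        # Striped mirrors: half the raw capacity. This sketch only handles an
        # even number of volumes.
        if n_volumes < 2 or n_volumes % 2:
            raise ValueError("this sketch needs an even number of volumes >= 2 for RAID 10")
        return n_volumes * size_gb // 2
    raise ValueError(f"unsupported RAID level: {level}")


def raid_commands(devices, level=0, md="/dev/md0", mount_point="/var/lib/pgsql"):
    """Return the commands to create, format and mount the array (not run here)."""
    dev_list = " ".join(shlex.quote(d) for d in devices)
    return [
        f"mdadm --create {md} --level={level} --raid-devices={len(devices)} {dev_list}",
        f"mkfs.ext4 {md}",
        f"mkdir -p {mount_point}",
        f"mount {md} {mount_point}",
    ]


if __name__ == "__main__":
    # Hypothetical device names for eight attached 25 GB EBS volumes.
    devices = [f"/dev/xvd{c}" for c in "fghijklm"]
    print(usable_capacity_gb(len(devices), 25, level=0))   # 200
    print(usable_capacity_gb(len(devices), 25, level=10))  # 100
    for cmd in raid_commands(devices):
        print(cmd)
```

Mounting over the data directory hides whatever was there, so you would mount the array somewhere temporary first, copy the existing data onto it with PostgreSQL stopped, then mount it at the final location owned by the postgres user. To survive reboots, the array (as reported by `mdadm --detail --scan`) also needs to go in mdadm's config file, whose location varies by distribution, and the mount in `/etc/fstab`.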

Personally, I've been quite unimpressed with the performance of EC2, at least with PostgreSQL. You can get a very fast database running, but only at a pretty crushing price. It's very convenient if you want to fire up some big databases for a short-term job, but it isn't economical as a long-lived hosting option; you're better off looking at VPS providers that offer better I/O performance. Search ServerFault, dba.stackexchange.com, etc.

Finally, a reminder: instance store on high I/O instances seems to be faster than the other options, but if you stop (shut down) or terminate your instance, or the underlying hardware fails, you will lose all data on your instance store volumes. So you must have good backups and real-time replication if you're going to use the instance store.

Nigh answered 16/4, 2013 at 23:32

The shorter answer is:

For quick and dirty, you can just use instance store on all your EC2 instances and do backups to S3. The advantage of EBS over instance store is that when you kill that server, the EBS volume stays around and can be reused; an instance store volume won't.

200 GB is a small amount of space; you can just use one storage device (instance store) for it, then back it up to S3 or replicate the whole 200 GB. Chances are you won't be using RAID or hard drive replication to improve your database's reliability/availability.
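The "backups to S3" step can be sketched as follows, assuming the standard `pg_dump` tool and the `boto3` library are available. The database name, bucket name, dump path and `backups/` key prefix are hypothetical, and the S3 client is passed in so the upload logic can be tried without AWS. A dump is a point-in-time copy, not replication: anything written after the last dump is gone if the instance store is lost.

```python
import datetime
import subprocess


def dump_command(dbname: str, out_path: str) -> list:
    # -Fc writes pg_dump's compressed "custom" format, restorable with pg_restore.
    return ["pg_dump", "-Fc", "-f", out_path, dbname]


def backup_key(dbname: str, when: datetime.datetime, prefix: str = "backups/") -> str:
    # Timestamped keys keep older dumps instead of overwriting them.
    return f"{prefix}{dbname}/{when:%Y%m%dT%H%M%SZ}.dump"


def backup_to_s3(s3_client, dbname: str, bucket: str, out_path: str) -> str:
    """Dump the database locally, upload the file to S3, return the S3 key."""
    subprocess.run(dump_command(dbname, out_path), check=True)
    key = backup_key(dbname, datetime.datetime.now(datetime.timezone.utc))
    # boto3's S3 client method: upload_file(Filename, Bucket, Key)
    s3_client.upload_file(out_path, bucket, key)
    return key


if __name__ == "__main__":
    import boto3  # assumed installed and configured with credentials

    print(backup_to_s3(boto3.client("s3"), "mydb", "my-backup-bucket", "/tmp/mydb.dump"))
```

Run from cron or a similar scheduler, the dump interval is the most data you can lose.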

tl;dr

Use instance store unless you need the volumes to be transferable between servers.

Dani answered 16/4, 2013 at 23:47
I think that's really dangerous advice, especially your tl;dr. "Use instance store for the PostgreSQL volume unless you need volumes to be transferrable between servers [or you care about your data and aren't confident you understand how to build a robust PostgreSQL replication setup across multiple AZs or regions]". You're dealing with a new user; you really need to be clear and explicit about the severe data loss risk involved in using instance store without properly understanding it and setting up proper redundancy. – Nigh
Yep Craig, valid point. I was thinking that a beginner is going to spend a lot of time building a fragile solution with EBS replication, which will break, take up a lot of maintenance time, and ultimately be replaced by other solutions. Instance stores are also cheaper, so a beginner on his own could save a buck by not using EBS. – Dani
