PostgreSQL performance on EC2/EBS

Some related information is linked here. The main takeaway is this post from Bryan Murphy:

Been running a very busy 170+ GB OLTP Postgres database on Amazon for 1.5 years now. I can't say I'm "happy," but I've made it work, and I still prefer it to running downtown to a colo at 3am when something goes wrong.

There are two main things to be wary of:

1) Physical I/O is not very good, which is why that first system used RAID0.

Let's be clear here, physical I/O is at times terrible. :)

If you have a larger database, the EBS volumes are going to become a real bottleneck. Our primary database needs 8 EBS volumes in a RAID array, and even with Slony offloading requests to two slave machines, it still can't really keep up.

There's no way we could run this database on a single EBS volume.

I also recommend you use RAID10, not RAID0. EBS volumes fail. More frequently, single volumes will experience very long periods of poor performance. The more drives you have in your RAID, the more you'll smooth things out. However, there have been occasions where we've had to swap out a poorly performing volume for a new one and rebuild the RAID to get things back up to speed. You can't do that with a RAID0 array.
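A back-of-the-envelope model shows why a striped array is fragile. The sketch below assumes, purely for illustration, that each EBS volume fails independently with the same probability `P_FAIL` over some fixed period. Real EBS failures need not be independent, and the value 0.01 is hypothetical. A RAID0 stripe is lost if any single volume fails. RAID10, modeled here as mirrored pairs, is lost only if both members of the same pair fail.

```python
def raid0_survival(n_volumes: int, p_fail: float) -> float:
    """Probability a RAID0 stripe survives: every volume must survive."""
    return (1.0 - p_fail) ** n_volumes


def raid10_survival(n_volumes: int, p_fail: float) -> float:
    """Probability a RAID10 array of mirrored pairs survives.

    The array is lost only if both volumes in some pair fail.
    Assumes independent failures with identical probability p_fail.
    """
    if n_volumes % 2 != 0:
        raise ValueError("RAID10 needs an even number of volumes")
    pair_lost = p_fail ** 2
    return (1.0 - pair_lost) ** (n_volumes // 2)


# Hypothetical per-volume failure probability, chosen only for illustration.
P_FAIL = 0.01
for n in (2, 4, 8):
    print(f"{n} volumes: RAID0 {raid0_survival(n, P_FAIL):.4f}, "
          f"RAID10 {raid10_survival(n, P_FAIL):.4f}")
```

With 8 volumes, the RAID0 stripe survives about 92% of the time under these assumptions, and the RAID10 array survives more than 99.9% of the time. Adding more volumes to a RAID0 stripe makes this number worse. The model does not capture slow volumes, but the same redundancy is what lets you replace a degraded volume and rebuild without losing the array.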

2) Reliability of EBS is terrible by database standards; I commented on this a bit already at http://archives.postgresql.org/pgsql-general/2009-06/msg00762.php. The end result is that you must be careful about how you back up your data, and continuous streaming backup via WAL shipping is the recommended approach. I wouldn't deploy into this environment if losing a minute or two of transactions after an EC2/EBS failure would be unacceptable, because that's a bit more likely to happen here than on most database hardware.

Agreed. We have three WAL-shipped spares. One streams our WAL files to a single EBS volume which we use for worst case scenario snapshot backups. The other two are exact replicas of our primary database (one in the west coast data center, and the other in an east coast data center) which we have for failover.
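Continuous WAL archiving like this is driven by PostgreSQL's `archive_command` setting, used together with `archive_mode = on`. PostgreSQL runs the command once for each completed WAL segment. It replaces `%p` with the segment's path and `%f` with its file name, and it treats only a zero exit status as success. The script below is a minimal, hypothetical sketch of such a command. It copies segments into a local archive directory, for example a mount of a dedicated EBS volume like the one described above. The script name `archive_wal.py`, the function `archive_wal`, and the path `/mnt/wal_archive` are illustrative assumptions, not part of the poster's setup. Shipping segments on to remote standbys would need more than this.

```python
import os
import shutil
import sys


def archive_wal(src_path: str, file_name: str, archive_dir: str) -> int:
    """Copy one WAL segment into archive_dir; return 0 only on success."""
    dest = os.path.join(archive_dir, file_name)
    if os.path.exists(dest):
        # Refuse to overwrite an already-archived segment.
        return 1
    tmp = dest + ".tmp"
    try:
        shutil.copyfile(src_path, tmp)
        # Flush the copied data to disk before renaming it into place.
        with open(tmp, "r+b") as f:
            os.fsync(f.fileno())
        os.rename(tmp, dest)
    except OSError:
        # Clean up any partial copy; a nonzero exit makes PostgreSQL retry later.
        if os.path.exists(tmp):
            os.remove(tmp)
        return 1
    return 0


# Hypothetical setting: archive_command = 'python3 archive_wal.py %p %f /mnt/wal_archive'
# Only act when invoked with arguments, as archive_command would do.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(archive_wal(sys.argv[1], sys.argv[2], sys.argv[3]))
```

Refusing to overwrite an existing file follows the PostgreSQL documentation's advice for archive commands. Copying to a temporary name and then renaming it keeps a half-written file from appearing under the final segment name.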

If we ever have to do a worst-case-scenario restore from one of our EBS snapshots, we're down for six hours, because we'll have to stream the data from our EBS snapshot back over to an EBS RAID array. 170 GB at 20 MB/sec (if you're lucky) takes a LONG time. It takes 30 to 60 minutes for one of those snapshots to become "usable" once we create a drive from it, and then we still have to bring up the database and wait an agonizingly long time for hot data to stream back into memory.
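The six-hour figure is easy to sanity-check. The snippet below works through only the bulk-copy and snapshot steps. It assumes the quoted 20 MB/sec is sustained for the whole transfer and uses decimal units (1 GB = 1000 MB). The helper `transfer_hours` is introduced just for this illustration.

```python
def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_gb at a sustained rate_mb_per_s (1 GB = 1000 MB)."""
    seconds = size_gb * 1000 / rate_mb_per_s
    return seconds / 3600


copy_h = transfer_hours(170, 20)   # 170 GB at 20 MB/sec
usable_h = (0.5, 1.0)              # 30 to 60 minutes before a snapshot volume is usable
print(f"bulk copy: {copy_h:.1f} h")
print(f"copy + snapshot wait: {copy_h + usable_h[0]:.1f} to {copy_h + usable_h[1]:.1f} h")
```

The copy takes roughly 2.4 hours, and the snapshot wait brings the total to about 2.9 to 3.4 hours. That leaves the rest of the six hours for starting PostgreSQL and waiting for hot data to return to memory, which fits the poster's estimate. If the transfer only manages 10 MB/sec, the copy alone grows to nearly five hours.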

We had to fail over to one of our spares twice in the last 1.5 years. Not fun. Both times were due to instance failure.

It's possible to run a larger database on EC2, but it takes a lot of work, careful planning and a thick skin.

Bryan
