AWS EBS block size
Can you point me to some resources on how EBS works behind the scenes for gp2 volumes? The way I understand it, it is presented as a service, but really it is some form of connecting arrays of SSD drives to the instance in a redundant way. What is the actual, physical method of connecting? The documentation refers to the fact that data is transferred in 16KB or 256KB blocks, but I can't find anything more about that. If, for example, my Linux partition is formatted with 4KB blocks, does this mean that EBS will transfer data to and from the disk in 16KB blocks? If so, wouldn't it make sense to format the partition with 16KB blocks as well, and optimise it upstream too? If I have a set of very random 4K operations, will this trigger the same number of 16KB block requests? If anyone's done such testing already, I'd really like to hear about it...

Castera answered 14/4, 2017 at 15:13 Comment(0)

The actual, physical means of connection is the AWS software-defined Ethernet LAN. EBS is essentially a SAN: the volumes are not physically attached to the instance, but they are physically within the same availability zone, and access is over the network.

If the instance is "EBS Optimized," there's a separate allocation of Ethernet bandwidth for communication between the instance and EBS. Otherwise, the same Ethernet connection that handles all of the IP traffic for the instance is also used by EBS.

The SSDs behind EBS gp2 volumes are 4KiB page-aligned.
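Page alignment matters because a misaligned partition makes every filesystem block straddle two device pages, doubling device-level I/O. A minimal sketch of the alignment check (the function name is mine, chosen for illustration):

```python
# Sketch: check whether a byte offset (e.g. a partition's start) is aligned
# to the SSD's 4 KiB page size noted above. Misaligned partitions cause each
# filesystem block to straddle two pages, doubling device-level I/O.

PAGE_SIZE = 4096  # 4 KiB

def is_page_aligned(offset_bytes: int) -> bool:
    return offset_bytes % PAGE_SIZE == 0

# A modern partition starting at sector 2048 (512-byte sectors) is aligned:
print(is_page_aligned(2048 * 512))   # True
# A legacy partition starting at sector 63 is not:
print(is_page_aligned(63 * 512))     # False
```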

See AWS re:Invent 2015 | (STG403) Amazon EBS: Designing for Performance beginning around 24:15 for this.

As explained in AWS re:Invent 2016: Deep Dive on Amazon Elastic Block Store (STG301), an EBS volume is not a physical volume. They're not handing you an SSD drive. An EBS volume is a logical volume that spans numerous distributed devices throughout the availability zone. (The blocks on the devices are also replicated within EBS within the availability zone to a second device.)

These factors should make it apparent that the performance of the actual SSDs is not an especially significant factor in the performance of EBS. EBS, by all appearances, allocates resources in proportion to what you're paying for the volume... which is of course directly proportional to the size of the volume as well as which feature set (volume type) you've selected.

16KiB is the nominal size of an I/O that EBS uses for establishing performance benchmarks for gp2. It probably has no other special significance, as it appears to be related as much or more to the processing resources that EBS allocates to your volume as to the media devices themselves -- EBS volumes live in storage clusters that have "resources" of their own (CPU, memory, network bandwidth, etc.) and 16KiB seems to be a nominal value related to some kind of resource allocation in the EBS infrastructure.

Note that the sc1 and st1 volumes use a very different nominal I/O size: 1 MiB. Obviously, that can't be related to anything about the physical storage device, which lends credence to the conclusion that the 16KiB number for gp2 (and io1) is likewise a nominal accounting unit rather than a physical block size.

A gp2 volume can perform up to the lowest of several limits:

  • 160 MiB/second, depending on the connected instance type‡
  • The current number of instantaneous IOPS available to the volume, which is the highest of
    • 100 IOPS regardless of volume size
    • 3 IOPS per provisioned GiB of volume size
    • The IOPS credits available in your token bucket, capped at 3,000 IOPS
  • 10,000 IOPS per volume regardless of how large the volume is

‡Smaller instance types can't provide 160MiB/second of network bandwidth, anyway. For example, the r3.xlarge has only half a gigabit (500 Mbps) of network bandwidth, limiting your total traffic to EBS to approximately 62.5 MiB/sec, so you won't be able to push any more throughput to an EBS volume than this from an instance of that type. Unless you are using very large instances or very small volumes, the most likely constraint on your EBS performance is going to be the limits of the instance, not the limits of EBS.
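The limits above can be sketched as a small calculation. This is only a model: the function names are mine, and the token bucket is simplified to "credits available or not," whereas real EBS tracks an accumulating credit balance.

```python
# Sketch of the gp2 limits listed above. Volume sizes in GiB, throughput in
# MiB/s. The credit model is deliberately simplified for illustration.

def gp2_max_iops(volume_gib: int, burst_credits_available: bool = True) -> int:
    baseline = max(100, 3 * volume_gib)      # floor of 100 IOPS, else 3 IOPS/GiB
    burst = 3000 if burst_credits_available else 0
    return min(max(baseline, burst), 10000)  # 10,000 IOPS per-volume hard cap

def effective_throughput_mib_s(instance_mbps: float) -> float:
    # The instance's network bandwidth also caps EBS traffic.
    return min(160.0, instance_mbps / 8)     # 160 MiB/s volume cap

# A 100 GiB volume can burst to 3,000 IOPS (baseline alone would be 300):
print(gp2_max_iops(100))                     # 3000
# A 5,000 GiB volume hits the 10,000 IOPS cap (3 * 5000 = 15,000):
print(gp2_max_iops(5000))                    # 10000
# An r3.xlarge with ~500 Mbps of network: ~62.5 MiB/s, well under 160:
print(effective_throughput_mib_s(500))       # 62.5
```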

Since you are capped at the first (lowest) threshold in the list above, the impact of the nominal 16 KiB I/O size is this: if your I/Os are smaller than 16KiB, your maximum possible IOPS does not increase, and if they are larger, your maximum possible IOPS may decrease:

  • an I/O size of 4KiB will not improve performance, since the nominal size of an I/O for rate-limiting purposes is established at 16KiB, but
  • an I/O size of 4KiB is unlikely to meaningfully decrease performance with sequential I/Os, since sequential I/Os are internally combined for EBS's accounting purposes. So, if your instance were to make 4 × 4 KiB sequential I/O requests, EBS is likely to count that as 1 I/O anyway
  • an I/O size of 4KiB with extremely random I/Os would indeed not be combined, so would theoretically perform poorly relative to the same number of 16KiB extremely random I/Os; but instinct and experience tell me this borders on academic and theoretical territory except perhaps in extremely rare cases. Formatting with 16KiB blocks could just as likely hurt as help, since small random writes would then use the same number of IOPS but transfer more unnecessary data across the wire.
  • if your I/Os are larger than 16KiB, your maximum IOPS will decrease if your disk bandwidth reaches the 160MiB/s threshold before reaching the IOPS threshold.
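The merging behavior described above can be illustrated with a toy model. The coalescing rule here is my assumption for illustration (AWS does not publish the exact algorithm): consecutive sequential requests merge up to the 16 KiB nominal I/O size, and random requests do not.

```python
# Toy model of EBS I/O accounting: consecutive sequential requests coalesce
# up to the 16 KiB nominal I/O size; non-sequential requests each count as
# a separate I/O. This rule is an assumption, not AWS's published algorithm.

NOMINAL = 16 * 1024

def counted_ios(requests):
    """requests: list of (offset, size) tuples, in submission order."""
    count = 0
    run_end = None   # end offset of the current coalesced run
    run_size = 0     # bytes accumulated in the current run
    for offset, size in requests:
        if run_end == offset and run_size + size <= NOMINAL:
            run_end += size          # sequential and still fits: merge
            run_size += size
        else:
            count += 1               # start a new counted I/O
            run_end = offset + size
            run_size = size
    return count

FOUR_K = 4 * 1024
# 4 sequential 4 KiB requests coalesce into a single 16 KiB I/O:
seq = [(i * FOUR_K, FOUR_K) for i in range(4)]
print(counted_ios(seq))          # 1
# 4 scattered 4 KiB requests each count separately:
rand = [(0, FOUR_K), (10 * FOUR_K, FOUR_K), (3 * FOUR_K, FOUR_K), (99 * FOUR_K, FOUR_K)]
print(counted_ios(rand))         # 4
```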

A final thought: EBS performs best under load. That is to say, a single thread making a series of random I/Os will not keep the EBS volume's queue filled with requests, and when the queue is not kept full, you will not see the maximum possible performance.
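The effect of queue depth can be approximated with Little's law: achievable IOPS ≈ queue depth ÷ per-I/O latency. This is back-of-envelope only; the 1 ms latency figure below is an illustrative assumption, not a measured EBS number.

```python
# Back-of-envelope via Little's law: IOPS ≈ queue_depth / latency, capped by
# the volume's limit. The 1 ms per-I/O latency is an assumption for
# illustration, not a measured EBS figure.

def achievable_iops(queue_depth: int, latency_s: float, volume_cap: int = 3000) -> float:
    return min(queue_depth / latency_s, volume_cap)

# A single-threaded workload (queue depth 1) at 1 ms latency:
print(achievable_iops(1, 0.001))    # 1000.0 -- far below the volume's cap
# Queue depth 32 is more than enough to saturate the same volume:
print(achievable_iops(32, 0.001))   # 3000 (capped)
```

This is why benchmarking with a single synchronous thread understates what an EBS volume can do.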

See also Amazon EBS Volume Performance on Linux Instances for more discussion of EBS performance.

Mosaic answered 15/4, 2017 at 1:23 Comment(3)
Thanks. Compared to my local NVMe SSD, gp2 is not too impressive. My SSD can do 90,000 IOPS and 2GB/s sequential, compared to 10,000 IOPS and 160 MiB/second on EC2. Also, the block size makes it less efficient for small random reads...Castera
So which is the best block size then, in plain English? E.g. if I'm using ddUnnecessary
@Unnecessary In plain English: no smaller than 16K for SSD and 1M for HDD. As noted above, you are likely to find that the volume can dish out the data faster than the instance can take it, unless the volume is small and/or the instance is large. Experiment and let us know if you have results that contradict this. If you are reading from an EBS volume recently created from a snapshot, you will also encounter a significant first-read/warmup slowdown, as EBS's internal workers go to S3 to fetch blocks from the snapshot that have not yet actually been copied to the volume.Mosaic
