How to get more disk performance in Google Cloud?

One of the volumes for one of our (Ubuntu 16.04) Google Cloud VMs is at 100% disk utilization pretty much all the time - here is a 10-second sample plucked at random from the system:

iostat -x 10

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdd               0.60    17.20 5450.50 2468.00 148923.60 25490.00    44.05    11.81    1.49    1.13    2.29   0.13  99.60
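
For context, summing the sample's reads and writes gives roughly:

r/s + w/s     = 5,450.5 + 2,468.0           ≈ 7,900 IOPS
rkB/s + wkB/s = 148,923.6 + 25,490.0 kB/s   ≈ 170 MiB/s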

This is currently a 2.5 TB persistent SSD.

My understanding is that I can't get better performance by adding virtual "spindles" and then distributing the workload across them.

This is a database volume, so I can't really use local (volatile) SSD either.

I currently have XFS on it with these mount options:

type xfs (rw,noatime,nodiratime,attr2,nobarrier,inode64,noquota)
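
For reference, the matching /etc/fstab entry would look something like this (the UUID and mount point are placeholders, not values from this system):

UUID=xxxx-xxxx  /var/lib/db  xfs  rw,noatime,nodiratime,attr2,nobarrier,inode64,noquota  0 0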

Any suggestions?

Krystynakshatriya answered 27/10, 2017 at 13:37 Comment(7)
Will making the disk bigger (even if the extra space is not used) also make it faster? - Krystynakshatriya
If what the Google Cloud Console states during disk creation is reliable, IOPS do not increase with disk size once you exceed 500 GB. How many vCPUs does your instance have? - Frumenty
The instance currently has 32 vCPUs. - Krystynakshatriya
That matters because the random IOPS supported per instance also depends on the number of vCPUs. The documentation states the read limits are: 16 to 31 vCPUs = 25,000; 32+ vCPUs = 40,000. I don't know whether increasing the count would be worth a try. I'm linking the following document in case you haven't seen it before: cloud.google.com/compute/docs/disks/… - Frumenty
Thanks for that link. I tried switching the I/O scheduler to "noop" (I'd missed that bullet point before); we'll see if it helps. [I'd already tuned the read-ahead on the disks from the default of 256 to 4096 some time back and forgotten I'd done that.] - Krystynakshatriya
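
A minimal sketch of both changes mentioned in the comment above (the device name sdd is taken from the iostat sample; verify which device backs your volume with lsblk):

# Switch the I/O scheduler to "noop" (on multi-queue kernels the
# equivalent setting is "none").
echo noop | sudo tee /sys/block/sdd/queue/scheduler

# Raise the read-ahead from the default of 256 sectors (128 KiB)
# to 4096 sectors (2 MiB).
sudo blockdev --setra 4096 /dev/sdd
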
Using RAID 10 with multiple disks (500 GB SSD each, minimum) may speed up your storage, for example. Your instance should have at least 8 vCPUs for maximum network performance. Or maybe you can look at a higher level: you can add database replicas and read from them. Partitioning is also a great way to distribute writes. - Confectionary
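
A minimal sketch of the RAID 10 suggestion (device names, array name, and mount point are hypothetical; note that the next comment and the answer below dispute whether this helps on GCE):

# Assumes four attached PD-SSD volumes of at least 500 GB each.
sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde
sudo mkfs.xfs /dev/md0
sudo mount -o noatime,nodiratime,nobarrier,inode64 /dev/md0 /var/lib/db
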
AFAIK, in Google Cloud on a Linux platform, RAID does not help. There are no "spindles" to divide the disk workload over. If anything, the additional CPU overhead may actually make things run slower. If we had separate fibre channels to the SAN and could map volumes to specific channels, it might help with channel congestion; however, Compute Engine VMs don't support that at this time. - Krystynakshatriya

All persistent disk types (both HDD and SSD) on GCE are network-attached: data is replicated to remote storage for higher availability. This is also the reason behind the performance considerations, as the available network bandwidth has to be shared fairly among multiple tenants on the same physical machine.

GCE limits disk performance on both IOPS and bandwidth - you will be constrained by whichever limit you hit first. The reason for having both is that lots of small operations are more costly than a few large ones.

Both the IOPS and the bandwidth limits depend on three factors:

  • Type (HDD vs. SSD)
  • Size of the disk (larger disks enjoy higher limits)
  • Core count (larger instances enjoy higher limits as they occupy a larger fraction of a machine)

Additionally, PD traffic is factored into the per-core network egress cap.

The documentation has an in-depth article covering all of these aspects. In summary, once you have maxed out disk size, disk type, and core count, there is no way to increase performance further.
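
A sketch of the three levers just described, using gcloud (disk, instance, and zone names are placeholders; resizing is online, but changing the machine type requires stopping the instance first, and a disk's type cannot be changed in place):

# 1. Disk size - larger disks get higher IOPS/throughput caps.
gcloud compute disks resize db-disk --size=4TB --zone=us-central1-a

# 2. Disk type - pd-ssd has higher limits than pd-standard; the type is
#    fixed at creation, so a faster type means creating a new disk.
gcloud compute disks create db-disk-ssd --type=pd-ssd --size=2500GB --zone=us-central1-a

# 3. Core count - raise the machine type (instance must be stopped).
gcloud compute instances set-machine-type db-vm --machine-type=n1-standard-64 --zone=us-central1-a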

Creating RAID arrays out of multiple persistent disks will not increase performance either, as you will still hit the per-instance limits and the network egress cap.

Peder answered 7/7, 2019 at 21:9 Comment(0)
