Which GPU should I use on Google Cloud Platform (GCP)
Right now, I'm working on my master's thesis, and I need to train a huge Transformer model on GCP. The fastest way to train deep learning models is to use a GPU, so I was wondering which GPU I should use among the ones provided by GCP. The ones available at the moment are:

  • NVIDIA® A100
  • NVIDIA® T4
  • NVIDIA® V100
  • NVIDIA® P100
  • NVIDIA® P4
  • NVIDIA® K80
Kaminsky answered 22/10, 2021 at 9:35 Comment(0)

It all depends on which characteristics you're looking for.

First, let's collect some information about these GPU models and see which one suits you best. You can use this link to track GPU performance, this link to check the pricing of the older GPU cores, and this link for the accelerator-optimized ones.

I did that and created the following table.

Updated (June 2024)

Model               FP32 (TFLOPS)   Price/hour ($)   TFLOPS/dollar
Nvidia H100 †       67              11.06125         6.0571816
Nvidia L4 ‡         30.3            1.000416         30.28740044
Nvidia A100 ‡       19.5            3.673477         5.308322333
Nvidia Tesla T4     8.1             0.35             23.14285714
Nvidia Tesla P4     5.5             0.6              9.166666667
Nvidia Tesla V100   14              2.48             5.64516129
Nvidia Tesla P100   9.3             1.46             6.369863014
Nvidia K80          8.73            0.45             19.4

† The minimum number of GPUs that can be used is 8.

‡ Price includes 1 GPU + 12 vCPUs + default memory.

In the table above, you can see the:

  • FP32: 32-bit (single-precision) floating-point performance, i.e. how fast the GPU card executes single-precision floating-point operations. It's measured in TFLOPS (tera floating-point operations per second). The higher, the better.
  • Price: the hourly price on GCP.
  • TFLOPS/dollar: simply how many operations you get for one dollar.
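The TFLOPS/dollar column can be recomputed directly from the first two columns. A quick sketch in Python, using the June 2024 figures from the table above:

```python
# Peak FP32 TFLOPS and GCP hourly price ($) per GPU model,
# taken from the table above (June 2024).
gpus = {
    "H100": (67.0, 11.06125),
    "L4":   (30.3, 1.000416),
    "A100": (19.5, 3.673477),
    "T4":   (8.1,  0.35),
    "P4":   (5.5,  0.6),
    "V100": (14.0, 2.48),
    "P100": (9.3,  1.46),
    "K80":  (8.73, 0.45),
}

# Rank the cards by how many TFLOPS one dollar buys per hour.
ranked = sorted(
    ((name, tflops / price) for name, (tflops, price) in gpus.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, value in ranked:
    print(f"{name:5s} {value:6.2f} TFLOPS/$")
```

Running this reproduces the ordering discussed below: the L4 comes out on top, the T4 second, and the A100 last.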

From this table, you can see:

  • Nvidia H100 is the fastest.
  • Nvidia Tesla P4 is the slowest.
  • Nvidia H100 is the most expensive.
  • Nvidia Tesla T4 is the cheapest.
  • Nvidia L4 has the highest operations per dollar.
  • Nvidia A100 has the lowest operations per dollar.
  • Nvidia K80 went out of support on May 1, 2024.

And you can observe that clearly in the following figure:

[Figure: GPUs comparison]

I hope that was helpful!

Kaminsky answered 22/10, 2021 at 9:35 Comment(4)
TL;DR: A100 GPUs are great if you can afford them. Otherwise, T4 GPUs offer the best bang for the buck. – Videogenic
Where are A100 available? I looked at the US data centers at cloud.google.com/compute/gpus-pricing and I don't see it listed. But I know they exist. – Mowbray
@CsabaToth, I know I'm 4 months late, but here are the available US zones for A100 from this link: us-central1-a, us-central1-b, us-central1-c, us-central1-f, us-east1-b, us-east4-c, us-east5-b, us-west1-b, us-west3-b, and us-west4-b. – Kaminsky
Here is a source for the GPU performance numbers: cloud.google.com/compute/docs/gpus#performance_comparison_chart – Waxy

Nvidia says that using the most modern, powerful GPUs is not only faster, it also ends up being cheaper: https://developer.nvidia.com/blog/saving-time-and-money-in-the-cloud-with-the-latest-nvidia-powered-instances/

Google came to a similar conclusion (this was a couple of years ago before the A100 was available): https://cloud.google.com/blog/products/ai-machine-learning/your-ml-workloads-cheaper-and-faster-with-the-latest-gpus

You could argue that both Nvidia and Google are a little biased in making that judgement, but they are also well placed to answer the question, and I see no reason not to trust them.

Vaughan answered 1/7, 2022 at 2:36 Comment(1)
The first article only compares the A100 to the V100. The second article does show how inefficient the K80 is, but also mentions that the T4 is ideal for many use cases. From reading those articles, I couldn't conclude that the most modern and powerful GPUs are necessarily the cheapest. If you're willing to wait, the T4 seems to be the most cost-efficient option by a large margin. – Guerra

As of May 26, 2024, the Nvidia L4 has a better price/performance ratio than the T4.

Model   FP32 (TFLOPS)   Price (zone: us-central1)   TFLOPS/dollar
L4      30.3            $0.644046                   47.046328989
T4      8.1             $0.4025                     20.124223602

The L4 also has more RAM (24GB) vs 16GB on the T4. That allows for bigger batch sizes and faster training.
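One way to see what the price/performance gap means in practice is to estimate what a fixed amount of FP32 compute would cost on each card. This is only a sketch: it assumes training time scales inversely with peak FP32 throughput, which ignores memory bandwidth, batch size, and utilization effects, and the workload size below is an arbitrary illustration.

```python
# Estimated dollar cost to deliver a fixed FP32 workload, assuming
# runtime scales inversely with peak throughput (a strong simplification).
def cost_for_workload(tflops, price_per_hour, workload_tflop_hours):
    hours = workload_tflop_hours / tflops   # wall-clock hours needed
    return hours * price_per_hour           # total dollars spent

# Prices from the table above (us-central1, May 2024).
l4_cost = cost_for_workload(30.3, 0.644046, workload_tflop_hours=1000)
t4_cost = cost_for_workload(8.1, 0.4025, workload_tflop_hours=1000)
print(f"L4: ${l4_cost:.2f}, T4: ${t4_cost:.2f}")
```

Under these assumptions, the same workload costs less than half as much on the L4, despite its higher hourly rate, because it finishes so much sooner.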

Cullen answered 26/5, 2024 at 7:42 Comment(0)
