What does P99 latency represent? I keep hearing about this in discussions about an application's performance but couldn't find a resource online that would talk about this.
It's the 99th percentile. It means that 99% of the requests should be faster than the given latency. In other words, only 1% of the requests are allowed to be slower.
Imagine that you are collecting performance data for your service and the table below shows the collected results (the latency values are fictional, to illustrate the idea).
Latency   Number of requests
1s        5
2s        5
3s        10
4s        40
5s        20
6s        15
7s        4
8s        1
The P99 latency of your service is 7s. Only 1% of the requests take longer than that. So, if you can decrease the P99 latency of your service, you increase its performance.
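To double-check the 7s figure, here is a minimal sketch that reads the percentile straight off the table. It assumes the "nearest-rank" definition of a percentile (the smallest latency that covers at least 99% of requests); the function name percentile_from_histogram is made up for this illustration, and other tools may interpolate and give slightly different values.

```python
# Latency in seconds -> number of requests, copied from the table above.
histogram = {1: 5, 2: 5, 3: 10, 4: 40, 5: 20, 6: 15, 7: 4, 8: 1}


def percentile_from_histogram(hist, pct):
    """Nearest-rank percentile: smallest latency covering pct% of requests.

    Assumes pct is an integer between 1 and 100.
    """
    total = sum(hist.values())
    if total == 0:
        raise ValueError("histogram is empty")
    # Integer ceiling of pct * total / 100, i.e. the 1-based rank we need.
    rank = (pct * total + 99) // 100
    seen = 0
    for latency in sorted(hist):
        seen += hist[latency]  # running count of requests at or below latency
        if seen >= rank:
            return latency
    return max(hist)  # only reached if pct > 100


print(percentile_from_histogram(histogram, 99))  # 7
```

With 100 requests the rank is 99, and the running total first reaches 99 at the 7s row, which is where the P99 of 7s comes from.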
... 7s), then we'd be left with 8s, but that is NOT the 99th percentile! Instead, I would explain it the other way around: sort the requests in ascending order and discard the top/largest 1%. The largest remaining value is the 99th percentile. Here, there are 100 requests, so the "top 1%" corresponds to the 1 largest request (the one that took 8s). When we get rid of that, the max remaining value is 7s, which is the correct 99th percentile. – Trample
We can explain it through an analogy: if 100 students are running a race, then 99 students should complete the race in "latency" time. "Should", not "will". – Medford
Let's take an example from here
Request latency:
min: 0.1
max: 7.2
median: 0.2
p95: 0.5
p99: 1.3
So we can say that 99 percent of web requests completed within 1.3ms (milliseconds or microseconds, depending on how your system's latency measurement is configured). Like @tranmq said, if we decrease the P99 latency of the service, we can increase its performance.
It is also worth noting the p95, since a few requests may make p99 far costlier than p95, e.g. initial requests that build caches, warm up class objects, initialize threads, etc. So p95 may be cutting out those 5% worst-case scenarios. Still, within that 5%, we don't know what share is real noise versus genuine worst-case inputs.
Finally, we can have roughly 1% noise in our measurements (network congestion, outages, service degradations), so the p99 latency is a good representative of the practical worst case. And, almost always, our goal is to reduce the p99 latency.
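To illustrate the point about warm-up requests, here is a rough sketch with made-up latencies (they are not the measurements quoted above). It assumes the nearest-rank percentile definition; the helper nearest_rank and the sample values are hypothetical.

```python
import statistics


def nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, clamped to at least 1 (1-based rank).
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical latencies in ms: 95 steady requests, 3 slower ones,
# and 2 cold-start requests that had to build caches first.
latencies = [0.2] * 95 + [0.5] * 3 + [7.2] * 2

print("min:", min(latencies))
print("max:", max(latencies))
print("median:", statistics.median(latencies))
print("p95:", nearest_rank(latencies, 95))
print("p99:", nearest_rank(latencies, 99))

# Same data with the two cold-start requests removed.
steady = [ms for ms in latencies if ms < 7.2]
print("p95 without warm-up:", nearest_rank(steady, 95))
print("p99 without warm-up:", nearest_rank(steady, 99))
```

In this made-up data the two cold-start requests alone push p99 up to 7.2ms while p95 stays at 0.2ms; once they are removed, p99 drops to 0.5ms and p95 does not move. That is the kind of gap between p95 and p99 that is worth investigating.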
Explaining P99 through an analogy:
If 100 horses are running in a race, 99 horses should complete the race in less than or equal to the "latency" time. Only 1 horse is allowed to finish the race in more than the "latency" time.
That means if P99 is 10ms, 99 percent of requests should have a latency less than or equal to 10ms.
If the p99 value is 1ms, it means 99 out of 100 requests take at most 1ms, and 1 request takes about 1ms or more.
P99 is a latency measure.
In very simple words, P99 latency represents a threshold of application performance where 99 percent of requests/tasks/operations complete in a time at or below that threshold (the 99th percentile).
For example, if out of 100 requests to an application, 99 complete within 4ms, then P99 latency of that application is 4ms. In other words, only 1 request is allowed to complete in more than 4ms.
Also, if 95 requests to that application complete within 3ms, the P95 latency of that application is 3ms. And only 5 requests are allowed to (or should) take more than 3 ms to complete.
And if 90 percent of requests complete within 2ms, the P90 latency is 2ms, with only 10 percent of requests allowed to take more than 2ms to complete.
From above you can guess that P90 and P95 are two other latency measures along with P99.
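The three measures can be computed the same way. Below is a small sketch using the "sort, discard the slowest part, take the largest remaining value" description given in the comments earlier. The 100 request times are hypothetical values chosen to match the 2ms/3ms/4ms example, and percentile_by_discarding is a name made up for this illustration.

```python
def percentile_by_discarding(samples, pct):
    """Sort, drop the slowest (100 - pct)% of samples, return the largest left.

    Assumes samples is non-empty and pct is an integer with 0 < pct <= 100.
    """
    ordered = sorted(samples)
    drop = (100 - pct) * len(ordered) // 100  # whole samples to discard
    kept = ordered[: len(ordered) - drop]
    return kept[-1]


# Hypothetical 100 requests (ms): 90 within 2ms, 95 within 3ms, 99 within 4ms.
requests_ms = [2] * 90 + [3] * 5 + [4] * 4 + [9]

for pct in (90, 95, 99):
    print(f"P{pct} = {percentile_by_discarding(requests_ms, pct)}ms")
# P90 = 2ms
# P95 = 3ms
# P99 = 4ms
```

Note that the single 9ms request does not affect any of the three values; it only shows up in the maximum.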
To put it simply, imagine you have an API with a contract stating that it must respond within 10 milliseconds (ms) to callers. Over the course of an hour, you've received various requests from different consumers:
Consumer A made 10 requests at 10:00 am, with responses taking 5ms each.
Consumer B sent 2 requests at 10:05 am, each with a 5ms response.
At 10:07 am, Consumer B submitted 20 requests, each taking 7ms to respond.
Again at 10:07 am, Consumer B had 20 more requests with 7ms responses.
At 10:20 am, Consumer B made 20 requests, with responses taking 11ms.
Consumer B made 30 requests at 10:15 am, with responses at 12ms.
At 10:30 am, Consumer B submitted 20 requests, and each took 10ms.
Finally, at 10:43 am, Consumer B had 40 requests, with 9ms responses.
That is 162 requests in total. If we sort all of these response times in ascending order and take the value at the 99% mark (the 161st of 162), we get 12ms, which exceeds the agreed 10ms. This value, known as P99, indicates that 99% of responses took 12ms or less. Since P99 is above the agreed response time, we should also check P95, which shows whether more than 5% of all requests breach the agreed response time. If they do, we must also look into P90. (In this data, 50 of the 162 requests took longer than 10ms, so P95 and P90 are 12ms as well.)
By continuously monitoring these metrics (P90, P95, and P99), the Operations team can swiftly identify issues in the service or infrastructure and take corrective action.