What does P99 latency represent? I keep hearing about this in discussions about an application's performance but couldn't find a resource online that would talk about this.
It's the 99th percentile. It means that 99% of the requests should be faster than the given latency. In other words, only 1% of the requests are allowed to be slower.
Imagine that you are collecting performance data for your service, and the table below is the collection of results (the latency values are fictional, to illustrate the idea).
Latency   Number of requests
1s        5
2s        5
3s        10
4s        40
5s        20
6s        15
7s        4
8s        1
The P99 latency of your service is 7s. Only 1% of the requests take longer than that. So, if you can decrease the P99 latency of your service, you increase its performance.
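Here is a minimal sketch in plain Python of that computation on the fictional table data. It uses the nearest-rank rule, which is one common percentile definition; real monitoring tools may interpolate instead.

    import math

    # latency in seconds -> number of requests, from the table above
    table = {1: 5, 2: 5, 3: 10, 4: 40, 5: 20, 6: 15, 7: 4, 8: 1}
    samples = sorted(latency for latency, count in table.items() for _ in range(count))

    rank = math.ceil(0.99 * len(samples))  # nearest-rank 99th percentile
    print(samples[rank - 1], "s")          # -> 7 s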
If we sorted the requests in ascending order and discarded the fastest 99% (everything up to and including 7s), we'd be left with 8s, but that is NOT the 99th percentile! Instead, I would explain it the other way around: sort the requests in ascending order and discard the top/largest 1%. The largest remaining value is the 99th percentile. Here, there are 100 requests, so the "top 1%" corresponds to the 1 largest request (the one that took 8s). When we get rid of that, the max remaining value is 7s, which is the correct 99th percentile. – Trample
We can explain it through an analogy: if 100 students are running a race, then 99 students should complete the race within the "latency" time.
Should, not will. – Medford
Let's take an example from here:
Request latency:
min: 0.1
max: 7.2
median: 0.2
p95: 0.5
p99: 1.3
So we can say: for 99 percent of web requests, the latency was 1.3ms or less (milliseconds or microseconds, depending on how your system's latency measurement is configured). Like @tranmq said, if we decrease the P99 latency of the service, we increase its performance.
It is also worth noting p95, since a few requests may make p99 costlier than p95, e.g., initial requests that build caches, class-object warm-up, thread initialization, etc. So p95 may cut out those 5% worst-case scenarios. Still, out of that 5%, we don't know the share of real noise cases versus genuinely slow inputs.
Finally, we can have roughly 1% noise in our measurements (such as network congestion, outages, and service degradation), so p99 latency is a good representative of practically the worst case. And, almost always, our goal is to reduce p99 latency.
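A quick sketch of how such a summary could be produced with NumPy; the log-normal sample here is synthetic, purely to stand in for real, right-skewed latency measurements.

    import numpy as np

    # Synthetic, right-skewed latencies in ms (stand-in for real measurements).
    latencies_ms = np.random.lognormal(mean=-1.5, sigma=0.8, size=10_000)

    print("min:   ", latencies_ms.min())
    print("max:   ", latencies_ms.max())
    print("median:", np.percentile(latencies_ms, 50))
    print("p95:   ", np.percentile(latencies_ms, 95))
    print("p99:   ", np.percentile(latencies_ms, 99))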
Explaining P99 through an analogy:
If 100 horses are running in a race, 99 horses should complete the race in less than or equal to the "latency" time. Only 1 horse is allowed to finish the race in more than the "latency" time.
That means if P99 is 10ms, 99 percent of requests should have a latency less than or equal to 10ms.
If the p99 value is 1ms, it means 99 out of 100 requests take less than 1ms, and 1 request takes 1ms or more.
P99 is a latency measure.
In very simple words, P99 latency represents a threshold of application performance below which 99 percent of requests/tasks/operations complete (the 99th percentile).
For example, if out of 100 requests to an application, 99 complete within 4ms, then the P99 latency of that application is 4ms. In other words, only 1 request is allowed to take more than 4ms to complete.
Similarly, if 95 requests to that application complete within 3ms, the P95 latency of that application is 3ms, and only 5 requests are allowed to (or should) take more than 3ms to complete.
And if 90 percent of requests complete within 2ms, the P90 latency is 2ms, with only 10 requests allowed to take more than 2ms to complete.
From the above you can guess that P90 and P95 are two other latency measures along with P99.
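A tiny sketch of the arithmetic behind these three measures (plain Python; allowed_over is a name made up for this illustration, and the nearest-rank rule is assumed): for N requests, how many may exceed the Pxx threshold.

    import math

    def allowed_over(n_requests: int, pct: float) -> int:
        """How many of n_requests may be slower than the Pxx threshold."""
        return n_requests - math.ceil(pct / 100 * n_requests)

    for pct in (90, 95, 99):
        print(f"P{pct}: {allowed_over(100, pct)} of 100 requests may be slower")
    # P90: 10, P95: 5, P99: 1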
To put it simply, imagine you have an API with a contract stating that it must respond within 10 milliseconds (ms) to callers. Over the course of an hour, you've received various requests from different consumers:
Consumer A made 10 requests at 10:00 am, with responses taking 5ms each.
Consumer B sent 2 requests at 10:05 am, each with a 5ms response.
At 10:07 am, Consumer B submitted 20 requests, each taking 7ms to respond.
Again at 10:07 am, Consumer B had 20 more requests with 7ms responses.
Consumer B made 30 requests at 10:15 am, with responses at 12ms.
At 10:20 am, Consumer B requested 20 times, with responses taking 11ms.
At 10:30 am, Consumer B submitted 20 requests, and each took 10ms.
Finally, at 10:43 am, Consumer B had 40 requests, with 9ms responses.
That is 162 requests in total. If we sort these response times in ascending order and take the value below which 99% of them fall, we get 12ms: that is the P99, and it exceeds the agreed 10ms. Since P99 is above the agreed response time, we should also check P95 and, if that breaches too, P90, to see how widespread the slowness is. By continuously monitoring these metrics (P90, P95, and P99), the Operations team can swiftly identify issues in the service or infrastructure and take corrective action.
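A minimal sketch that reproduces this check over the hour's data. The batch list mirrors the numbers above; the nearest-rank percentile rule is an assumption, since real monitoring stacks may compute percentiles differently.

    import math

    SLA_MS = 10
    # (number of requests, response time in ms) for each batch above
    batches = [(10, 5), (2, 5), (20, 7), (20, 7), (30, 12), (20, 11), (20, 10), (40, 9)]

    samples = sorted(ms for count, ms in batches for _ in range(count))

    for pct in (90, 95, 99):
        rank = math.ceil(pct / 100 * len(samples))  # nearest-rank percentile
        value = samples[rank - 1]
        status = "breaches" if value > SLA_MS else "meets"
        print(f"P{pct} = {value}ms -> {status} the {SLA_MS}ms SLA")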