Percentile Latency: How [High] are you?

Rajneesh Sharma
5 min read · Jan 29, 2023


"Averaging percentiles is mathematically absurd!"

In this article, we are going to talk about percentile latency. Before diving in, let's simplify the term by breaking it into its two parts, percentile and latency. We will look at each separately and then put them back together.

What is latency?

Latency is simply the time an operation or task takes to complete. Everything has latency, but here we are talking about latencies in software systems. Before we discuss measuring latency, the more interesting part is how it behaves: latency rarely follows a normal (Gaussian) or Poisson distribution, so looking only at averages, medians, and even standard deviations is misleading.

Latency tends to be heavily multi-modal, and part of this is attributed to "hiccups" in response time. Hiccups resemble periodic freezes and can be due to any number of reasons: garbage collection pauses, hypervisor pauses, context switches, interrupts, database reindexing, cache buffer flushes to disk, and so on. These hiccups do not resemble normal distributions, and the shift between modes is often rapid and erratic.

Now we know what latency is, and that its behavior is not straightforward. But why do we care about it in software? Because it is a key metric for software performance, and reducing latency directly improves the end-user experience.

Humans detect visual stimuli in approximately 200 ms, yet perceivable internet latency is much lower than that. For websites, you generally want to optimize for ~300 ms page load times or faster, and definitely under 1 second. For that, your APIs have to be even quicker, ideally under ~50–100 ms. Low latency is the key to winning your customers!

How to measure latency?

Metrics exist because humans don't have the capacity to fully understand how software systems behave. There are two reasons for this: 1.) the systems are big, and 2.) we observe them indirectly (you can't see inside a CPU). We do have a few ways to calculate latency, and we will discuss them using the example of web requests made to a server.

Average

The average is one way to express latency, probably not the best or most accurate, but it exists. Let's say we have six web requests with execution times of 40 ms, 37 ms, 60 ms, 55 ms, 450 ms, and 700 ms. If someone asks what the latency of your system is, you cannot say 37 ms, because that is only the best case, and you cannot say 700 ms either, because it might be an outlier caused by some issue, and most requests were much faster. So what do we do? We could calculate the average, which is about 224 ms. That sounds like a realistic, reasonable value to live with, but even this is misleading, because we cannot tell whether requests genuinely take that long or whether some infrastructure or system bug intermittently inflated the execution time of a few of them. Hence, the average is not a good way to measure latency.
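A quick check of the six sample request times above, using only Python's standard library, shows how the two slow outliers drag the mean far away from what most requests experienced:

```python
from statistics import mean, median

times_ms = [40, 37, 60, 55, 450, 700]
print(round(mean(times_ms)))   # 224 -- pulled up by the two slow outliers
print(median(times_ms))        # 57.5 -- closer to what most requests saw
```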

Percentile

The median is usually preferred to other measures of central tendency when your data set is skewed.

Percentile latency is widely used to measure software latency. Let's briefly understand what a percentile is. Given a series of records, the pth percentile is a value greater than p percent of the records; for example, the 20th percentile is a value greater than 20% of the values. Take the same data again: six web requests with execution times of 40 ms, 37 ms, 60 ms, 55 ms, 450 ms, and 700 ms. We sort them in ascending order: 37 ms, 40 ms, 55 ms, 60 ms, 450 ms, 700 ms. This is the data we will aggregate over. The first aggregation is P50, which is simply the median. To calculate P50 here, we remove the first 50% of the elements and pick the first of those that remain, which is 60 ms, and we get a useful metric: 50% of our requests are faster than 60 ms. This sounds more accurate, doesn't it? But even this does not completely describe the performance, because the upper half can still contain much larger values. That is why we consider multiple (sometimes arbitrary) percentiles, such as P90, P95, P99 (the latency that 99% of requests beat), or maybe P75 (to describe the latency of 75% of requests). Ideally, the differences between P99, P95, P90, and P70 should be significant, as that tells you more about the performance.
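The "drop the first p percent, take the next value" rule described above can be sketched in a few lines of pure Python. The function name and the index clamping are my own choices for this illustration, not a standard API:

```python
def percentile(samples, p):
    """pth percentile: a value greater than p% of the samples."""
    s = sorted(samples)
    idx = int(len(s) * p / 100)      # first element past the p% cut
    return s[min(idx, len(s) - 1)]   # clamp so p near 100 stays in range

times_ms = [40, 37, 60, 55, 450, 700]
print(percentile(times_ms, 50))   # 60 -- half the requests were faster
print(percentile(times_ms, 90))   # 700 -- the tail is far from the median
```

Note that real monitoring libraries usually interpolate between neighboring samples instead of picking a single one, so their P50 for this data may differ slightly.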

Note that when we talk about these percentile values, people often use P100 and P0, which is mathematically questionable: to be the P100 value, a sample would have to be greater than 100% of the values, including itself. There may be some ifs and buts among mathematicians, but this is how I have read and understood it. So, we have a great technique for measuring the latency of web requests and the performance of backend systems. But one question remains: how accurate are these numbers, and is this really the ideal way to measure the performance of software?

How accurate are percentile latencies?

When we question how accurate percentile latencies are, we are not questioning the concept of percentiles itself, which is mathematically sound. We are questioning whether we actually collect the percentile data accurately and precisely. No single metric fully describes latency behavior, and observation tells us that most latency benchmarks are broken because the benchmarking tools are broken. They generally measure service time instead of total response time. If the system experiences one of the "hiccups" described earlier, the tool records one bad operation while 10,000 other operations wait in line; when those 10,000 operations finally go through, they look fast, even though the experience was actually terrible. Long operations get measured only once, and delays outside the timing window don't get measured at all. Some tools even average values to save storage space. So the problem lies in how the benchmarking tools measure latency. There is great depth to this topic, as it connects the software that has the latency, the tooling that measures it, and the end user who experiences it. Still, we can say that percentile latencies, measured carefully, are a decent way to describe the performance of a software system.
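The service-time-versus-response-time gap above can be made concrete with a small simulation. This is a hypothetical sketch with made-up numbers (a 1 ms service time, one 2-second freeze, a client that intends to send a request every 10 ms), not a real benchmark:

```python
SERVICE_MS = 1.0
HICCUP_START, HICCUP_END = 100.0, 2100.0   # server frozen in this window

service_times, response_times = [], []
server_free = 0.0
for i in range(1000):
    intended = i * 10.0                    # client wants a request every 10 ms
    start = max(intended, server_free)     # queues behind earlier requests
    if HICCUP_START <= start < HICCUP_END:
        start = HICCUP_END                 # frozen server cannot start work
    done = start + SERVICE_MS
    server_free = done
    service_times.append(done - start)     # what a naive tool records
    response_times.append(done - intended) # what the user experiences

def pct(data, p):
    s = sorted(data)
    return s[min(int(len(s) * p / 100), len(s) - 1)]

print("service  P99:", pct(service_times, 99))   # looks great: every op took 1 ms
print("response P99:", pct(response_times, 99))  # reveals the 2-second hiccup
```

Every individual operation really did take 1 ms of service time, so a tool timing only service sees a perfect P99, while the response-time P99 (which charges the queueing delay to each request) exposes the freeze. This is the "coordinated omission" problem that tools like HdrHistogram try to correct for.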

Thanks for reading, Until next time. You can find me on LinkedIn or GitHub.
