Optimisation: Unpacking Queueing Theory in its Simplest Terms

Introduction
Well, Queueing Theory can offer a solution to this common frustration. As the name implies, Queueing Theory applies mathematical models to evaluate queues or wait lines with an aim of optimising operational efficiency.
In the case of supermarkets, for instance, by analysing the customer queues, supermarkets are able to identify the optimal number of cashier counters and staff required to serve the customers efficiently without negatively affecting the customer waiting times. The goal of the analysis is to strike a balance between customer satisfaction and resource constraints.
However, despite its usefulness, the theory is often associated with complex probability theory and math. That's why, in this article, rather than diving into heavy math, I will try to simplify the concept to give you a big picture of how Queueing Theory works and, most importantly, how all the terminologies are interconnected.
Contents
- Applications of Queueing Theory
- Inner workings of Queueing Theory
  - Little's Theorem
  - Kendall's Notation
  - Poisson Process (Stochastic Process)
- Simple Queueing Model Simulation in Python
- Queueing Theory vs Machine Learning (Which one is better?)
Applications of Queueing Theory
Queueing Theory is nothing new. In fact, it has been around for more than a century (its origins date back to the early 1900s) and has been widely used in the operations field. Queueing Theory, by definition, falls under the branch of Management Science or Operations Research. But we can see that the fields of Operations Research and Data Science are gradually merging because of their common goal of using data to support strategic decision-making.
Back to the point: in short, the primary use case of Queueing Theory is to enhance resource allocation and operational efficiency. It is also viewed as a supply-demand problem. If the demand increases by x%, how much do I need to increase my capacity to meet this demand without exceeding my budget constraints?
Nowadays, some enterprises offer Queue Management System (QMS) software that is built upon the theoretical framework of Queueing models. While a QMS is capable of handling the practical aspects of queue management, Queueing Theory is still useful for providing insights into strategic planning.
So, let's look at some real-life applications:
Anywhere that involves lining up and waiting – Queueing Theory is applicable and relevant.
(1) Telecommunications
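The telecom industry can be considered the birthplace of Queueing Theory because the earliest models were developed (by A. K. Erlang in the early 1900s) to manage congestion and waiting times in telephone exchanges. The same ideas are used today to cut down on customer waiting times in call centres.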
(2) Transportation, Supply Chain and Logistics
Traffic control: Queueing models help the design process of a traffic control system by examining traffic flow, vehicle arrival rates, and other factors.
Supply Chain & Logistics: Similarly, Queueing systems are applied to manage the flow of goods during the process of receiving, storing, and shipping them and to plan vehicle routes at loading and unloading points as well.
(3) Healthcare
Emergency Department (ED): The majority of research around Queueing models (in the context of healthcare) revolves around ED, as delays in ED are a matter of life and death. But its use cases are not just limited to capacity planning in ED.
(4) Customer Service
This covers a wide range of venues, including retail chains and banks, where customer satisfaction plays a critical role.
(5) Computer Networks
You might not often see this one highlighted in articles, yet its role is crucial in managing loads across computer networks.
Inner workings of Queueing Theory
We can't tackle its practical applications without understanding the fundamentals. So, this section may sound a little theoretical for some, but stick with me here; it's not as daunting as it sounds once we break it down.
You have probably noticed that once you start poking around on this topic, the depth of information you can find online varies a lot. Some sources talk about stochastic probability, others explain only Little's theorem, and then you see terms like Kendall's notation, the M/M/1 system, and the Erlang models in academic papers. So, do they all have the same meanings under different terminologies? Not really, but they are all related to each other.
After doing some research, I found that it comes down to two key aspects: Little's Theorem and Kendall's notation.
In layman's terms, Little's Theorem can provide us with high-level insights for initial capacity planning, whereas Kendall's notation system can give us more detailed insights.
Depending on your requirements, you can opt for a simple calculation or more complex modelling like Kendall's notation system.
Little's Theorem
Little's Theorem (Little's Law) is the foundation of Queueing Theory. The formula is straightforward, and it consists of only three key parameters:
- Average number of items/people in the system ( L )
- Average rate at which items/people arrive in the system ( λ )
- Average time an item/person spends in the system ( W )
Basically, the equation describes the relationship between those three parameters as:
**L = λ × W**
The specific terms used may vary by field. For instance, in Supply Chain, it is described as:
Work-In-Process (WIP) = Throughput (TH) * Cycle Time (CT)
Example Scenario:
Suppose we own a small coffee shop, and we want to determine how many staff and cashier counters we will need to run it efficiently.
Little's theorem can be a handy tool.
Let's say we collected these data based on our observations:
- On average, 20 customers join the queue per hour.
- Each customer spends an average of 10 minutes in the check-out area of the coffee shop (from ordering to leaving the line).
Note: Alternatively, this scenario can be considered "from the moment customers enter the shop to the moment they leave the shop" if we are interested in the occupancy of the shop rather than the check-out area. It depends on how we define the boundaries of the "system".
If we plug in the given variables, we get:
- λ = 20 customers per hour
- W = 10 minutes = 10/60 ≈ 0.167 hour
- **L = 20 × 0.167 ≈ 3.33, or roughly 3 customers**
Simply speaking, we can expect an average of 3 customers to be present in the check-out area at any time.
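The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula; the function name `average_in_system` is made up for this example, and the only thing it adds is the unit conversion from minutes to hours.

```python
def average_in_system(arrival_rate_per_hour: float, minutes_in_system: float) -> float:
    """Little's Law, L = lambda * W, with W converted from minutes to hours."""
    hours_in_system = minutes_in_system / 60
    return arrival_rate_per_hour * hours_in_system


# Coffee shop figures from the example: 20 customers per hour, 10 minutes each
customers = average_in_system(arrival_rate_per_hour=20, minutes_in_system=10)
print(f"Average customers in the check-out area: {customers:.2f}")
```

The important detail is that λ and W must use the same time unit; mixing customers per hour with minutes is the most common way to get a wrong answer out of this formula.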
So, what does this mean for us?
Using this insight, we can do high-level capacity planning. We can roughly decide if we require more cashier counters and staff to handle transactions, if we need to adjust the layout to accommodate the demand, or if an average of 3 customers in line meets our standard for customer satisfaction. You get the picture.
If I were a small coffee shop owner, this basic information might be all I need to make informed decisions. However, you will notice that this result does not give me any insights about customer waiting times, utilisation rates of the cashier counters, and so on.
This is because our calculation didn't account for the number of counters/staff available or the service time per customer. Moreover, it works purely with long-run averages, so it hides factors like peak and off-peak hours that change the arrival rate. So, in a way, Little's theorem is most useful when the system is in a steady state, and it tells us nothing about the variability around those averages.
This brings us to a more detailed way of describing queueing models called Kendall's notation (sometimes called Kendall-Lee notation), where more nuances are taken into account.
Kendall's Notation
Kendall's notation itself is not a formula or model but rather a standard classification system that represents the components of the queueing process. The system consists of six shorthand symbols that are written in order: A/B/C/D/E/F (some sources use A/S/c/K/N/D).
I will carry on with the coffee shop analogy for simplicity, but keep in mind that the system design and modelling are much more complex when it comes to real-world situations.

The way the notation is written is simple. You just replace each letter with the symbol that fits your system design.
- If customers arrive at our coffee shop according to a Poisson process, service times are exponentially distributed, there is 1 cashier counter, and customers are served on a FIFO basis, we can denote it as M/M/1/FIFO or just M/M/1. ('M' here stands for 'Markovian'.)
- If we have multiple counters, it is denoted as M/M/c ('c' is the number of servers).
In addition to the examples I provided above, other forms of probability distributions and queueing disciplines are also possible, depending on the scenarios. Choosing the right Queueing system requires domain knowledge. For instance, Erlang B and Erlang C models are particularly common in the telecom industry.
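To make the shorthand concrete, here is a minimal sketch of how such a string could be split into its parts. It is not a standard library; `QueueSpec`, `parse_kendall` and `describe` are hypothetical names, the parser only handles the short forms used in this article (A/B/c with an optional discipline in fourth position, as in M/M/1/FIFO), and defaulting to FIFO when the discipline is omitted is the usual convention that the sketch assumes.

```python
from dataclasses import dataclass

# A few common distribution symbols (not an exhaustive list)
DISTRIBUTIONS = {
    "M": "Markovian (Poisson arrivals / exponential service times)",
    "D": "Deterministic (constant times)",
    "G": "General (any distribution)",
}


@dataclass(frozen=True)
class QueueSpec:
    arrival: str              # arrival process symbol, e.g. "M"
    service: str              # service time distribution symbol, e.g. "M"
    servers: str              # number of servers, kept as text so "c" is allowed
    discipline: str = "FIFO"  # assumed default when the field is omitted


def parse_kendall(notation: str) -> QueueSpec:
    """Split 'A/B/c' or 'A/B/c/discipline' into its components."""
    parts = notation.strip().split("/")
    if not 3 <= len(parts) <= 4:
        raise ValueError(f"expected A/B/c or A/B/c/discipline, got {notation!r}")
    arrival, service, servers = parts[:3]
    discipline = parts[3] if len(parts) == 4 else "FIFO"
    return QueueSpec(arrival, service, servers, discipline)


def describe(spec: QueueSpec) -> str:
    """Human-readable summary of a parsed specification."""
    arrival = DISTRIBUTIONS.get(spec.arrival, "other")
    service = DISTRIBUTIONS.get(spec.service, "other")
    return (f"arrivals: {arrival}; service: {service}; "
            f"servers: {spec.servers}; discipline: {spec.discipline}")


for text in ("M/M/1", "M/M/c", "M/D/2/FIFO"):
    print(f"{text:<12} {describe(parse_kendall(text))}")
```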
Poisson Process
Next, we will briefly look at probability theory. As I mentioned above, the rate at which customers arrive at the coffee shop is not steady. Probability distributions (the first two elements of Kendall's notation) can capture this aspect.
Poisson Distribution and Exponential Distribution are the two main probability distributions that most queueing behaviours adhere to (in theory). The two are closely related but serve different purposes.

Poisson Distribution focuses on counting the number of occurrences over a certain period. As shown in the Poisson Process image, Poisson is used to model the likelihood of different numbers of customers arriving at the coffee shop at each minute (1, 1, 2, 1, 0, 3). On the flip side, Exponential Distribution deals with the time interval between events, for example, the time taken by a barista to serve each customer at a cashier counter.

The primary assumption of the Poisson process is that the events are completely random and are not dependent on each other. In a coffee shop example, the model assumes that the arrival of 1 customer at a given moment doesn't directly lead to the arrival of another.
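The link between the two distributions can be checked with a small experiment. The sketch below is an illustration under assumed settings: the 20-customers-per-hour rate is borrowed from the coffee shop example, while the window length and the random seed are arbitrary choices. It draws exponential gaps between arrivals and then counts arrivals per minute, which should behave like a Poisson variable.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the sketch is repeatable

rate_per_minute = 20 / 60   # 20 customers per hour, as in the coffee shop example
n_minutes = 100_000         # length of the simulated observation window (assumed)

# Exponential view: draw the gaps between consecutive arrivals.
# Twice the expected number of arrivals is drawn so the window is fully covered.
n_gaps = int(2 * rate_per_minute * n_minutes)
gaps = rng.exponential(scale=1 / rate_per_minute, size=n_gaps)
arrival_times = np.cumsum(gaps)
arrival_times = arrival_times[arrival_times < n_minutes]

# Poisson view: count how many arrivals fall into each one-minute bin.
counts = np.bincount(arrival_times.astype(int), minlength=n_minutes)

print(f"mean gap between arrivals: {gaps.mean():.2f} min (theory {1 / rate_per_minute:.2f})")
print(f"mean arrivals per minute : {counts.mean():.3f} (theory {rate_per_minute:.3f})")
print(f"variance of the counts   : {counts.var():.3f} (Poisson: equal to the mean)")
print(f"share of empty minutes   : {np.mean(counts == 0):.3f} (theory {np.exp(-rate_per_minute):.3f})")
```

If the arrivals really are independent, the mean and variance of the per-minute counts come out close to each other, which is a quick sanity check you can also run on real arrival data before assuming a Poisson process.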
In terms of the modelling element, principles from Little's theorem are incorporated into the calculations, along with the probability distributions and other parameters described by Kendall's notation system. I will not go through the math behind each model, but if you are interested, you can explore more here.
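For the simplest case, M/M/1, the textbook steady-state results are short enough to sketch here. The function name `mm1_metrics` is made up for this example, and the service rate of 2 customers per minute (30 seconds per customer) is an illustrative assumption, not a measurement. The formulas only hold when the arrival rate is below the service rate.

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state averages for an M/M/1 queue (both rates in the same time unit)."""
    if arrival_rate >= service_rate:
        raise ValueError("no steady state: arrival rate must be below service rate")
    rho = arrival_rate / service_rate          # utilisation of the single server
    return {
        "utilisation": rho,
        "L": rho / (1 - rho),                  # average number in the system
        "Lq": rho ** 2 / (1 - rho),            # average number waiting in line
        "W": 1 / (service_rate - arrival_rate),     # average time in the system
        "Wq": rho / (service_rate - arrival_rate),  # average time waiting in line
    }


lam = 20 / 60   # 20 customers per hour, expressed per minute
mu = 2.0        # assumed service rate: 2 customers per minute
metrics = mm1_metrics(lam, mu)

for name, value in metrics.items():
    print(f"{name:<12}{value:.3f}")

# Little's Law ties the results together: L should equal lambda * W
print(f"lambda * W = {lam * metrics['W']:.3f}")
```

Unlike the plain Little's Law calculation, this gives the utilisation of the counter and separates time spent waiting from time spent being served.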
That's enough theory. Let's move on to more practical usage.
Simple Queueing Model Simulation in Python
For real-life problems, obviously, we don't manually work out the formulas. Simulation is one of the most common approaches in the context of Queueing Theory. Through simulation, we can create virtual environments to test out different scenarios, such as how much the waiting times would change if the number of customers arriving changed from 20 to 120 per hour.
We can develop simulation models in dedicated simulation software or in Python. If you'd like to dive deeper, you can explore here for a list of tools. For a more advanced sample model, you can also take a look here.
```python
import numpy as np

# Setting the parameters for the M/M/1 queue
lambda_arrival_rate = 120 / 60  # arrival rate per minute (120 customers per hour)
mu_service_rate = 1 / (30 / 60)  # service rate per minute (mean service time of 30 seconds, i.e. 120 customers per hour)
num_customers = 2000  # number of customers to simulate

# Arrays to hold simulation data
arrival_times = np.cumsum(np.random.exponential(1 / lambda_arrival_rate, num_customers))
service_times = np.random.exponential(1 / mu_service_rate, num_customers)
departure_times = np.zeros(num_customers)

# Initialise first departure time
departure_times[0] = arrival_times[0] + service_times[0]

# Simulate the queue: service starts at arrival or when the previous customer leaves, whichever is later
for i in range(1, num_customers):
    departure_times[i] = max(arrival_times[i], departure_times[i - 1]) + service_times[i]

# Time each customer spends in the system (queueing plus service)
wait_times = departure_times - arrival_times

# We need to simulate the state of the system at each arrival
system_state = np.zeros(num_customers)
for i in range(num_customers):
    # Count how many have arrived but not yet departed at the current arrival time
    # (this includes the customer who has just arrived)
    system_state[i] = np.sum((arrival_times <= arrival_times[i]) & (departure_times > arrival_times[i]))

# Calculating average wait time and average number of customers in the system
average_wait_time = np.mean(wait_times)
average_customers_in_system = np.mean(system_state)
print(f"Average wait time (in minutes): {average_wait_time:.2f}")
print(f"Average number of customers in the system: {round(average_customers_in_system)}")
```
Disclaimer: The simulation model presented is just a simplified version (without considering the complexities) to provide a conceptual snapshot of queueing theory applications. Personally, I haven't used simulation software for Queueing models.
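According to the simulation results, for a peak-hour scenario where 120 customers arrive per hour and each customer takes 30 seconds on average to serve (a service rate of 2 customers per minute, as set in the code), we get the following. The exact figures vary from run to run because the code does not fix a random seed.
Average wait time (in minutes): 12.12
Average number of customers in the system: 25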
Increasing the demand from 20 to 120 while keeping the resource constraints the same, i.e., 1 cashier counter, results in higher waiting times.
Another observation from my experimentation with different arrival rates (e.g., 100 instead of 120 customers per hour) is that waiting times didn't change significantly until the arrival rate approached a certain threshold (around 120 per hour in this case). Roughly speaking, this is the point where the arrival rate catches up with the service rate of the single counter, so the system is no longer capable of clearing the queue efficiently, leading to a system overload.
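The threshold effect can also be seen without simulation. The sketch below uses the standard M/M/1 result for the average time in the system, W = 1 / (μ − λ), and assumes the same service rate as the simulation code (2 customers per minute, i.e. 120 per hour). The list of arrival rates is an arbitrary choice for illustration.

```python
service_rate = 120 / 60  # customers served per minute (assumed, as in the simulation)

time_in_system = {}
for customers_per_hour in (20, 60, 100, 110, 118):
    arrival_rate = customers_per_hour / 60                # convert to per minute
    utilisation = arrival_rate / service_rate             # share of time the counter is busy
    minutes = 1 / (service_rate - arrival_rate)           # M/M/1 average time in system
    time_in_system[customers_per_hour] = minutes
    print(f"{customers_per_hour:>4} per hour | utilisation {utilisation:.0%} | "
          f"time in system {minutes:5.1f} min")
```

The average time in the system grows from 0.6 minutes at 20 customers per hour to 30 minutes at 118 per hour, and the formula has no finite answer at 120. Most of that growth happens in the last few percent of utilisation, which is why the waiting times look flat until the arrival rate gets close to the service rate.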
As a coffee shop owner, if I am expecting more than 120 customers an hour, I would consider hiring an additional barista to manage the increased transaction volumes more effectively.
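To get a rough feel for what a second barista would buy, we can use the Erlang C formula mentioned earlier, which gives the probability of having to wait in an M/M/c queue. This is a sketch under assumptions: both baristas are taken to serve at the same rate of 2 customers per minute, customers form a single shared line, and the comparison is made at 110 customers per hour because a single counter has no steady state at 120. The function names are made up for this example.

```python
from math import factorial


def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arriving customer has to wait in an M/M/c queue."""
    rho = offered_load / servers                      # utilisation per server
    if rho >= 1:
        raise ValueError("no steady state: offered load must be below the number of servers")
    waiting_term = offered_load ** servers / factorial(servers) / (1 - rho)
    no_wait_terms = sum(offered_load ** k / factorial(k) for k in range(servers))
    return waiting_term / (no_wait_terms + waiting_term)


def mmc_time_in_system(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Average time in the system (waiting plus service) for an M/M/c queue."""
    p_wait = erlang_c(servers, arrival_rate / service_rate)
    time_in_queue = p_wait / (servers * service_rate - arrival_rate)
    return time_in_queue + 1 / service_rate


arrival_rate = 110 / 60  # customers per minute
service_rate = 2.0       # customers per minute per barista (assumed)

one_barista = mmc_time_in_system(arrival_rate, service_rate, servers=1)
two_baristas = mmc_time_in_system(arrival_rate, service_rate, servers=2)
print(f"1 barista : {one_barista:.2f} min in system")
print(f"2 baristas: {two_baristas:.2f} min in system")
```

Under these assumptions, the second barista cuts the average time in the system from about 6 minutes to well under 1 minute, which is the kind of comparison that helps justify (or rule out) the extra hire.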
Now I hope it is clear that we can gain additional insights by applying an M/M/1 system rather than a simple Little's formula.
Queueing Theory vs Machine Learning
Finally, the next logical question you might have is, "Why can't we just use Machine Learning to predict wait times?" We can, but the choice between a traditional approach and ML involves trade-offs, and the right answer depends on several factors:
(1) Prescriptive or Predictive
ML techniques, often considered ‘black box' models, may not suit your needs if your goal is to understand the ‘why' behind predictions. Queueing models are more interpretable in that sense. On the other hand, if your goal is to yield strong predictive performance, ML may be a superior choice.
Note: I have come across this paper, which experimented with both Queueing and ML techniques and found that the Queueing model yielded results comparable to ML.
(2) Resource Availability
Implementing and maintaining ML models comes with computational costs. If the performance of an ML model can't significantly beat that of a simple Queueing model, the hefty investment in ML may not be justified.
(3) Nature of Data
As you can see by now, Queueing Theory relies on so many statistical assumptions that might not always mirror real-life complexities. In that case, ML models are much more flexible and can uncover complex patterns in the data as opposed to Queueing models. All you need is a sufficient training dataset that covers most real-life scenarios to develop a reliable model.
Closing remarks
In this article, we've navigated the broad applications of Queueing Theory using the coffee shop analogy. We also looked into the foundational concepts that underpin the theory. Hopefully, you now have a clear understanding of how this analytical tool operates and how it can be leveraged to optimise service operations across industries.
References
- Ryan Berry – Queueing Theory
- List of Queueing Theory Software
- Allen Downey – Modeling and Simulation in Python: One Queue or Two
- Delay Prediction for Managing Multiclass Service Systems: An Investigation of Queueing Theory and Machine Learning Approaches
- A Machine Learning Approach to Waiting Time Prediction in Queueing Scenarios