Basic concepts of queueing theory

Where is queueing theory used?

Queueing theory can be used to study processes in which arrivals and/or service are subject to certain temporal variations:

In a customer service call center, calls arrive at random times. Servicing customers also takes different lengths of time.
Aircraft usually arrive at an airport a little earlier or a little later than planned, depending on the prevailing weather conditions (especially wind speed and direction).
In a shopfloor production, the exact service time (especially at stations with manual work) cannot be specified exactly, so that the throughput times fluctuate through the production.

If arrivals and service are timed (i.e., a customer arrives every exactly 60 seconds, for example, and a service always takes exactly 59 seconds), there will be no queues. In all other cases, temporary queues may occur. The queueing theory provides insights into how long the resulting queues are on average, how long customers will have to wait on average, and also the distribution of waiting times.

Waiting times in customer service

On this basis, the owners of the systems can make optimizations. In the call center sector, for example, the goal is often that 80% of callers should not have to wait longer than 20 seconds. With a known average number of callers per minute and a known average waiting time, the necessary number of operators is now sought to meet these targets.

Waiting times in production

Current research shows that workpieces typically wait 75% of the total cycle time as they move through a production line, meaning that reducing mean wait times has a significant impact on total cycle time.

Basic concepts

Clients and operators

In queueing theory, one always speaks of "clients" and "operators". The clients can be e.g. airplanes and the operators can be runways and so on.

Arrivals

Arrivals are measured in clients per hour, clients per minute, etc., i.e. it is considered how many clients arrive on average in a certain period of time. Instead of working with the mean number of arrivals, the mean time between two arrivals, the so-called mean inter-arrival time E[I], is usually used. The smaller E[I] is, the more clients will arrive per time unit.

When studying a queueing system, it is important to consider how uniformly clients arrive. If a client arrives every exactly 60 seconds and a service takes exactly 59 seconds each, no one has to wait. If, however, the interval between two arrivals is once 30 and once 90 seconds (i.e. still 60 seconds on average), waiting times will occur. Therefore, it is helpful if the standard deviation of the inter-arrival times Std[I] is also known. If clients cannot see the queues and do not coordinate, Std[I]=E[I] applies. For timed arrivals, it holds Std[I]=0.

Arriva rate

As an alternative to the mean inter-arrival time E[I], the arrival rate λ is also frequently used. It is the reciprocal of the mean inter-arrival time. I.e. a short mean inter-arrival time corresponds to a high value of λ:

$λ = \frac{1}{E [I]} = \frac{Number of arrivals}{Time unit}$

So, for example, a mean inter-arrival time of 2 minutes corresponds to an arrival rate of ½ client per minute.

Service

The service process is characterized by the average service time E[S]. The larger E[S] is, the longer an operation takes.

Here, too, the more uniform the service times are, the fewer waiting times will occur. Therefore, it is also advantageous if the variations of the service times are known in the form of the standard deviation of the service times Std[S].

Service rate

Analogous to the arrival rate, the service rate μ is defined as the reciprocal of the mean service time. The longer the mean service time, the smaller is μ:

$μ = \frac{1}{E [S]} = \frac{Number of service processes (per operator)}{Time unit}$

If the average service time is e.g. 3 minutes, this corresponds to a service rate of 1/3 clients per minute.

Basic model of queueing theory

Clients arrive at the system according to an arrival rate λ and first enter the waiting room. There are c operators available at the process station. Here, c is a natural number, so it can be 1, 2, 3, and so on. If an operator is available, the next waiting client is directed to the operator and a service process with the service rate μ starts. After the service is completed, the client leaves the system and the operator is ready to serve the next client.

This basic model can be extended in many ways, e.g.:

Clients are not willing to wait arbitrarily long and will leave the system if they have to wait too long.
The number of operators is not constant, but operators take breaks occasionally or there are different numbers available throughout the day due to a shift plan.
Clients do not arrive individually, but in groups (e.g. by bus).
Clients are not served individually, but in groups.
Clients of different types arrive at the system with different typical service times (e.g., "Client wants to buy ticket" to "Client wants to discuss delay-related refund").
Clients should be prioritized differently according to their types (e.g., "approaching aircraft with low residual fuel" before "departing aircraft" at an airport).
After a service process, the operator has to go into a so-called post-processing time before he is available again for another service process. (This is usually the case at call centers).
If an operator is to serve clients of different types one after the other, he first has to go into a so-called set-up time. (E.g. if the clients are car bodies to be painted and the type specifies the color. The setup time is then the time for changing the color at the paint gun. By campaigning as long as possible from clients of the same type, one tries to avoid such set-up times as much as possible).

The basic model of the queueing theory represents a basic component of a queueing network. In a queueing network, several such components are coupled, e.g. to represent an entire production line.

Service order

The service order specifies which of the waiting clients is to be served next. If the clients form a classic queue, the client who has already waited the longest, i.e. is at the front of the queue, will be served next. This is referred to as the first come first serve (FCFS) or first in first out (FIFO) operating rule. However, if the clients are (heavy) workpieces that have been stacked, the lowest workpiece may not be removed directly from this stack, i.e. it is placed on top of the stack and also pulled from the top. In this case, the last come first serve (LCFS) or the last in first out (LIFO) operating rule applies.

LIFO leads to strongly varying throughput times of individual workpieces through production. - Once a workpiece is at the bottom of the stack, it will not be able to get away from there any time soon. In order to make throughput times as predictable as possible, LIFO should therefore be avoided whenever possible.

In addition, if clients of different types arrive at a queue, a priority formula can be used to select the client to be served next in each case.

Probability distributions

In the simplest case, arrivals and serves are described only by arrival and serve rate, i.e., a complex process in each case is characterized by only a single number. If a client arrives at the system every exactly 2 minutes, this results in an average inter-arrival time of 2 minutes or an arrival rate of 0.5 clients per minute. If, however, the inter-arrival time is now 1 and 3 minutes alternately, then it is just as true that the mean inter-arrival time is 2 minutes, but it is likely to be significantly more difficult to serve such a client arrival stream without waiting times.

Therefore, it is useful not only to consider the mean durations, but to use other parameters, such as the variation (expressed by the standard deviation or the coefficient of variation).

In analytical queueing theory, only the exponential distribution can be used. Here, expected value=standard deviation always applies. If this is not the case in the real system to be represented, the model does not adequately represent reality. In simulation models, however, practically any probability distribution can be used. The following distributions are typical:

Deterministic: The clients arrive with exactly the same interval and the service takes exactly the same time. If both the arrivals and the services are deterministic, we speak of timed production.
Exponential distribution: Standard deviation=expected value applies. This represents a very high variation. For service processes, the standard deviation is usually smaller than the expected value of the service times. However, for inter-arrival times, standard deviation=expected value applies if the clients arrive independently or cannot see the queues. For example, the inter-arrival times at a customer service call center are exponentially distributed. The use of the exponential distribution is the prerequisite for almost all exact formulas in queueing theory.
Log-normal and gamma distribution: These two distributions are equally suitable for mapping service times. With both, the expected value and standard deviation can be set individually. The possibility to use these distributions is one of the great advantages of simulation compared to analytical formulas. In other scientific fields (e.g., financial and actuarial mathematics), a precise distinction has to be made between the two distributions; the log-normal distribution is heavy tailed, the gamma distribution is not. For modeling rare large loss events, this property is of great importance. For the modeling of service times, however, it is irrelevant.

In summary: For the inter-arrival times, the exponential distribution is suitable in many cases (and then only the mean inter-arrival time has to be specified). For the service times, the log-normal or gamma distribution should be used. The mean service time and also the standard deviation of the service times have to be specified here then.

If the standard deviation of the service times is not known, values in the range 0.2x to 0.8x of the expected value are usually used. The higher the human influence on the service times, the higher the standard deviation should be, or the more automated the process, the lower the standard deviation should be.

Kendall notation

Queueing models can be described by the so-called Kendall notation. In its simplest form, this description has the following appearance:

Inter-arrival times / service times / number of operators / waiting and service room size

For the inter-arrival and the service times, "M" is used for the exponential distribution "Ek" for the Erlang-k distribution, "D" for deterministic values and "G" for a general, i.e. not explicitly known distribution.

Examples:

M/M/1/∞
describes a queueing model with exponentially distributed inter-arrival and service times, one operator, and an unrestricted waiting and service room.
M/G/5/5
describes a queueing model with exponentially distributed inter-arrival times and service times that are subject to an unknown distribution (i.e., only E[S] and Std[S] are known). The system has 5 operators. The waiting and service room has 5 places. Since we also have exactly 5 operators, this means that there can be no queue. If a new client arrives while all operators are occupied, he will be rejected.

For the description of more complex models there are numerous extensions of the Kendall notation.

Parameters

Workload

The workload a is the quotient of arrival and service rate:

$a = \frac{λ}{μ} = \frac{Number of arrivals}{Number of service processes (per operator)}$

(In the last fraction, the numerator and denominator each refer to the same unit of time). The workload, when rounded up to the next integer, indicates the minimum number of operators required for the system to operate stably in the long term. If c<a, then on average more clients will arrive than the operators can serve on average, i.e. in the long term the queue will become longer and longer.

Utilization

The utilization ρ is a value between 0 and 1. It results when the workload is divided by the number of operators:

$ρ = \frac{a}{c} = \frac{λ}{c μ}$

While those who have to pay the operators or buy the process stations want a high utilization (as this lowers the unit costs), from the client's point of view high values of ρ mean long waiting times on average.

In a queueing system without queue abandoners or other special features, the condition ρ<1 means that λ<cμ has to be hold.

Waiting time

The mean waiting time E[W] of the clients is obtained by an analytical formula (see below) for simple models or by a simulation for more complex models.

Residence time

The total amount of time a client spends at a process station is made up of the waiting time and the service time. For the mean residence time E[V] holds:

$E [V] = E [W] + E [S]$

The mean residence time is the sum of the mean waiting time E[W] and the mean service time E[S].

In the industrial context, the residence time is also referred to as the cycle time. The aim is usually to minimize the cycle time. Since the necessary service times are usually fixed by the operating speed of the respective machines, this practically means that the avoidable waiting times are to be reduced.

Delivery reliability

Assuming that the machines do not work faster or slower in long queues, i.e. that the waiting times and the service times are stochastically independent, the variation of the residence times is also equal to the sum of the variation of the waiting times and the service times. A high variation of the residence times means that the completion dates of individual products can only be predicted with difficulty, i.e. delivery dates promised in advance can often not be met. Consequently, the aim is to keep the variation in residence times as low as possible. Since the variation in service times can usually be influenced just as little as the average service time, this means that the variation in waiting times should be minimized.

WIP

The abbreviation "WIP" stands for "Work units in process", i.e. the number of workpieces in the process (waiting or in service). The average number of workpieces or clients in the system is denoted by E[N].

If one is only interested in the average number of waiting clients, the symbol E[N_Q] is used.

Calculation of the performance indicators

Little's law

Little's law establishes a relationship between the number of clients in the system and the mean residence time, or between the number of clients waiting and the mean waiting time:

$\begin{matrix} E [N] & = & λ E [V] \\ E [N_{Q}] & = & λ E [W] \end{matrix}$

This means: If the arrival rate λ and the average number of workpieces in the system E[N] are known, the average cycle time can be calculated directly:

$E [V] = \frac{E [N]}{λ}$

Little's law applies regardless of which distributions are given for inter-arrival and service times.

Further simple relations

The residence time is the sum of the waiting time and the service time:

$E [V] = E [W] + E [S]$

If we substitute into this formula for E[V] and E[W] the terms from the rearranged Little's law, multiply by λ and replace λ/μ by a, we get:

$E [N] = E [N_{Q}] + a$

I.e. the average queue length E[N_Q] and the average number of clients in the system E[N] differ exactly by a or on average a clients are in service process.

If λ, μ and c are given, it is sufficient to know one of the other quantities E[W], E[V], E[N_Q] or E[N] to be able to calculate the other performance indicators. However, exactly this determination of one of the four values is not easy in most cases.

One-operator model

For M/M/1 models, i.e., for exponentially distributed inter-arrival and service times and one operator, the performance indicators can be easily calculated. The following apply:

$\begin{matrix} ρ & = & a = \frac{λ}{μ} \\ E [N_{Q}] & = & \frac{a^{2}}{1 - a} \\ E [N] & = & \frac{a}{1 - a} \\ E [W] & = & \frac{a}{μ - λ} \\ E [V] & = & \frac{a}{μ - λ} + \frac{1}{μ} \end{matrix}$

Erlang C formula

If the inter-arrival times and the service times are exponentially distributed and there are c≥1 operators in the system, then according to the Kendall notation it is an M/M/c model. For this, P(W≤t), i.e. the probability that a client has to wait at most t≥0 seconds, can be calculated using the Erlang C formula:

$P (W \leq t) = 1 - P_{1} e^{- (c - a) μ t}$

with $P_{1} := \frac{a^{c} c}{c! (c - a)} {(\sum_{k = 0}^{c - 1} \frac{a^{k}}{k!} + \frac{a^{c} c}{c! (c - a)})}^{- 1}$ . The other performance indicators can then be derived from this:

$\begin{matrix} E [N_{Q}] & = & P_{1} \frac{a}{c - a} \\ E [N] & = & P_{1} \frac{a}{c - a} + a \\ E [W] & = & P_{1} \frac{1}{c μ - λ} \\ E [V] & = & P_{1} \frac{1}{c μ - λ} + \frac{1}{μ} \end{matrix}$

The Erlang C formula can also be extended to include impatient customers (i.e. waiting cancellers). This formula will not be printed here.

Allen-Cunneen approximation formula

If the inter-arrival or service times are no longer exponentially distributed, i.e. if E[I]=Std[I] and E[S]=Std[S] do not apply anymore, the models can no longer be solved exactly for c>1. For this case the Allen-Cunneen approximation formula is available.

The Allen-Cunneen approximation formula is based on the Erlang C formula and extends it by a correction factor to compensate as best as possible for the deviation of Std[I] and Std[S] from the Erlang C values. If E[N_Q]_M/M/c is the expected value for the queue length in the Erlang C case, the following approximations apply:

$\begin{matrix} E [N_{Q}] & \approx & {E [N_{Q}]}_{M / M / c} \cdot \frac{SCV [I] + SCV [S]}{2} \\ E [N] & \approx & {E [N_{Q}]}_{M / M / c} \cdot \frac{SCV [I] + SCV [S]}{2} + a \\ E [W] & \approx & {E [N_{Q}]}_{M / M / c} \cdot \frac{1}{λ} \cdot \frac{SCV [I] + SCV [S]}{2} \\ E [V] & \approx & {E [N_{Q}]}_{M / M / c} \cdot \frac{1}{λ} \cdot \frac{SCV [I] + SCV [S]}{2} + \frac{1}{μ} \end{matrix}$

Here, SCV[I] and SCV[S] are the squared coefficients of variation of the inter-arrival and service times, respectively:

$SCV [I] = \frac{{(Std [I])}^{2}}{{(E [I])}^{2}} und SCV [S] = \frac{{(Std [S])}^{2}}{{(E [S])}^{2}}$

The Allen-Cunneen approximation formula was developed by running many simulations (at that time on mainframes and accepting long runtimes) and comparing the results with the Erlang C results. The correction factor (SCV[I]+SCV[S])/2, which depends on Std[I] and Std[S], was then designed to reflect the simulation results as well as possible.

Since the mainframe weeks of those days are now smartphone seconds, it is only worth using these formulas (which only provide approximations in each case) to a limited extent.

More about queueing theory

Chapter 3 of "Simulation mit dem Warteschlangensimulator" (in German) describes the terms and relationships described here in more detail:

https://link.springer.com/book/10.1007/978-3-658-34668-3

Simulation

In principle, simulation methods can be used to represent all desired model properties of a queueing network. In contrast to approximation formulas, the results are then also exact (within the scope of the selected simulation runtime).

The runtimes for many models are in the single-digit second range, so that earlier statements that simulations are too slow and that it is better to use an (inaccurate but fast) approximation formula are now outdated.

The page presents some open source programs and web services for modeling and simulating queueing networks.