Queueing theory can be used to study processes in which arrivals and/or service are subject to certain temporal variations:
If arrivals and service are timed (i.e., a customer arrives every exactly 60 seconds, for example, and a service always takes exactly 59 seconds), there will be no queues. In all other cases, temporary queues may occur. The queueing theory provides insights into how long the resulting queues are on average, how long customers will have to wait on average, and also the distribution of waiting times.
On this basis, the owners of the systems can make optimizations. In the call center sector, for example, the goal is often that 80% of callers should not have to wait longer than 20 seconds. With a known average number of callers per minute and a known average waiting time, the necessary number of operators is now sought to meet these targets.
Current research shows that workpieces typically wait 75% of the total cycle time as they move through a production line, meaning that reducing mean wait times has a significant impact on total cycle time.
In queueing theory, one always speaks of "clients" and "operators". The clients can be e.g. airplanes and the operators can be runways and so on.
Arrivals are measured in clients per hour, clients per minute, etc., i.e. it is considered how many clients arrive on average in a certain period of time. Instead of working with the mean number of arrivals, the mean time between two arrivals, the so-called mean inter-arrival time E[I], is usually used. The smaller E[I] is, the more clients will arrive per time unit.
When studying a queueing system, it is important to consider how uniformly clients arrive. If a client arrives every exactly 60 seconds and a service takes exactly 59 seconds each, no one has to wait. If, however, the interval between two arrivals is once 30 and once 90 seconds (i.e. still 60 seconds on average), waiting times will occur. Therefore, it is helpful if the standard deviation of the inter-arrival times Std[I] is also known. If clients cannot see the queues and do not coordinate, Std[I]=E[I] applies. For timed arrivals, it holds Std[I]=0.
As an alternative to the mean inter-arrival time E[I], the arrival rate λ is also frequently used. It is the reciprocal of the mean inter-arrival time. I.e. a short mean inter-arrival time corresponds to a high value of λ:
So, for example, a mean inter-arrival time of 2 minutes corresponds to an arrival rate of ½ client per minute.
The service process is characterized by the average service time E[S]. The larger E[S] is, the longer an operation takes.
Here, too, the more uniform the service times are, the fewer waiting times will occur. Therefore, it is also advantageous if the variations of the service times are known in the form of the standard deviation of the service times Std[S].
Analogous to the arrival rate, the service rate μ is defined as the reciprocal of the mean service time. The longer the mean service time, the smaller is μ:
If the average service time is e.g. 3 minutes, this corresponds to a service rate of 1/3 clients per minute.
Clients arrive at the system according to an arrival rate λ and first enter the waiting room. There are c operators available at the process station. Here, c is a natural number, so it can be 1, 2, 3, and so on. If an operator is available, the next waiting client is directed to the operator and a service process with the service rate μ starts. After the service is completed, the client leaves the system and the operator is ready to serve the next client.
This basic model can be extended in many ways, e.g.:
The basic model of the queueing theory represents a basic component of a queueing network. In a queueing network, several such components are coupled, e.g. to represent an entire production line.
The service order specifies which of the waiting clients is to be served next. If the clients form a classic queue, the client who has already waited the longest, i.e. is at the front of the queue, will be served next. This is referred to as the first come first serve (FCFS) or first in first out (FIFO) operating rule. However, if the clients are (heavy) workpieces that have been stacked, the lowest workpiece may not be removed directly from this stack, i.e. it is placed on top of the stack and also pulled from the top. In this case, the last come first serve (LCFS) or the last in first out (LIFO) operating rule applies.
LIFO leads to strongly varying throughput times of individual workpieces through production. - Once a workpiece is at the bottom of the stack, it will not be able to get away from there any time soon. In order to make throughput times as predictable as possible, LIFO should therefore be avoided whenever possible.
In addition, if clients of different types arrive at a queue, a priority formula can be used to select the client to be served next in each case.
In the simplest case, arrivals and serves are described only by arrival and serve rate, i.e., a complex process in each case is characterized by only a single number. If a client arrives at the system every exactly 2 minutes, this results in an average inter-arrival time of 2 minutes or an arrival rate of 0.5 clients per minute. If, however, the inter-arrival time is now 1 and 3 minutes alternately, then it is just as true that the mean inter-arrival time is 2 minutes, but it is likely to be significantly more difficult to serve such a client arrival stream without waiting times.
Therefore, it is useful not only to consider the mean durations, but to use other parameters, such as the variation (expressed by the standard deviation or the coefficient of variation).
In analytical queueing theory, only the exponential distribution can be used. Here, expected value=standard deviation always applies. If this is not the case in the real system to be represented, the model does not adequately represent reality. In simulation models, however, practically any probability distribution can be used. The following distributions are typical:
In summary: For the inter-arrival times, the exponential distribution is suitable in many cases (and then only the mean inter-arrival time has to be specified). For the service times, the log-normal or gamma distribution should be used. The mean service time and also the standard deviation of the service times have to be specified here then.
If the standard deviation of the service times is not known, values in the range 0.2x to 0.8x of the expected value are usually used. The higher the human influence on the service times, the higher the standard deviation should be, or the more automated the process, the lower the standard deviation should be.
Queueing models can be described by the so-called Kendall notation. In its simplest form, this description has the following appearance:
Inter-arrival times / service times / number of operators / waiting and service room size
For the inter-arrival and the service times, "M" is used for the exponential distribution "Ek" for the Erlang-k distribution, "D" for deterministic values and "G" for a general, i.e. not explicitly known distribution.
Examples:
For the description of more complex models there are numerous extensions of the Kendall notation.
The workload a is the quotient of arrival and service rate:
(In the last fraction, the numerator and denominator each refer to the same unit of time). The workload, when rounded up to the next integer, indicates the minimum number of operators required for the system to operate stably in the long term. If c<a, then on average more clients will arrive than the operators can serve on average, i.e. in the long term the queue will become longer and longer.
The utilization ρ is a value between 0 and 1. It results when the workload is divided by the number of operators:
While those who have to pay the operators or buy the process stations want a high utilization (as this lowers the unit costs), from the client's point of view high values of ρ mean long waiting times on average.
In a queueing system without queue abandoners or other special features, the condition ρ<1 means that λ<cμ has to be hold.
The mean waiting time E[W] of the clients is obtained by an analytical formula (see below) for simple models or by a simulation for more complex models.
The total amount of time a client spends at a process station is made up of the waiting time and the service time. For the mean residence time E[V] holds:
The mean residence time is the sum of the mean waiting time E[W] and the mean service time E[S].
In the industrial context, the residence time is also referred to as the cycle time. The aim is usually to minimize the cycle time. Since the necessary service times are usually fixed by the operating speed of the respective machines, this practically means that the avoidable waiting times are to be reduced.
Assuming that the machines do not work faster or slower in long queues, i.e. that the waiting times and the service times are stochastically independent, the variation of the residence times is also equal to the sum of the variation of the waiting times and the service times. A high variation of the residence times means that the completion dates of individual products can only be predicted with difficulty, i.e. delivery dates promised in advance can often not be met. Consequently, the aim is to keep the variation in residence times as low as possible. Since the variation in service times can usually be influenced just as little as the average service time, this means that the variation in waiting times should be minimized.
The abbreviation "WIP" stands for "Work units in process", i.e. the number of workpieces in the process (waiting or in service). The average number of workpieces or clients in the system is denoted by E[N].
If one is only interested in the average number of waiting clients, the symbol E[NQ] is used.
Little's law establishes a relationship between the number of clients in the system and the mean residence time, or between the number of clients waiting and the mean waiting time:
This means: If the arrival rate λ and the average number of workpieces in the system E[N] are known, the average cycle time can be calculated directly:
Little's law applies regardless of which distributions are given for inter-arrival and service times.
The residence time is the sum of the waiting time and the service time:
If we substitute into this formula for E[V] and E[W] the terms from the rearranged Little's law, multiply by λ and replace λ/μ by a, we get:
I.e. the average queue length E[NQ] and the average number of clients in the system E[N] differ exactly by a or on average a clients are in service process.
If λ, μ and c are given, it is sufficient to know one of the other quantities E[W], E[V], E[NQ] or E[N] to be able to calculate the other performance indicators. However, exactly this determination of one of the four values is not easy in most cases.
For M/M/1 models, i.e., for exponentially distributed inter-arrival and service times and one operator, the performance indicators can be easily calculated. The following apply:
If the inter-arrival times and the service times are exponentially distributed and there are c≥1 operators in the system, then according to the Kendall notation it is an M/M/c model. For this, P(W≤t), i.e. the probability that a client has to wait at most t≥0 seconds, can be calculated using the Erlang C formula:
with . The other performance indicators can then be derived from this:
The Erlang C formula can also be extended to include impatient customers (i.e. waiting cancellers). This formula will not be printed here.
If the inter-arrival or service times are no longer exponentially distributed, i.e. if E[I]=Std[I] and E[S]=Std[S] do not apply anymore, the models can no longer be solved exactly for c>1. For this case the Allen-Cunneen approximation formula is available.
The Allen-Cunneen approximation formula is based on the Erlang C formula and extends it by a correction factor to compensate as best as possible for the deviation of Std[I] and Std[S] from the Erlang C values. If E[NQ]M/M/c is the expected value for the queue length in the Erlang C case, the following approximations apply:
Here, SCV[I] and SCV[S] are the squared coefficients of variation of the inter-arrival and service times, respectively:
The Allen-Cunneen approximation formula was developed by running many simulations (at that time on mainframes and accepting long runtimes) and comparing the results with the Erlang C results. The correction factor (SCV[I]+SCV[S])/2, which depends on Std[I] and Std[S], was then designed to reflect the simulation results as well as possible.
Since the mainframe weeks of those days are now smartphone seconds, it is only worth using these formulas (which only provide approximations in each case) to a limited extent.
Chapter 3 of "Simulation mit dem Warteschlangensimulator" (in German) describes the terms and relationships described here in more detail:
https://link.springer.com/book/10.1007/978-3-658-34668-3
In principle, simulation methods can be used to represent all desired model properties of a queueing network. In contrast to approximation formulas, the results are then also exact (within the scope of the selected simulation runtime).
The runtimes for many models are in the single-digit second range, so that earlier statements that simulations are too slow and that it is better to use an (inaccurate but fast) approximation formula are now outdated.
The page presents some open source programs and web services for modeling and simulating queueing networks.