1.6: Relation of Probability Models to the Real World

    Whenever experienced and competent engineers or scientists construct a probability model to represent aspects of some system that either exists or is being designed for some application, they must acquire a deep knowledge of the system and its surrounding circumstances, and concurrently consider various types of probability models used in probabilistic analyses of the same or similar systems. Usually very simple probability models help in understanding the real-world system, and knowledge about the real-world system helps in understanding what aspects of the system are well modeled by a given probability model. For a text such as this, there is insufficient space to understand the real-world aspects of each system that might be of interest. We must use the language of various canonical real-world systems for motivation and insight when studying probability models for various classes of systems, but such models must necessarily be chosen more for their tutorial value than their practical value.

    There is a danger, then, that readers will come away with the impression that analysis is more challenging and important than modeling. To the contrary, for work on real-world systems, modeling is almost always more difficult, more challenging, and more important than analysis. The objective here is to provide the necessary knowledge and insight about probabilistic models so that the reader can later combine this with a deep understanding of particular real application areas. This will result in a fruitful interplay of modeling, analysis, and experimentation.

    In this section, our purpose is not to learn how to model real-world problems, since, as said above, this requires deep and specialized knowledge of whatever application area is of interest. Rather, it is to understand the following conceptual problem that was posed in Section 1.1. Suppose we have a probability model of some real-world experiment involving randomness in the sense expressed there. When the real-world experiment being modeled is performed, there is an outcome, which presumably is one of the outcomes of the probability model, but there is no observable probability.

    It appears to be intuitively natural, for experiments that can be carried out repeatedly under essentially the same conditions, to associate the probability of a given event with the relative frequency of that event over many repetitions. We now have the background to understand this approach. We first look at relative frequencies within the probability model, and then within the real world.

    Relative frequencies in a probability model

    We have seen that for any probability model, an extended probability model exists for \(n\) IID idealized experiments of the original model. For any event \(A\) in the original model, the indicator function \(\mathbb{I}_{A}\) is a random variable, and the relative frequency of \(A\) over \(n\) IID experiments is the sample average of \(n\) IID rv’s, each with the distribution of \(\mathbb{I}_{A}\). From the weak law of large numbers, this relative frequency converges in probability to \(\mathrm{E}\left[\mathbb{I}_{A}\right]=\operatorname{Pr}\{A\}\). In the limit \(n \rightarrow \infty\), the strong law of large numbers says that the relative frequency of \(A\) converges to \(\operatorname{Pr}\{A\}\) with probability 1.
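
    This convergence is easy to observe numerically. The following minimal sketch (the value \(\operatorname{Pr}\{A\}=0.3\) and the seed are arbitrary illustrative choices) simulates \(n\) IID trials and prints the relative frequency of \(A\) for increasing \(n\):

```python
import random

# Simulate n IID trials of an event A with Pr{A} = 0.3 (an arbitrary
# illustrative value).  The indicator of A on each trial is a
# Bernoulli(0.3) rv; the relative frequency is the sample average
# of these indicators.
p_A = 0.3
random.seed(1)

for n in (100, 10_000, 1_000_000):
    hits = sum(1 for _ in range(n) if random.random() < p_A)
    print(f"n = {n:>9,}: relative frequency = {hits / n:.4f}")
```

    The fluctuations around 0.3 shrink roughly as \(1/\sqrt{n}\), which is the content of the \(\epsilon\)–\(\delta\) statement below.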

    In plain English, this says that for large \(n\), the relative frequency of an event (in the \(n\)-repetition IID model) is essentially the same as the probability of that event. The word essentially is carrying a great deal of hidden baggage. For the weak law, for any \(\epsilon, \delta>0\) the relative frequency is within \(\epsilon\) of \(\operatorname{Pr}\{A\}\) with confidence level \(1-\delta\) whenever \(n\) is sufficiently large. For the strong law, the \(\epsilon\) and \(\delta\) are avoided, but only by looking directly at the limit \(n \rightarrow \infty\). Despite the hidden baggage, though, relative frequency and probability are related as indicated.
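
    One standard way to quantify “sufficiently large” (a sketch using Chebyshev’s inequality, which needs only the variance \(p(1-p) \leq 1/4\) of the indicator): writing \(S_n\) for the number of occurrences of \(A\) in \(n\) trials, so that \(S_n/n\) is the relative frequency, and \(p = \operatorname{Pr}\{A\}\),

\[ \operatorname{Pr}\left\{ \left| \frac{S_n}{n} - p \right| \geq \epsilon \right\} \leq \frac{p(1-p)}{n\epsilon^2} \leq \frac{1}{4n\epsilon^2}, \]

    so any \(n \geq 1/(4\epsilon^{2}\delta)\) achieves confidence level \(1-\delta\). For example, \(\epsilon = 0.01\) and \(\delta = 0.05\) are met by \(n \geq 50{,}000\).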

    Relative frequencies in the real world

    In trying to sort out if and when the laws of large numbers have much to do with real-world experiments, we should ignore the mathematical details for the moment and agree that for large \(n\), the relative frequency of an event \(A\) over \(n\) IID trials of an idealized experiment is essentially \(\operatorname{Pr}\{A\}\). We can certainly visualize a real-world experiment that has the same set of possible outcomes as the idealized experiment, and we can visualize evaluating the relative frequency of \(A\) over \(n\) repetitions with large \(n\). If that real-world relative frequency is essentially equal to \(\operatorname{Pr}\{A\}\), and this is true for the various events \(A\) of greatest interest, then it is reasonable to hypothesize that the idealized experiment is a reasonable model for the real-world experiment, at least so far as those given events of interest are concerned.

    One problem with this comparison of relative frequencies is that we have carefully specified a model for \(n\) IID repetitions of the idealized experiment, but have said nothing about how the real-world experiments are repeated. The IID idealized experiments specify that the conditional probability of \(A\) at one trial is the same no matter what the results of the other trials are. Intuitively, we would then try to isolate the \(n\) real-world trials so that they don’t affect each other, but this is a little vague. The following examples help explain this problem and several others in comparing idealized and real-world relative frequencies.

    Example 1.6.1. Coin tossing: Tossing coins is widely used as a way to choose the first player in various games, and is also sometimes used as a primitive form of gambling. Its importance, however, and the reason for its frequent use, is its simplicity. When tossing a coin, we would argue from the symmetry between the two sides of the coin that each should be equally probable (since any procedure for evaluating the probability of one side should apply equally to the other). Thus since \(H\) and \(T\) are the only outcomes (the remote possibility of the coin balancing on its edge is omitted from the model), the reasonable and universally accepted model for coin tossing is that \(H\) and \(T\) each have probability 1/2.

    On the other hand, the two sides of a coin are embossed in different ways, so that the mass is not uniformly distributed. Also the two sides do not behave in quite the same way when bouncing off a surface. Each denomination of each currency behaves slightly differently in this respect. Thus, not only do coins violate symmetry in small ways, but different coins violate it in different ways.

    How do we test whether this effect is significant? If we assume for the moment that successive tosses of the coin are well modeled by the idealized experiment of \(n\) IID trials, we can essentially find the probability of \(H\) for a particular coin as the relative frequency of \(H\) in a sufficiently large number of independent tosses of that coin. This gives us slightly different relative frequencies for different coins, and thus slightly different probability models for different coins.
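
    A sketch of what such a test involves (the bias \(p_H = 0.52\) below is an invented illustration, not a measured value for any real coin): the accuracy of a relative-frequency estimate scales as \(\sqrt{p(1-p)/n}\), so resolving a small asymmetry requires a large number of tosses.

```python
import random

# Estimate Pr{H} for a particular coin from n independent tosses.
# The true bias p_H = 0.52 is an invented illustration of a small
# asymmetry, not a measured value for any real coin.
p_H = 0.52
random.seed(2)

n = 100_000
heads = sum(1 for _ in range(n) if random.random() < p_H)
p_hat = heads / n

# Standard error of a relative-frequency estimate: sqrt(p(1-p)/n).
std_err = (p_hat * (1 - p_hat) / n) ** 0.5
print(f"estimated Pr{{H}} = {p_hat:.4f} +/- {2 * std_err:.4f} (approx. 95%)")
```

    At \(n = 100{,}000\) tosses the approximate 95% interval has half-width about 0.003, so a 2% asymmetry is detectable; distinguishing coins that differ in the third decimal place would require millions of tosses.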

    The assumption of independent tosses is also questionable. Consider building a carefully engineered machine for tossing coins and using it in a vibration-free environment. A standard coin is inserted into the machine in the same way for each toss and we count the number of heads and tails. Since the machine has essentially eliminated the randomness, we would expect all the tosses, or almost all of them, to come up the same way — the more precise the machine, the less independent the results. By inserting the original coin in a random way, a single trial might have equiprobable results, but successive tosses are certainly not independent. The successive trials would be closer to independent if the tosses were done by a slightly inebriated individual who tossed the coins high in the air.

    The point of this example is that there are many different coins and many ways of tossing them, and the idea that one model fits all is reasonable under some conditions and not under others. Rather than retreating into the comfortable world of theory, however, note that we can now find the relative frequency of heads for any given coin and essentially for any given way of tossing that coin.\(^{38}\)

    Example 1.6.2. Binary data: Consider the binary data transmitted over a communication link or stored in a data facility. The data is often a mixture of encoded voice, video, graphics, text, etc., with relatively long runs of each, interspersed with various protocols for retrieving the original non-binary data.

    The simplest (and most common) model for this is to assume that each binary digit is 0 or 1 with equal probability and that successive digits are statistically independent. This is the same as the model for coin tossing after the trivial modification of converting \(\{H, T\}\) into \(\{0,1\}\). This is also a rather appropriate model for designing a communication or storage facility, since all \(n\)-tuples are then equiprobable (in the model) for each \(n\), and thus the facilities need not rely on any special characteristics of the data. On the other hand, if one wants to compress the data, reducing the required number of transmitted or stored bits per incoming bit, then a more elaborate model is needed.
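
    A quick numerical check of this equiprobability property (a minimal sketch; the seed, the data length, and the use of non-overlapping 3-tuples are arbitrary choices made for simplicity): under the IID equiprobable-bit model, each of the 8 possible 3-tuples should appear with relative frequency close to \(1/8 = 0.125\).

```python
import itertools
import random

# Under the IID equiprobable-bit model, every 3-tuple of bits should
# appear with relative frequency close to 1/8 = 0.125.
random.seed(3)
bits = [random.randint(0, 1) for _ in range(300_000)]

counts = {t: 0 for t in itertools.product((0, 1), repeat=3)}
for i in range(0, len(bits) - 2, 3):          # non-overlapping 3-tuples
    counts[tuple(bits[i:i + 3])] += 1

total = sum(counts.values())
for t, c in sorted(counts.items()):
    print(t, f"{c / total:.4f}")              # each close to 0.125
```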

    Developing such an improved model would require finding out more about where the data is coming from — a naive application of calculating relative frequencies of \(n\)-tuples would probably not be the best choice. On the other hand, there are well-known data compression schemes that in essence track dependencies in the data and use them for compression in a coordinated way. These schemes are called universal data-compression schemes since they don’t rely on a probability model. At the same time, they are best analyzed by looking at how they perform for various idealized probability models.
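
    The contrast can be illustrated with an off-the-shelf compressor (zlib’s DEFLATE is used here merely as a convenient stand-in for the universal schemes mentioned above): it finds and exploits repeated patterns when they exist, and gains essentially nothing on data resembling IID equiprobable bits.

```python
import os
import zlib

# A general-purpose compressor exploits dependencies when they exist and
# gains essentially nothing on data resembling IID equiprobable bits.
random_bytes = os.urandom(100_000)             # model-free random bytes
text_bytes = b"the quick brown fox " * 5_000   # highly repetitive data

for label, data in (("random", random_bytes), ("repetitive", text_bytes)):
    ratio = len(zlib.compress(data)) / len(data)
    print(f"{label:>10}: compressed to {ratio:.2%} of original size")
```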

    The point of this example is that choosing probability models often depends heavily on how the model is to be used. Models more complex than IID binary digits are usually based on what is known about the input processes. Measuring relative frequencies and associating them with probabilities is the basic underlying conceptual connection between the real world and models, but in practice this is essentially the relationship of last resort. For most of the applications we will study, there is a long history of modeling to build on, with experiments as needed.

    Example 1.6.3. Fable: In the year 2008, the financial structure of the USA failed and the world economy was brought to its knees. Much has been written about the role of greed on Wall Street and incompetence in Washington. Another aspect of the collapse, however, was a widespread faith in stochastic models for limiting risk. These models encouraged people to engage in investments that turned out to be far riskier than the models predicted. These models were created by some of the brightest PhDs from the best universities, but they failed miserably because they modeled everyday events very well but modeled the rare events and the interconnection of events poorly. They failed badly by not understanding their application, and in particular by trying to extrapolate typical behavior when their primary goal was to protect against highly atypical situations. The moral of the fable is that brilliant analysis is not helpful when the modeling is poor; as computer engineers say, “garbage in, garbage out.”

    The examples above show that the problems of modeling a real-world experiment are often connected with the question of creating a model for a set of experiments that are not exactly the same and do not necessarily correspond to the notion of independent repetitions within the model. In other words, the question is not only whether the probability model is reasonable for a single experiment, but also whether the IID repetition model is appropriate for multiple copies of the real-world experiment.

    At least we have seen, however, that if a real-world experiment can be performed many times with a physical isolation between performances that is well modeled by the IID repetition model, then the relative frequencies of events in the real-world experiment correspond to relative frequencies in the idealized IID repetition model, which correspond to probabilities in the original model. In other words, under appropriate circumstances, the probabilities in a model become essentially observable over many repetitions.

    We will see later that our emphasis on IID repetitions was for simplicity. There are other models for repetitions of a basic model, such as Markov models, that we study later. These also lead to relative frequencies approaching probabilities within the repetition model. Thus, for repeated real-world experiments that are well modeled by these repetition models, the real-world relative frequencies approximate the probabilities in the model.
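
    A minimal sketch of such a non-IID repetition model (the transition probabilities below are arbitrary illustrative values): successive tosses form a two-state Markov chain over \(\{H, T\}\) with \(\operatorname{Pr}\{T \mid H\} = a\) and \(\operatorname{Pr}\{H \mid T\} = b\). The trials are dependent, yet the relative frequency of \(H\) still converges, now to the stationary probability \(b/(a+b)\).

```python
import random

# Two-state Markov chain over {H, T} with Pr{T|H} = a and Pr{H|T} = b.
# The trials are dependent, but the relative frequency of H converges
# to the stationary probability b / (a + b).
a, b = 0.1, 0.3          # arbitrary illustrative transition probabilities
random.seed(4)

state = "H"
heads = 0
n = 1_000_000
for _ in range(n):
    heads += state == "H"
    if state == "H":
        state = "T" if random.random() < a else "H"
    else:
        state = "H" if random.random() < b else "T"

print(f"relative frequency of H: {heads / n:.4f}")
print(f"stationary probability : {b / (a + b):.4f}")   # 0.7500
```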

    Statistical independence of real-world experiments

    We have been discussing the use of relative frequencies of an event \(A\) in a repeated real-world experiment to test \(\operatorname{Pr}\{A\}\) in a probability model of that experiment. This can be done essentially successfully if the repeated trials correspond to IID trials in the idealized experiment. However, the statement about IID trials in the idealized experiment is a statement about probabilities in the extended \(n\)-trial model. Thus, just as we tested \(\operatorname{Pr}\{A\}\) by repeated real-world trials of a single experiment, we should be able to test \(\operatorname{Pr}\left\{A_{1}, \ldots, A_{n}\right\}\) in the \(n\)-repetition model by a much larger number of real-world repetitions of \(n\)-tuples rather than single trials.

    To be more specific, choose two large integers, \(m\) and \(n\), and perform the underlying real-world experiment \(mn\) times. Partition the \(mn\) trials into \(m\) runs of \(n\) trials each. For any given \(n\)-tuple \(A_{1}, \ldots, A_{n}\) of successive events, find the relative frequency (over \(m\) trials of \(n\)-tuples) of the \(n\)-tuple event \(A_{1}, \ldots, A_{n}\). This can then be used essentially to test the probability \(\operatorname{Pr}\left\{A_{1}, \ldots, A_{n}\right\}\) in the model for \(n\) IID trials. The individual event probabilities can also be tested, so the condition for independence can be tested.
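
    A sketch of this procedure in the simplest case (simulated fair-coin data, \(n = 2\), and the pair event \((H, H)\), all chosen purely for illustration): the relative frequency of the pair over \(m\) runs is compared with the product \(\operatorname{Pr}\{H\}^{2} = 0.25\) that the IID model predicts.

```python
import random

# Perform m*n trials (simulated fair-coin tosses), partition them into
# m runs of n = 2, and compare the relative frequency of the pair event
# (H, H) with the product Pr{H} * Pr{H} = 0.25 predicted by IID trials.
random.seed(5)
m, n = 200_000, 2
tosses = ["H" if random.random() < 0.5 else "T" for _ in range(m * n)]

runs = [tuple(tosses[i * n:(i + 1) * n]) for i in range(m)]
freq_HH = sum(r == ("H", "H") for r in runs) / m
print(f"relative frequency of (H, H): {freq_HH:.4f}   (IID model: 0.2500)")
```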

    The observant reader will note that there is a tacit assumption above that successive \(n\)-tuples can be modeled as independent, so it seems that we are simply replacing a big problem with a bigger problem. This is not quite true, since if the trials are dependent with some given probability model for dependent trials, then this test for independence will essentially reject the independence hypothesis for large enough \(n\). In other words, we cannot completely verify the correctness of an independence hypothesis for the \(n\)-trial model, although in principle we could eventually falsify it if it is false.

    Choosing models for real-world experiments is primarily a subject for statistics, and we will not pursue it further except for brief discussions when treating particular application areas. The purpose here has been to treat a fundamental issue in probability theory. As stated before, probabilities are non-observables — they exist in the theory but are not directly measurable in real-world experiments. We have shown that probabilities essentially become observable in the real world via relative frequencies over repeated trials.

    Limitations of relative frequencies

    Most real-world applications that are modeled by probability models have such a large sample space that it is impractical to conduct enough trials to choose probabilities from relative frequencies. Even a shuffled deck of 52 cards would require many more than \(52! \approx 8\times10^{67}\) trials for most of the outcomes to appear even once. Thus relative frequencies can be used to test the probability of given individual events of importance, but are usually impractical for choosing the entire model, and even more impractical for choosing a model for repeated trials.
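
    The arithmetic behind that estimate is easy to verify:

```python
import math

# 52! (the number of orderings of a 52-card deck) is a 68-digit number.
print(math.factorial(52))           # exact integer value
print(f"{math.factorial(52):.2e}")  # about 8.07e+67
```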

    Since relative frequencies give us a concrete interpretation of what probability means, however, we can now rely on other approaches, such as symmetry, for modeling. From symmetry, for example, it is clear that all 52! possible arrangements of a card deck should be equiprobable after shuffling. This leads, for example, to the ability to calculate probabilities of different poker hands, etc., which are such popular exercises in elementary probability classes.
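
    As one such worked example (a flush is chosen arbitrarily): symmetry makes all \(\binom{52}{5}\) five-card hands equiprobable, so any hand probability is just a ratio of counts.

```python
import math

# All C(52, 5) five-card hands are equiprobable by symmetry, so the
# probability of a flush (five cards of one suit, here counting straight
# flushes as well) is a ratio of counts: 4 * C(13, 5) / C(52, 5).
flush = 4 * math.comb(13, 5) / math.comb(52, 5)
print(f"Pr(flush) = {flush:.5f}")   # about 0.00198, roughly 1 in 505
```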

    Another valuable modeling procedure is that of constructing a probability model where the possible outcomes are independently chosen \(n\)-tuples of outcomes in a simpler model. More generally, most of the random processes to be studied in this text are defined as various ways of combining simpler idealized experiments.

    What is really happening as we look at modeling increasingly sophisticated systems and studying increasingly sophisticated models is that we are developing mathematical results for simple idealized models and relating those results to real-world results (such as relating idealized statistically independent trials to real-world independent trials). The association of relative frequencies to probabilities forms the basis for this, but is usually exercised only in the simplest cases.

    The way one selects probability models of real-world experiments in practice is to use scientific knowledge and experience, plus simple experiments, to choose a reasonable model. The results from the model (such as the law of large numbers) are then used both to hypothesize results about the real-world experiment and to provisionally reject the model when further experiments show it to be highly questionable. Although the results about the model are mathematically precise, the corresponding results about the real-world are at best insightful hypotheses whose most important aspects must be validated in practice.

    Subjective probability

    There are many useful applications of probability theory to situations other than repeated trials of a given experiment. When designing a new system in which randomness (of the type used in probability models) is hypothesized, one would like to analyze the system before actually building it. In such cases, the real-world system does not exist, so indirect means must be used to construct a probability model. Often some sources of randomness, such as noise, can be modeled in the absence of the system. Often similar systems or simulation can be used to help understand the system and help in formulating appropriate probability models. However, the choice of probabilities is to a certain extent subjective.

    Another type of situation, of which a canonical example is risk analysis for nuclear reactors, deals with a large number of very unlikely outcomes, each catastrophic in nature. Experimentation clearly cannot be used to establish probabilities, and it is not clear that probabilities have any real meaning here. It can be helpful, however, to choose a probability model on the basis of subjective beliefs, which can then be used as a basis for reasoning about the problem. When handled well, this can at least make the subjective biases clear, leading to a more rational approach to the problem. When handled poorly, it can hide the arbitrary nature of possibly poor decisions.

    We will not discuss the various, often ingenious, methods of choosing subjective probabilities. The reason is that subjective beliefs should be based on intensive and long-term exposure to the particular problem involved; discussing these problems in abstract probability terms weakens this link. We will focus instead on the analysis of idealized models. These can be used to provide insights for subjective models, and more refined and precise results for objective models.


    \(^{38}\)We are not suggesting that distinguishing different coins for the sake of coin tossing is an important problem. Rather, we are illustrating that even in such a simple situation, both the assumption of identically prepared experiments and the assumption of independent experiments are questionable. The extension to \(n\) repetitions of IID experiments is not necessarily a good model for coin tossing. In other words, one has to question both the original model and the \(n\)-repetition model.


    This page titled 1.6: Relation of Probability Models to the Real World is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Robert Gallager (MIT OpenCourseWare) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.