5.5: Averages


Paul Penfield, Jr.
Massachusetts Institute of Technology via MIT OpenCourseWare


Suppose we are interested in knowing how tall the freshman selected in our example is. If we knew who was selected, we could easily look up his or her height (assuming the height of each freshman is available in some database). But what if we have not learned the identity of the person selected? Can we still estimate the height?

At first it is tempting to say we know nothing about the height since we do not know who is selected. But this is clearly not true, since experience indicates that the vast majority of freshmen have heights between 60 inches (5 feet) and 78 inches (6 feet 6 inches), so we might feel safe in estimating the height at, say, 70 inches. At least we would not estimate the height as 82 inches.

With probability we can be more precise and calculate an estimate of the height without knowing the selection. And the formula we use for this calculation will continue to work after we learn the actual selection and adjust the probabilities accordingly.

Suppose we have a partition with events \(A_i\) each of which has some value for an attribute like height, say \(h_i\). Then the average value (also called the expected value) \(H_{av}\) of this attribute would be found from the probabilities associated with each of these events as

\(H_{av} = \displaystyle \sum_{i} p(A_i)h_i \tag{5.9}\)

where the sum is over the partition.
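
As a minimal sketch of Equation 5.9, the short Python function below computes the average from a list of probabilities and a list of attribute values. The function name `expected_value` and the four heights are hypothetical, chosen only for illustration. The same function is applied before and after the selection is learned, with only the probabilities changing.

```python
def expected_value(probabilities, values):
    """Return sum_i p(A_i) * h_i over a partition (Equation 5.9)."""
    if len(probabilities) != len(values):
        raise ValueError("need exactly one value per event")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities over a partition must sum to 1")
    return sum(p * h for p, h in zip(probabilities, values))


# Hypothetical heights, in inches, of a class of four freshmen.
heights = [64.0, 70.0, 66.0, 72.0]

# Before learning the selection: every freshman is equally likely.
uniform = [0.25, 0.25, 0.25, 0.25]
h_av_uniform = expected_value(uniform, heights)

# After learning that the second freshman was selected.
known = [0.0, 1.0, 0.0, 0.0]
h_av_known = expected_value(known, heights)

print(h_av_uniform, h_av_known)
```

With these made-up numbers the estimate is 68 inches before the selection is known and 70 inches, the selected freshman's actual height, afterward.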

This sort of formula can be used to find averages of many properties, such as SAT scores, weight, age, or net wealth. It is not appropriate for properties that are not numerical, such as gender, eye color, personality, or intended scholastic major.

Note that this definition of average covers the case where each event in the partition has a value for the attribute like height. This would be true for the height of freshmen only for the fundamental partition. We would like a similar way of calculating averages for other partitions, for example the partition of men and women. The problem is that not all men have the same height, so it is not clear what to use for \(h_i\) in Equation 5.9.

The solution is to define an average height of men in terms of a finer grained partition such as the fundamental partition. Bayes’ Theorem is useful in this regard. Note that the probability that freshman \(i\) is chosen, given that the person chosen is known to be a man, is

\(p(A_i \; | \; M) = \dfrac{p(A_i)p(M \; | \; A_i)}{p(M)} \tag{5.10}\)

where \(p(M \;|\; A_i)\) is particularly simple—it is either 1 or 0 depending on whether freshman \(i\) is a man or a woman. Then the average height of male freshmen is

\(H_{av}(M) = \displaystyle \sum_{i} p(A_i \; | \; M)h_i \tag{5.11}\)

and similarly for the women,

\(H_{av}(W) = \displaystyle \sum_{i} p(A_i \; | \; W)h_i \tag{5.12}\)
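
The two steps above (Bayes’ Theorem, then the weighted sum) can be sketched in Python as follows. This is only an illustration: the function name `conditional_average`, the four heights, and the assignment of freshmen to men and women are hypothetical. The list `in_group` plays the role of \(p(M \;|\; A_i)\), which is 1 or 0 for each freshman.

```python
def conditional_average(p, h, in_group):
    """Average of h given a group event, via Equations 5.10 and 5.11.

    p[i] is p(A_i), h[i] is h_i, and in_group[i] is True when
    freshman i belongs to the group, so it stands for p(M | A_i).
    """
    # p(M): total probability of the freshmen who are in the group.
    p_group = sum(p_i for p_i, g in zip(p, in_group) if g)
    # Bayes' Theorem (5.10): p(A_i | M) = p(A_i) p(M | A_i) / p(M).
    posterior = [p_i * (1.0 if g else 0.0) / p_group
                 for p_i, g in zip(p, in_group)]
    # Equation 5.11: weight each height by p(A_i | M).
    return sum(q * h_i for q, h_i in zip(posterior, h))


# Hypothetical data: four freshmen, each equally likely to be chosen.
p = [0.25, 0.25, 0.25, 0.25]
h = [64.0, 70.0, 66.0, 72.0]
is_man = [False, True, False, True]
is_woman = [not m for m in is_man]

h_av_men = conditional_average(p, h, is_man)
h_av_women = conditional_average(p, h, is_woman)
print(h_av_men, h_av_women)
```

For this made-up class the average height is 71 inches for the men and 65 inches for the women.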

Then the average height of all freshmen is given by a formula exactly like Equation 5.9:

\(H_{av} = p(M)H_{av}(M) + p(W)H_{av}(W) \tag{5.13} \)
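
To see why Equation 5.13 agrees with Equation 5.9, substitute Equations 5.11 and 5.12, apply Bayes’ Theorem (Equation 5.10) to each term, and use the fact that every freshman is either a man or a woman, so that \(p(M \; | \; A_i) + p(W \; | \; A_i) = 1\):

\(\begin{aligned} p(M)H_{av}(M) + p(W)H_{av}(W) &= \displaystyle \sum_{i} \big[ p(M)\,p(A_i \; | \; M) + p(W)\,p(A_i \; | \; W) \big] h_i \\ &= \displaystyle \sum_{i} p(A_i) \big[ p(M \; | \; A_i) + p(W \; | \; A_i) \big] h_i \\ &= \displaystyle \sum_{i} p(A_i)h_i = H_{av} \end{aligned}\)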

These formulas for averages are valid if all \(p(A_i)\) for the partition in question are equal (e.g., if a freshman is chosen “at random”). But they are more general—they are also valid for any probability distribution \(p(A_i)\).
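
As a numerical check of this claim, the sketch below evaluates both sides of Equation 5.13 for a distribution that is deliberately not uniform. The probabilities, heights, and helper names are hypothetical and serve only as an example.

```python
import math

# Hypothetical data: four freshmen with unequal selection probabilities.
p = [0.1, 0.2, 0.3, 0.4]
h = [64.0, 70.0, 66.0, 72.0]
is_man = [False, True, False, True]
is_woman = [not m for m in is_man]


def group_probability(p, in_group):
    """p(M): total probability of the freshmen in the group."""
    return sum(p_i for p_i, g in zip(p, in_group) if g)


def group_average(p, h, in_group):
    """H_av(M) from Equations 5.10 and 5.11, with p(M | A_i) equal to 1 or 0."""
    p_group = group_probability(p, in_group)
    return sum(p_i * h_i for p_i, h_i, g in zip(p, h, in_group) if g) / p_group


# Left side: Equation 5.9 over the fundamental partition.
h_av_direct = sum(p_i * h_i for p_i, h_i in zip(p, h))

# Right side: Equation 5.13 over the partition into men and women.
h_av_combined = (group_probability(p, is_man) * group_average(p, h, is_man)
                 + group_probability(p, is_woman) * group_average(p, h, is_woman))

print(h_av_direct, h_av_combined)
assert math.isclose(h_av_direct, h_av_combined)
```

Both routes give an average of about 69 inches for these numbers, even though the four freshmen are not equally likely to be chosen.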

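The only thing to watch out for is the case where one of the events has probability equal to zero, e.g., if you wanted the average height of freshmen from Nevada and there didn’t happen to be any. The conditional probabilities in Equation 5.10 would then require dividing by zero, so the average for that event is undefined.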
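
One way to handle the zero-probability case in code is to test for it before dividing. The sketch below returns `None` when the group has probability zero. That convention, the function name, and the data are assumptions made for illustration, not part of the formulas above.

```python
def conditional_average_or_none(p, h, in_group):
    """Average of h over a group, or None if the group has probability zero."""
    p_group = sum(p_i for p_i, g in zip(p, in_group) if g)
    if p_group == 0:
        # No freshman in this group can be chosen: the average is undefined.
        return None
    return sum(p_i * h_i for p_i, h_i, g in zip(p, h, in_group) if g) / p_group


# Hypothetical data: four freshmen, none of them from Nevada.
p = [0.25, 0.25, 0.25, 0.25]
h = [64.0, 70.0, 66.0, 72.0]
from_nevada = [False, False, False, False]
everyone = [True, True, True, True]

nevada_average = conditional_average_or_none(p, h, from_nevada)
overall_average = conditional_average_or_none(p, h, everyone)
print(nevada_average, overall_average)
```

Here the Nevada average comes back as `None` rather than raising a division error, while the average over all four freshmen is computed as usual.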