Engineering LibreTexts

10.2: Association and Causality

  • Page ID
    39265

    All of the above questions can be answered with data. In future chapters, we’ll learn the exact Python commands to ask them, and how to interpret the answers.

    For now, I merely want to draw your attention to the fact that these are all questions of association, not causation. An association between variables merely means that they are statistically correlated in some way.¹ If A = SMOKER goes with B = CANCER more often than A = NON-SMOKER does, then there is an association between the two, period. If yearly income B is on average higher for A = REPUBLICAN than for A = DEMOCRAT, then there is an association between the two, period.
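    To make "goes with ... more often" concrete, here is a minimal sketch in plain Python. It is not the set of commands later chapters will introduce. It compares conditional proportions when B is categorical (CANCER or not) and group averages when B is numeric (income). The records below are invented for illustration, and the helper names `rate_of` and `mean_by_group` are hypothetical. A gap between the groups shows an association in this toy sample only. It says nothing yet about whether the association is real or causal.

```python
from statistics import mean

# Hypothetical, made-up records for illustration only: (A, B) pairs.
people = [
    ("SMOKER", "CANCER"), ("SMOKER", "CANCER"),
    ("SMOKER", "NO CANCER"), ("SMOKER", "NO CANCER"),
    ("NON-SMOKER", "CANCER"), ("NON-SMOKER", "NO CANCER"),
    ("NON-SMOKER", "NO CANCER"), ("NON-SMOKER", "NO CANCER"),
]

def rate_of(records, a_value, b_value):
    """Fraction of records with A == a_value that also have B == b_value."""
    matching = [b for a, b in records if a == a_value]
    if not matching:
        raise ValueError(f"no records with A == {a_value!r}")
    return sum(1 for b in matching if b == b_value) / len(matching)

smoker_rate = rate_of(people, "SMOKER", "CANCER")         # 2 of 4 -> 0.5
nonsmoker_rate = rate_of(people, "NON-SMOKER", "CANCER")  # 1 of 4 -> 0.25

# Hypothetical (party, yearly income) records: here B is numeric.
incomes = [
    ("REPUBLICAN", 61000), ("REPUBLICAN", 55000),
    ("DEMOCRAT", 52000), ("DEMOCRAT", 50000),
]

def mean_by_group(records, a_value):
    """Average B value among records with A == a_value."""
    return mean(b for a, b in records if a == a_value)

print(smoker_rate, nonsmoker_rate)
print(mean_by_group(incomes, "REPUBLICAN"), mean_by_group(incomes, "DEMOCRAT"))
```

    In both cases the association is simply a difference between groups defined by A. Deciding how large that difference must be before we trust it is the question postponed in the next paragraph.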

    (By the way, a key nuance will turn out to be: how much more often does A = SMOKER need to go with B = CANCER for us to be confident that there is a true association? Or how much wealthier do the A = REPUBLICANs need to be on average for us to be confident we’ve identified a real link to political party? That one’s a little tricky, and we’ll postpone addressing it for now.)

    So anyway, the question of association turns out to be pretty straightforward to answer. Python will simply tell us whether or not variables are associated. More difficult, however, is determining causality (a.k.a. causation). Does a person’s political affiliation influence how much wealth they have? Or is it the other way around: does a person’s wealth cause them to vote a certain way? Or is it neither of these, with some third factor (perhaps values, or life philosophy) helping determine both variables?

    If the first of these three is the case, we would write “A → B,” pronounced “A causes B.” If the second, we’d write “B → A,” and for the third, we’d write “C → A, B” for some other (possibly yet to be determined) variable C. Determining which (if any) of these is true calls for some careful thinking, intuition, and additional kinds of statistical tests.
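    The third possibility, C → A, B, is easy to underestimate. The sketch below is a toy simulation with invented probabilities (0.5, 0.8, 0.2) and hypothetical helper names (`simulate`, `rate_b_given_a`, `gap`). It generates data in which C influences both A and B, while A and B have no influence on each other at all. Even so, A and B come out clearly associated. In expectation, P(B | A) = 0.68 and P(B | not A) = 0.32, a gap of 0.36. Within each fixed value of C, that gap shrinks to roughly zero. Holding C fixed works here only because we built the data and know C exists. It is an illustration, not a general method for establishing causation.

```python
import random

def simulate(n, seed=0):
    """Generate n (c, a, b) triples of booleans in which C drives both A and B.

    Assumed toy model: A and B each depend only on C, never on each other.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.random() < 0.5                  # hidden third factor
        a = rng.random() < (0.8 if c else 0.2)  # A depends only on C
        b = rng.random() < (0.8 if c else 0.2)  # B depends only on C
        rows.append((c, a, b))
    return rows

def rate_b_given_a(rows, a_value):
    """Fraction of rows with A == a_value that also have B true."""
    subset = [b for _, a, b in rows if a == a_value]
    return sum(subset) / len(subset)

def gap(rows):
    """P(B | A) minus P(B | not A): a simple measure of association."""
    return rate_b_given_a(rows, True) - rate_b_given_a(rows, False)

rows = simulate(100_000)

# Ignoring C, A and B look strongly associated...
overall_gap = gap(rows)

# ...but within each fixed value of C, the gap is close to zero.
gap_within_c = {
    c_value: gap([r for r in rows if r[0] == c_value])
    for c_value in (True, False)
}

print(round(overall_gap, 2), {k: round(v, 2) for k, v in gap_within_c.items()})
```

    An analyst who saw only A and B would find the same association as in the A → B case, which is exactly why association alone cannot tell these explanations apart.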

    In fact, just to blow your mind, Figure 10.3.1 gives a partial list of the various types of causation that could be the true explanation, once we find out that A and B have an association. As you can see, there are a lot of ways to go wrong. Only one of the possibilities is that “A actually causes B,” which is what we suspected in the first place. The others are all ways of producing that same association we picked up in the data.


    This page titled 10.2: Association and Causality is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Stephen Davies (allthemath.org) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.