Engineering LibreTexts

20.6: Pearson Correlation Coefficient


    The test we’ll use to see whether this pattern is significant is the Pearson correlation coefficient (also called “Pearson’s r”). To run it, we call SciPy’s pearsonr() function and pass it the two columns:

    Code 20.6.1 (Python):

    import scipy.stats
    scipy.stats.pearsonr(people.salary, people.followers)

    | (0.2007815176819964, 1.2285885030618397e-46)

    We’re given two numbers as output. The second of these is the p-value. Remembering our pitfall from above, we’re savvy enough to notice the e-46 at the end: this is scientific notation for roughly \(1.2 \times 10^{-46}\), a number far below any reasonable α, so we declare the result significant. We can therefore say with high confidence that a person’s salary is associated with their number of social media followers.
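To make the reading of the two outputs concrete, here is a minimal sketch that unpacks them into named variables and compares the p-value against α. The `salary` and `followers` arrays are synthetic stand-ins with made-up values (not the data set used above), and `alpha = 0.05` is an assumed threshold chosen only for illustration.

```python
import numpy as np
import scipy.stats

# Synthetic stand-ins for the two columns (made-up values, illustration only).
rng = np.random.default_rng(0)
followers = rng.normal(500, 100, size=1000)
salary = 40000 + 20 * followers + rng.normal(0, 10000, size=1000)

alpha = 0.05  # assumed significance threshold

# pearsonr() returns the correlation coefficient first, then the p-value.
r, p = scipy.stats.pearsonr(salary, followers)
significant = p < alpha
print(f"r = {r:.3f}, p = {p:.3g}, significant: {significant}")
```

Giving the two numbers names (`r` and `p`) makes it harder to mix them up when checking significance.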

    Now for the first number, which is the actual “correlation coefficient.” If the second number is below α and therefore significant (as it was here), you then look at the first number and see whether it’s positive or negative. Positive numbers indicate positive correlations: an increase in one of the variables corresponds to an increase in the other. Negative numbers indicate negative correlations: an increase in one of the variables corresponds to a decrease in the other. Here, we have a positive number, which means that having more followers tends to go with a higher salary.
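The two-step reading described here (check the p-value first, then look at the sign of the coefficient) could be sketched as a small function. The name `describe_correlation`, the default `alpha=0.05`, and the tiny data sets are all hypothetical, invented only to illustrate the logic.

```python
import scipy.stats

def describe_correlation(x, y, alpha=0.05):
    """Hypothetical helper: check significance first, then the sign of r."""
    r, p = scipy.stats.pearsonr(x, y)
    if p >= alpha:
        return "no significant correlation"
    return "positive correlation" if r > 0 else "negative correlation"

# Tiny made-up data sets, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
rising = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
falling = [16.0, 14.2, 11.9, 10.1, 8.0, 6.1, 3.8, 2.2]
flat = [5, 3, 6, 4, 5, 3, 6, 4]

print(describe_correlation(x, rising))   # positive correlation
print(describe_correlation(x, falling))  # negative correlation
print(describe_correlation(x, flat))     # no significant correlation
```

Note that the sign is only inspected once the p-value has passed the significance check, mirroring the order of steps in the text.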

    As an example of the second (negative) case, suppose two of our variables in a data set of sailboat races were length (the length of the sailboat, from bow to stern) and finish_time (the number of minutes the boat took to complete the race). We’re likely to see a negative correlation in this case, because physics tells us that longer boats can travel through the water faster (and therefore have lower finish times). These two variables would thus be correlated, but in a negative way: a high value for one would typically indicate a low value for the other.
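As a rough illustration of the sailboat case, the sketch below generates synthetic `length` and `finish_time` values in which longer boats tend to finish sooner, then runs the same test. The numbers and the linear relationship are invented assumptions, not real race data.

```python
import numpy as np
import scipy.stats

# Synthetic race data (made-up): longer boats tend to have lower finish times.
rng = np.random.default_rng(1)
length = rng.uniform(20, 50, size=200)                          # feet, bow to stern
finish_time = 200 - 2 * length + rng.normal(0, 10, size=200)   # minutes

r, p = scipy.stats.pearsonr(length, finish_time)
print(f"r = {r:.3f}, p = {p:.3g}")  # r comes out negative for this data
```

A negative coefficient paired with a small p-value is the signature of the pattern described above: high values of one variable typically go with low values of the other.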


    This page titled 20.6: Pearson Correlation Coefficient is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Stephen Davies (allthemath.org) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.
