
20.1: Three Bivariate Scenarios


    As we saw with univariate data in Chapter 15, the appropriate plots and statistics depend on each variable’s scale of measurement: categorical or numeric. With two variables, there are thus three different cases for bivariate analysis:

    • Two categorical variables
    • One categorical variable and one numeric variable
    • Two numeric variables

    We’ll consider each case in turn. Throughout the remaining sections, we’ll use a fictitious data set called people, whose first five rows are:

           gender  salary   color  followers
        0    male   54.94  purple         26
        1  female   72.48  purple         22
        2    male    9.47    blue         27
        3   other   60.08     red         22
        4    male   37.62     red         13

    Each row represents one fictional person we interviewed, and includes their gender, their salary (in thousands of dollars per year), their favorite color, and the number of followers they have on some unspecified social media website.
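As a minimal sketch, the five rows shown above can be rebuilt by hand to see which of the three scenarios each pair of columns falls into. The helper name `bivariate_case` is hypothetical (it does not come from the text), and treating every column without a numeric dtype as categorical is a simplifying assumption.

```python
from itertools import combinations

import pandas as pd
from pandas.api.types import is_numeric_dtype

# Only the five rows printed above; the full data set is not reproduced here.
people = pd.DataFrame({
    "gender": ["male", "female", "male", "other", "male"],
    "salary": [54.94, 72.48, 9.47, 60.08, 37.62],  # thousands of dollars per year
    "color": ["purple", "purple", "blue", "red", "red"],
    "followers": [26, 22, 27, 22, 13],
})


def bivariate_case(df, a, b):
    """Hypothetical helper: name the bivariate scenario for columns a and b."""
    # Assumption: a column counts as numeric if pandas stores it with a
    # numeric dtype; anything else is treated as categorical.
    n_numeric = sum(is_numeric_dtype(df[col]) for col in (a, b))
    return {
        0: "two categorical",
        1: "one categorical, one numeric",
        2: "two numeric",
    }[n_numeric]


# Report the scenario for every pair of columns.
for a, b in combinations(people.columns, 2):
    print(f"{a} & {b}: {bivariate_case(people, a, b)}")
```

With these four columns, gender and color form the only two-categorical pair, salary and followers form the only two-numeric pair, and the remaining four pairs mix one of each.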

    The DataFrame has 5,000 rows and no special “index” variable: none of the columns we collected holds unique values, so we just let Pandas default to indexing the rows by number, 0 through 4,999.
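The default indexing described above can be illustrated with a small stand-in. This is only a sketch: `people_sample` is a hypothetical name, and it holds just two columns of the five printed rows rather than the full data set.

```python
import pandas as pd

# Hypothetical five-row stand-in for the full people DataFrame.
people_sample = pd.DataFrame({
    "gender": ["male", "female", "male", "other", "male"],
    "followers": [26, 22, 27, 22, 13],
})

# No index was specified, so pandas numbers the rows starting from 0.
print(people_sample.index)

# Neither column could serve as a unique row label: both contain repeats.
print(people_sample["gender"].is_unique)
print(people_sample["followers"].is_unique)
```

Even in this small sample, both columns contain repeated values, which is why neither would work as a row label.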


    This page titled 20.1: Three Bivariate Scenarios is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Stephen Davies (allthemath.org) via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.