20.1: Three Bivariate Scenarios


As we saw with univariate data in chapter 15, different kinds of plots and statistics are appropriate depending on the variable’s scale of measure – categorical or numeric. There are thus three different cases for bivariate analysis:

• Two categorical variables
• One categorical variable and one numeric variable
• Two numeric variables
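The three cases depend only on how many of the two variables are numeric. Below is a minimal sketch of that rule in pandas. It uses the first five rows of the people data set described below. The helper name `bivariate_scenario` is hypothetical, and the code makes one labeled assumption: a column's dtype stands in for its scale of measure. That assumption can fail. A categorical variable stored as numeric codes, for example, would be misclassified as numeric.

```python
from itertools import combinations

import pandas as pd
from pandas.api.types import is_numeric_dtype

# First five rows of the fictitious people data set used in this section.
people = pd.DataFrame({
    "gender": ["male", "female", "male", "other", "male"],
    "salary": [54.94, 72.48, 9.47, 60.08, 37.62],
    "color": ["purple", "purple", "blue", "red", "red"],
    "followers": [26, 22, 27, 22, 13],
})

def bivariate_scenario(df, col_a, col_b):
    """Hypothetical helper: name the bivariate case for a pair of columns.

    Assumption: a column with a numeric dtype is treated as a numeric
    variable, and every other column is treated as categorical.
    """
    n_numeric = sum(is_numeric_dtype(df[c]) for c in (col_a, col_b))
    return ["two categorical",
            "one categorical, one numeric",
            "two numeric"][n_numeric]

# Classify every pair of distinct columns.
for a, b in combinations(people.columns, 2):
    print(f"{a} / {b}: {bivariate_scenario(people, a, b)}")
```

With four columns there are six pairs. Only gender/color falls in the first case, and only salary/followers falls in the third. The other four pairs all mix one categorical and one numeric variable.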

We’ll consider each case in turn. Throughout all the remaining sections, we’ll use this fictitious data set, called people:

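|   | gender | salary | color  | followers |
|---|--------|--------|--------|-----------|
| 0 | male   | 54.94  | purple | 26        |
| 1 | female | 72.48  | purple | 22        |
| 2 | male   | 9.47   | blue   | 27        |
| 3 | other  | 60.08  | red    | 22        |
| 4 | male   | 37.62  | red    | 13        |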

Each row represents one fictional person we interviewed, and includes their gender, their salary (in thousands of dollars per year), their favorite color, and the number of followers they have on some unspecified social media website.
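The sketch below rebuilds the five rows shown above as a pandas DataFrame. It then sorts the columns into the two scales of measure. As before, it assumes that dtype is a reasonable proxy for scale of measure. `select_dtypes(include="number")` picks out the integer and floating-point columns, and the remaining columns are treated as categorical.

```python
import pandas as pd

# The five rows shown above; salary is in thousands of dollars per year.
people = pd.DataFrame({
    "gender": ["male", "female", "male", "other", "male"],
    "salary": [54.94, 72.48, 9.47, 60.08, 37.62],
    "color": ["purple", "purple", "blue", "red", "red"],
    "followers": [26, 22, 27, 22, 13],
})

# Assumption: numeric dtypes mark numeric variables; everything else is categorical.
numeric_cols = people.select_dtypes(include="number").columns.tolist()
categorical_cols = [c for c in people.columns if c not in numeric_cols]

print("numeric:", numeric_cols)          # numeric: ['salary', 'followers']
print("categorical:", categorical_cols)  # categorical: ['gender', 'color']
```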

The DataFrame has 5,000 rows and no special “index” variable. None of the columns we collected uniquely identifies a person, so we just let Pandas default to indexing the rows by number, 0 through 4,999.
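The following sketch illustrates pandas' default row labels and a uniqueness check on the five-row excerpt. It is not the full data set. When no index column is given, pandas uses a `RangeIndex`, and `Series.is_unique` reports whether every value in a column is distinct. A small excerpt cannot establish uniqueness, though. Here salary happens to have five distinct values, while the text states that no column is unique across all 5,000 rows.

```python
import pandas as pd

# Five-row excerpt of the people data set (not the full 5,000 rows).
people = pd.DataFrame({
    "gender": ["male", "female", "male", "other", "male"],
    "salary": [54.94, 72.48, 9.47, 60.08, 37.62],
    "color": ["purple", "purple", "blue", "red", "red"],
    "followers": [26, 22, 27, 22, 13],
})

# With no index column specified, pandas labels rows 0 .. n-1.
print(people.index)  # RangeIndex(start=0, stop=5, step=1)

# A column could serve as an index only if all of its values are distinct.
for col in people.columns:
    print(col, people[col].is_unique)

# For the full data set, the default labels would run from 0 through 4,999.
full_index = pd.RangeIndex(5000)
print(full_index[0], full_index[-1])  # 0 4999
```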

20.1: Three Bivariate Scenarios is shared under a CC BY-SA license and was authored, remixed, and/or curated by Stephen Davies (allthemath.org) .