As we saw with univariate data in chapter 15, different kinds of plots and statistics are appropriate depending on the variable’s scale of measure – categorical or numeric. There are thus three different cases for bivariate analysis:
- Two categorical variables
- One categorical variable and one numeric variable
- Two numeric variables
We’ll consider each case in turn. Throughout the remaining sections, we’ll use this fictitious data set, called people:
|    gender  salary   color  followers
| 0    male   54.94  purple         26
| 1  female   72.48  purple         22
| 2    male    9.47    blue         27
| 3   other   60.08     red         22
| 4    male   37.62     red         13
Each row represents one fictional person we interviewed, and includes their gender, their salary (in thousands of dollars per year), their favorite color, and the number of followers they have on some unspecified social media website.
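To connect these four columns back to the three bivariate cases, here is a minimal sketch that rebuilds only the five rows displayed above and sorts the columns into numeric and categorical. The name people_head is hypothetical (the full people data set is not reproduced here), and using select_dtypes to stand in for "scale of measure" is a simplifying assumption: it would misclassify, say, a categorical variable stored as integer codes.

```python
import pandas as pd

# Hypothetical stand-in: just the five rows shown above, not the full data set.
people_head = pd.DataFrame({
    "gender": ["male", "female", "male", "other", "male"],
    "salary": [54.94, 72.48, 9.47, 60.08, 37.62],   # thousands of dollars per year
    "color": ["purple", "purple", "blue", "red", "red"],
    "followers": [26, 22, 27, 22, 13],
})

# Assumption: numeric dtype means numeric variable, anything else is categorical.
numeric_cols = people_head.select_dtypes(include="number").columns.tolist()
categorical_cols = people_head.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)      # ['salary', 'followers']
print(categorical_cols)  # ['gender', 'color']
```

Under that assumption, gender with color is a two-categorical pairing, gender with salary is a categorical-numeric pairing, and salary with followers is a two-numeric pairing.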
The DataFrame has 5,000 rows and no special “index” variable: none of the columns that we collected contains unique values, so we just let Pandas default to indexing the rows by number, 0 through 4,999.
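The default indexing can be seen in a small sketch. Since the real data is not available here, the code below fabricates a stand-in called people_demo (a hypothetical name) with 5,000 randomly generated rows; the value ranges and the random seed are arbitrary assumptions chosen only so that the columns have the same names and types as people.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
n = 5000

# Synthetic stand-in for the people data set; the values are made up.
people_demo = pd.DataFrame({
    "gender": rng.choice(["male", "female", "other"], size=n),
    "salary": rng.uniform(0, 100, size=n).round(2),
    "color": rng.choice(["purple", "blue", "red"], size=n),
    "followers": rng.integers(0, 50, size=n),
})

# No index was specified, so Pandas numbers the rows itself.
print(len(people_demo))
print(people_demo.index)

# A column with repeated values could not serve as a unique index.
print(people_demo["gender"].is_unique)
```

Because no index argument was given, the index runs from 0 up to one less than the number of rows, and a column such as gender, with only three distinct values across 5,000 rows, cannot be unique.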