6.1: The Four Scales of Measure

Every variable¹ we collect can have various values, and the nature of the information it contains can be described by its scale of measure. There are four such scales of measure², and each one determines which kinds of operations are “legal” (i.e., sensible) with that variable.

Categorical/Nominal

The first kind is the simplest, although it actually has two different names in common use: they’re called both categorical variables and nominal variables. These variables represent one of a set of predefined choices, where no choice is “higher” or “greater” than any other.

An example would be a fave_color variable that holds the value of a child’s favorite color: legal values are "red", "blue", "green" or "yellow". We know it’s categorical from, among other things, the fact that there’s no one right way to order those values. (Alphabetical, most-popular-first, and ordering according to the sequence of the rainbow are three possibilities. You might think of others.)

Political affiliation would be another categorical variable. Its values (like "Democrat", "Republican", and "Green") aren’t in any particular order. (Although you might think of the traditional left-to-right political spectrum, that’s only one dimension of political party, and perhaps not even the most important one.) Other examples include a film’s genre, a student’s nationality, and a football player’s position.

Now you might be tempted to think, “hmm...all the categorical examples so far are textual, not numeric. Perhaps this scales of measure thing is just another way of stating the variable type?” Alas, no. For one, we’ll see text variables in the next category as well. For another, even data that on its surface seems numeric can actually be categorical in disguise.

Consider the uniform number of an athlete. I might be interested in asking, “which uniform number had the greatest professional athletes who chose it?” #24 is a good candidate: Willie Mays, Ken Griffey Jr., and Kobe Bryant all wore that jersey number. Or maybe #7 is the winner, with Mickey Mantle, John Elway, and Cristiano Ronaldo. Either way, though, all that matters in this analysis is which uniform number an athlete chose, not how high that number is compared to others. No one in their right mind would say that Peyton Manning (#18) was “twice the player” Mia Hamm was (#9), because uniform numbers aren’t really numbers at all: they’re more like labels.

Legal Operations for Categorical/Nominal Variables

When a variable is on a categorical scale, about the only things you can do are compare for equality/inequality, count the occurrences of different values, and compute something called the mode of the values.

The mode simply means the value that occurs the most often. It’s the first of the “measures of central tendency” we’ll see: such measures are a way of capturing something about the “typical” value of a variable. For categorical variables, the only typical-ness is “which one occurs the most often?” If we ask a bunch of people for their fave_color, and we get the answers "blue", "red", "blue", "blue", and "yellow", then the mode is "blue". It’s that simple.

To wrap things up, these things make sense to ask of a categorical variable:

“Is his favorite color the same as her favorite color?”
“How many people have "red" as their favorite color?”
“What’s the most popular favorite color?”

while these do not:

“Is his favorite color greater than her favorite color?” (??)
“What’s Caitlin’s favorite color minus Hannah’s?” (??)
“What’s the ‘average’ favorite color in this data set?” (??)
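
As a rough illustration, the three sensible questions above can be sketched in Python. The list of answers is the one from the mode example; the variable names other than fave_color, and the use of collections.Counter, are just one assumed way to do it.

```python
from collections import Counter

# Survey answers from the mode example in the text.
fave_color = ["blue", "red", "blue", "blue", "yellow"]

# "Is his favorite color the same as her favorite color?"
# Equality is a meaningful comparison for categorical values.
same_choice = fave_color[0] == fave_color[2]

# "How many people have "red" as their favorite color?"
counts = Counter(fave_color)
num_red = counts["red"]

# "What's the most popular favorite color?"  (the mode)
mode_color, mode_count = counts.most_common(1)[0]

print(same_choice, num_red, mode_color, mode_count)
```

Python would happily evaluate "red" < "blue" (it compares strings alphabetically), but that is only one arbitrary ordering, so the result would say nothing about the colors themselves.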

Ordinal

One step up on the food chain is an ordinal variable, which means that its different possible values do have some meaningful order.

Consider education_level, a variable that contains the highest degree a survey respondent has earned. Its values can be any of the following: "HS", "Associates", "Bachelors", "Masters", and "PhD". In some ways, this is like fave_color: the variable must take on one of a set of specific, prescribed values. However, it’s pretty clear that a High School degree is closer to (more similar to) an Associates degree than it is to a Ph.D. Each successive value represents more education, and so unlike categorical variables, it does make sense to compare them along greater-than-or-less-than lines.

In addition to the mode, another measure of central tendency available for ordinal variables is the median. I think of the median as the “middlest” value: if you line up all the occurrences in a row – in order of the values – it’s the one that lies in the exact middle. Suppose our survey respondents give these answers: "Bachelors", "HS", "HS", "Masters", "Masters", "Bachelors", and "HS". To compute the median, we line them all up in order:

"HS" "HS" "HS" "Bachelors" "Bachelors" "Masters" "Masters"

and find the middlest one, which is "Bachelors". So "HS" is the mode of this variable, and "Bachelors" is the median.
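
Here is a minimal sketch of that procedure in Python. The key assumption is that we have to supply the ordering of the levels ourselves (the LEVELS list below is a name I made up); sorting the strings directly would put them in alphabetical order, which is not what we want.

```python
# The order of the levels comes from us, not from Python.
LEVELS = ["HS", "Associates", "Bachelors", "Masters", "PhD"]
rank = {level: position for position, level in enumerate(LEVELS)}

answers = ["Bachelors", "HS", "HS", "Masters", "Masters", "Bachelors", "HS"]

# Line the answers up in order of education level.
in_order = sorted(answers, key=rank.get)

# With an odd number of answers, the median is the single middle element.
median_level = in_order[len(in_order) // 2]

print(in_order)
print(median_level)
```

This sketch only handles an odd number of answers. With an even number there are two middle values, and if they differ there is no sensible way to “split the difference” between two ordinal values.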

Other examples of ordinal variables include an NCAA basketball team’s top-25 ranking, a taxpayer’s tax bracket, and survey questions asking whether you "strongly disagree", "disagree", are "neutral", "agree", or "strongly agree" with a certain statement.

Again, a list of dos and don’ts. For ordinal variables, these are okay:

“Is his education level the same as her education level?”
“How many people answered "strongly disagree" to this question?”
“Is UMW basketball ranked higher or lower than Messiah?”
“What’s the median tax bracket for this group of employees?”

while these are not:

“Which looks like the bigger mismatch on paper: Duke v. Kentucky, or Villanova v. Gonzaga?” (??)
“What’s Caitlin’s education level minus Hannah’s?” (??)
“What’s the ‘average’ tax bracket for this group of employees?” (??)

It’s worth commenting on that second list, because you might have thought some of those items were completely reasonable. For example, suppose that in the latest AP poll, Duke is currently ranked #1, Kentucky #3, Villanova #4, and Gonzaga #23. You might think that clearly the Villanova/Gonzaga matchup is the more lopsided, since there are nineteen places between them, whereas Duke and Kentucky are separated by just two.

But not necessarily. We know Duke is considered stronger than Kentucky, but not how much stronger. It is almost certainly not the case that the teams are exactly evenly spaced all the way down the list from #1 to #25. Real life doesn’t work like that. Instead, it might be the case that Duke and Georgetown, the #1 and #2 teams in the country, are considered far and away the best two teams. And perhaps the next five or even twenty teams on the list are considered very close, to the point where experts disagree wildly on what order they should be in. If this is the case, then mighty Duke vs. (comparatively) lowly Kentucky might be an enormous mismatch, while Villanova and Gonzaga might be considered a tossup.

The bottom line is: although an ordinal variable’s values are ordered, there is no information at all about the spacing between them. I’ll tell you from personal experience that the difference between a Bachelors and a Masters degree is nuthin’ compared to that between a Masters and a Ph.D. (You can ask anyone who has earned the latter for confirmation.)

This leads into the second item on the no-no list: subtracting two ordinal values. All you’re going to get is “the number of positions in the sequence by which they differ,” which tells you next to nothing. If I ask people to rate a movie on a scale of "POOR", "FAIR", "GOOD", and "EXCELLENT", the difference between "POOR" and "GOOD" is likely to be a lot greater than that between "FAIR" and "EXCELLENT". This is true even though the “difference” between them seems exactly the same: two rankings’ worth. The fact is that humans don’t interpret those four adjectives as exactly equally spaced, and therefore it’s a fallacy to interpret their results as though they did.

Which leads to the third and last item: trying to take the “average” (adding up all the scores and dividing by the number of scores). It’s tempting to say, “let’s treat "POOR" as a 1, "FAIR" as a 2, "GOOD" as a 3, and "EXCELLENT" as a 4. Then, we can just take the mean of all the results to get the average rating! What’s not to like?” Here’s what’s not to like. By assigning those numbers, you added spurious information and thereby twisted the respondents’ meaning into something they didn’t necessarily intend. They very likely didn’t think of the four options as equally-spaced numerically, and so this average is quite bogus. Instead, take the median.
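
To see how much the choice of numbers matters, here is a small sketch with made-up ratings for two hypothetical movies. Both codings below respect the order POOR < FAIR < GOOD < EXCELLENT; they differ only in the spacing they assume, and that spacing is my invention, not something the respondents told us.

```python
from statistics import mean

# Two hypothetical codings with the same order but different spacing.
evenly_spaced = {"POOR": 1, "FAIR": 2, "GOOD": 3, "EXCELLENT": 4}
big_top_gap = {"POOR": 1, "FAIR": 2, "GOOD": 3, "EXCELLENT": 10}

# Made-up ratings for two movies.
movie_x = ["GOOD", "GOOD", "GOOD"]
movie_y = ["FAIR", "FAIR", "EXCELLENT"]

def coded_mean(ratings, coding):
    """Mean of the ratings after translating them to numbers."""
    return mean(coding[r] for r in ratings)

def ordinal_median(ratings, coding):
    """Middle rating (odd-length lists only) in the coding's order."""
    ordered = sorted(ratings, key=coding.get)
    return ordered[len(ordered) // 2]

for name, coding in [("evenly spaced", evenly_spaced),
                     ("big top gap", big_top_gap)]:
    x_wins = coded_mean(movie_x, coding) > coded_mean(movie_y, coding)
    print(name,
          "| X beats Y on mean:", x_wins,
          "| medians:", ordinal_median(movie_x, coding),
          ordinal_median(movie_y, coding))
```

Under the first coding movie X has the higher mean; under the second, movie Y does. The medians ("GOOD" for X, "FAIR" for Y) come out the same either way, because they depend only on the order.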

Interval

Onward. Our next scale of measure is the interval scale, which fulfills what was missing with ordinal variables. An interval variable does have meaningful and reliable differences between values, which can be computed and analyzed.

Unlike the previous two scales, interval variables are always numeric by nature. You can’t subtract two words from one another, but you can do so with numbers, and unlike our uniform number and NCAA hoops ranking examples, that subtraction is a meaningful operation.

An example of an interval variable might be the longitude (or latitude) of a city. Not only can we ask whether two cities have the same longitude (as with categorical), and whether one is east or west of another (as with ordinal), we can now ask how far east. Subtract one longitude from the other, and boom. We have a reliable degree of difference.

This allows us to ask questions like “are Dallas and Fort Worth farther apart than Minneapolis and St. Paul are?” or “is the temperature swing between daytime and nighttime wider in Colorado than in Virginia?” (Hint: yes.) Note that we couldn’t legally ask such questions of an ordinal variable, since there was no way to really know how large the difference between "GOOD" and "EXCELLENT" was, as opposed to that between "FAIR" and "GOOD".

Another example of an interval scale variable, besides the aforementioned temperature, is the year an event takes place. We can say, for example, that nearly two-thirds of our nation’s history has occurred after the Civil War (2021−1865 = 156 years, versus 1861−1776 = 85 years).

The quintessential measure of central tendency for interval scale is the arithmetic mean. Both the median and the mode are still permitted, and they are sometimes quite useful. But often we’re going to fall back on the add-’em-up-and-divide-by-the-number-of-elements thing you learned in grade school. In this case, it makes sense, because the values are at fixed, meaningful, numerical positions and so adding them up is okay.

Here’s our list of goods (for interval scale variables):

“Was today’s high temperature the same as yesterday’s?”
“Was Beethoven born before or after Napoleon?”
“How many cities are at 40° latitude?”
“What’s the median year of birth for current U.S. Senators?”
“Which is experiencing more global warming (temperature difference) – Greenland or France?”
“What’s London’s latitude minus Boston’s? How much farther north is it?”
“What was the average high temperature in Fredericksburg in September?”

and bads:

“Which cities are at least 20% more east than Chicago?” (??)
“When was the first fall day which was half as hot as it was on July 4th?” (??)
“Was Lincoln born 5% later than Washington?” (??)

Let’s consider that bads list. With an interval scale variable, we can ask almost anything we want to about it. Almost. The one fly in the ointment is questions that have phrases like “twice as” or “10% less than.” Those, we cannot do. The reason is that an interval scale variable has no meaningful zero point.

In an interval scale, values have relative distances from each other, but not absolute differences from some fixed reference point. Consider years. Saying that the Cubs finally won the World Series 146 years after their franchise was born is meaningful: the difference between 1870 and 2016 can be measured. But what if we said “they won the World Series 7.8% later than their franchise was born”? Could such a sentence possibly say anything useful?

The answer is no, and here’s why. The “zero point” of our calendar system is arbitrary. By that I mean that the year we might consider “year zero” has nothing to do with the Cubs or baseball or America or anything else: it was a guess as to the birth year of Jesus Christ, and a wrong one at that.³

We could, of course, have chosen to measure time relative to any other point instead, like the birth of our own nation, the founding of Rome, the Cubs franchise being founded, or anything else. If we had done that, all of the relative differences between years would have been the same: there would still have been 85 years between the Declaration of Independence and the Civil War, Barack Obama would still have been President for 8 years, and you would still be the same age. But all the absolute calculations that implicitly make reference to the zero point – like “what percent later did the Cubs win the Series than their franchise began?” – would suddenly become radically different. If we measured years relative to 1776, then the Cubs’ victory would have been “155.3% later” than their origin, instead of “7.8% later!” That betrays the fact that this is an utterly meaningless calculation.
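
The arithmetic behind those two percentages can be checked with a short sketch. The function percent_later and its zero_year parameter are names I’m making up for illustration; the years are the ones used above.

```python
def percent_later(start_year, end_year, zero_year=0):
    """Percent by which end_year exceeds start_year, counted from zero_year."""
    start = start_year - zero_year
    end = end_year - zero_year
    return 100 * (end - start) / start

founded, won = 1870, 2016

# The difference in years does not care where the calendar starts.
gap = won - founded
gap_from_1776 = (won - 1776) - (founded - 1776)

# The "percent later" figure very much does.
pct_usual_calendar = percent_later(founded, won)
pct_from_1776 = percent_later(founded, won, zero_year=1776)

print(gap, gap_from_1776)                                  # 146 146
print(round(pct_usual_calendar, 1), round(pct_from_1776, 1))  # 7.8 155.3
```

Shifting the zero point leaves the subtraction untouched but changes the division, which is exactly the sign of an interval (rather than ratio) scale.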

Same thing with longitude. While latitude plausibly has a meaningful zero point – the equator – and thus perhaps “twice as north” has some meaning to it (“twice as far from the equator”), longitude clearly does not. Saying a city is “twice as east” as another is plain nonsense. That’s because the zero point for longitude is arbitrary: it’s set at Greenwich, England, of all places. Clearly only relative differences between longitudes have any meaning.

And the same thing with temperature. If yesterday’s high was 40°F, and today’s is 80°F, it’s tempting to say “whew! It’s twice as hot today!” To see that this is gibberish, though, consider what would happen if we changed to use the metric system like the rest of the civilized world does, and measured temperature in Celsius. Now if we did that, clearly we wouldn’t start experiencing heat waves or cold spells as a result! Hey we’re just changing our units, bro, not influencing the atmosphere. But realize that in Celsius, yesterday’s 40°F day would become 4.4°C, and today’s 80°F would be 26.7°C. So now, by changing our units, we would have to say “oh golly, I guess it’s actually six times as hot today!” This is why multiplying and dividing with interval scale variables leads to madness.
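
A quick sketch of that unit change, using the standard Fahrenheit-to-Celsius conversion. The helper name f_to_c is my own.

```python
def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9

yesterday_f, today_f = 40, 80
yesterday_c, today_c = f_to_c(yesterday_f), f_to_c(today_f)

# The "how many times as hot" ratio changes with the units...
ratio_f = today_f / yesterday_f     # 2.0
ratio_c = today_c / yesterday_c     # roughly 6

# ...but differences convert cleanly: the Celsius difference is just
# the Fahrenheit difference scaled by 5/9.
diff_f = today_f - yesterday_f
diff_c = today_c - yesterday_c

print(round(yesterday_c, 1), round(today_c, 1))
print(ratio_f, round(ratio_c, 2))
print(diff_f, round(diff_c, 1), round(diff_f * 5 / 9, 1))
```

The ratio is at the mercy of where each scale puts its zero, while the difference survives the conversion (up to the scale factor), which is why subtraction is legal here and division is not.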

Ratio

Which brings us to our last of the four scales: the ratio scale. In some ways this is the easiest to understand, because of all the mathematical questions we might want to ask, we can ask them. Multiply, divide, make absolute statements like “25% greater than” – go crazy, man.

Salary has a meaningful, absolute zero point: namely, an unemployed (or volunteer) worker earning zero dollars. Since we have that non-arbitrary standard, it makes perfect sense to say things like “he makes twice as much as she does.”

The height of a person has a meaningful zero point as well: the ground. If Tyrion Lannister rises 3½ feet from the floor, and Gregor Clegane stands a full 7 feet from that same floor, it makes all the sense in the world to say “Gregor is twice as tall as Tyrion.”
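
One way to convince yourself that a ratio-scale statement like this is trustworthy is to check that it survives a change of units. A small sketch, taking Tyrion as three and a half feet tall and converting to meters (the constant name FEET_TO_METERS is mine):

```python
FEET_TO_METERS = 0.3048  # one international foot, in meters

tyrion_ft, gregor_ft = 3.5, 7.0

# Same two heights, expressed in meters.
tyrion_m = tyrion_ft * FEET_TO_METERS
gregor_m = gregor_ft * FEET_TO_METERS

ratio_in_feet = gregor_ft / tyrion_ft
ratio_in_meters = gregor_m / tyrion_m

print(ratio_in_feet, ratio_in_meters)
```

Both ratios come out to 2 (give or take floating-point rounding), because feet and meters agree on where zero is. Contrast that with a Fahrenheit-to-Celsius conversion, where the zero point moves and the ratio changes with it.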

As with interval scale variables, we often use the arithmetic mean as our measure of central tendency.⁴

¹Note that our use of the term variable in this chapter is different than how we used it in chapter 3 (e.g., p. 14) and throughout chapter 5. In this chapter, a variable is normally some measurable aspect of every object in our study. We might recruit participants to a research experiment, and record their race, weight, and favorite breakfast cereal. These would be our three variables. Each of the three will constitute many values, since our group of participants will have many races, weights, and cereals. In programming terms, they will eventually become aggregate data types of some kind.

²According to psychologist Stanley Smith Stevens in 1946. Other researchers have developed related, but different, scales of measure.

³Later historical discoveries have demonstrated that Herod the Great died in what we now call 4 B.C. If you went to Sunday School, you might recall that in a fit of jealousy, King Herod the Great ordered all the baby boys in Bethlehem (two years old or younger) to be killed. (See Matthew 2:13-18.) He chose “two years or younger” as the cutoff because his goal was to kill Jesus, who was about two years old at the time. Hence Jesus was most likely born in the year which we have (incorrectly, it turns out) labeled as “6 B.C.” Fun facts!

⁴Interestingly, there are actually two different kinds of means, one of which, called the “geometric mean,” is only applicable on the ratio data scale. It involves multiplying and taking roots instead of adding and dividing, and is a useful operation in some niche contexts.