6: Analysis

Analysis is a way of interpreting what is going on in the maps that you encounter and create. Analytical tools provide ways of engaging with data, understanding spatial patterns, and giving us a vocabulary for discussing what we see when we look at a map. There are many ways to spatially analyze the data displayed in maps – too many to mention here. In this chapter, we will focus on a few particular techniques for analyzing maps, and we will touch on some of the social, economic, and political implications of map analysis.

This chapter will introduce you to four kinds of analysis:

• point pattern
• autocorrelation
• proximity
• correlation

These categories differ in several ways. They can differ in whether they are looking at location alone, or location and attributes at the same time. They sometimes differ in whether they look at points and areas, or just points or areas. These analytical approaches also differ in whether they are looking at just one theme (say, just population) or more than one theme at a time (such as two maps of counties, one of population density per county and another of median income).

Depending on the focus of inquiry and the number of themes being analyzed, some maps can be analyzed using more than one of these methods, and other maps are best analyzed using just one. In this chapter, we will teach you the differences between these four types of spatial analysis, and ask you to use these analytical methods to understand maps. Keep in mind that although we draw distinctions between these types of analysis throughout the chapter, there are many overlaps and situations in which different analytical methods (particularly proximity and correlation) can be used in tandem. At the end of this chapter, you will have the basic skills to analyze and interpret maps and spatial data.

Analysis. Four common methods for analyzing maps.They differ in whether they are looking at location alone, or location and attributes at the same time. [1]

6.1 Point Pattern Analysis

Point pattern analysis looks at the spatial arrangement of the locations of objects or events within a single theme and does not consider how their attributes vary. In particular, this kind of analysis looks at the relationship between the locations of objects or events in space relative to the locations of other objects or events.

The map below looks at the distribution of burglaries near the Thames River in London. Here we are looking at the locations of specific events—burglaries—which occur when someone enters a building illegally with the intent to steal something. Note that technically there is a qualitative attribute being considered – did a burglary occur at a given location or not? – but we are really just interested in the location of these events. We are not interested in the attributes of any given burglary itself, such as what was stolen, what was it worth, whether the thief caught, or any number of other kinds of attributes we could measure or questions we could ask.

London burglaries. This map portrays the distribution of burglaries near the Thames River in London. [2]

We use point pattern analysis to describe the pattern of this one particular theme of interest—locations of burglaries—over the mapped area. Point pattern analysis can help us to see where spatial patterns of burglaries are occurring, such as if burglars are targeting a particular block in recent days. As you would guess from the name, point pattern analysis is interested in finding patterns in locations, in this case where there are any patterns in burglaries. We need a language to describe these patterns, which is what we explore next.

There are three main types of point patterns – or spatial distributions of locations of entities or events – in a map: random, uniform, and clustered.

Point patterns. Three general point patterns are random, uniform, and clustered. [3]

Random. A random pattern is where locations are distributed seemingly randomly, or in other words, where the position of any one point is unrelated to the locations of other points. Marketing firms that conduct phone surveys often want a random distribution of people, for example, so they use methods to ensure that they choose random locations where people live in a city, state, or nation.

Uniform. A uniform pattern is one in which locations are evenly distributed through space. Maps of fire stations in a county often display a uniform pattern because fire stations are deliberately spaced out across a city or county in order to ensure that firefighters can quickly and efficiently access fires across the area. Another example is the location of wolf packs, in that packs spread out as much as possible as each pack tries to keep a lot of space between itself and the other packs in order to reduce conflict over game.

Clustered. A clustered pattern describes when a number of locations are very close to one another, or in clusters – closer than you would expect if they were randomly patterned. Burglars who target a particular neighborhood will create a cluster of burglaries on a community’s crime map. Disease is often clustered in space because the location of one event, such as the flu, makes it more likely that other flu cases will be nearby, as the flu is spread through close contact among people.

Keep in mind that you will often find multiple point patterns on the same map. The figure below shows a map of locations of hardware stores in the Midwest of the United States. Looking at this map with point pattern analysis, you could describe store distribution as uniform in northern Iowa (circled in red), random in central Wisconsin and the Minnesota/Dakotas border area (yellow), and clustered around Milwaukee and the Twin Cities (blue).

Clustering in stores. Clustering comparison for the locations of Menards hardware stores in the US Midwest. You could describe store distribution as uniform in northern Iowa (circled in red), random in central Wisconsin and the Minnesota/Dakotas border area (yellow), and clustered around Milwaukee and the Twin Cities of Minnesota (blue). [4]

6.2 Autocorrelation Analysis

While point pattern analysis is concerned with the relationships among locations on the map, autocorrelation pertain to both the spatial distribution of location and attributes over an area. census data, for example, are well-suited for autocorrelation analysis. Though census data may be collected at the level of individual households, the demographic data are ultimately aggregated and mapped over an area, rather than tied to specific household locations. Autocorrelation looks at the relationship of one attribute to itself; or in other words, autocorrelation is a way of analyzing the degree to which things of the same kind are related.

Recall the map of London burglary locations discussed above. The figure below demonstrates the difference between a point map (like that above), which is best examined using point pattern analysis, and an area map, which is better suited to an autocorrelation analysis. The point map above shows specific locations where burglaries were reported in London while the one below reports these data as burglary rates for specific neighborhoods, which allows us to compare burglaries among neighboring areas. These two types of maps are useful for different purposes. If you want to understand the particular houses or blocks that burglars target in a neighborhood, the point map is better for gleaning information about the spatial clustering of burglaries. If you work for the city of London and you are trying to decide how to distribute resources to various police precincts, it is more helpful to you to know where the most crime is happening across different precinct areas. In that case, knowing the locations of specific households would not be as useful as having spatial data over an area. If you are considering buying a house in the neighborhood, either of these maps may help you to understand your general risk of burglary.

Burglaries by area. Reported burglaries in London aggregated over an area. [5]

There are three ways to describe autocorrelation patterns: negative autocorrelation, positive autocorrelation, or no autocorrelation. These descriptive terms call to mind what has been termed Tobler’s First Law of Geography: “Everything is related to everything else, but near things are more related than distant things.” The figure below offers a highly simplified example of how these autocorrelation descriptions might appear on a very stylized map.

Autocorrelation. Negative, positive, and neutral or no autocorrelation. [6]

• Negative autocorrelation describes a pattern that defies Tobler’s Law – the attribute is uniformly distributed across the area, intersects uniformly with dissimilar attributes, and is not concentrated.
• Positive autocorrelation corresponds with Tobler’s law – the areas nearest to each other will display similar patterns or densities of the attribute, and the areas farther away display different densities of the attributes.
• No autocorrelation indicates that there is no discernible pattern in the distribution of the attribute.

Another way of thinking about autocorrelation is to ask whether the values of an attribute in one place are likely to be similar to those in nearby places (positive autocorrelation), very different from neighboring locations (negative autocorrelation), and whether there is basically no connection between neighboring places in terms of the attribute (no autocorrelation).

Much of the demographic data that we deal with in this course will display positive autocorrelation. For instance, a map that shows the burglary rates for all of London demonstrates that boroughs with high burglary rates are generally located nearer to other boroughs with high or above average burglary rates. Boroughs with low burglary rates are generally nearer to other boroughs of low or below average burglary rates.

Burglaries by borough. London burglary rates aggregated by borough. Those with high burglary rates are generally located nearer to other boroughs with high or above average burglary rates. [7]

6.3 Proximity Analysis

Proximity analysis describes the spatial relationships and patterns between locations across two themes – think of it as point pattern analysis with two different kinds of objects or events. Using proximity analysis, you can look at the relationship between houses and streets, crimes and surveillance cameras, patients and disease vectors, or stores and where people live. Beyond the spatial relationship between multiple points, proximity analysis can help us to make sense of the world over time and distance.

Proximity analysis can be tremendously useful for public health—determining how diseases spread, how to predict vulnerability to disease, and how and where to most effectively target interventions. Dr. John Snow developed one of the classic examples of public health proximity analyses. As noted in Chapter 1, a cholera outbreak had ravaged London in 1854 and left many public health advocates and policy makers alarmed and unsure of how to contain the virus. Dr. Snow interviewed residents and discovered that those who contracted cholera obtained their water from the Broad Street pump, as in the figure below. Soon after the handle was removed from that pump, the cholera epidemic subsided.

Proximity and disease. Cholera outbreak and proximity to Broad Street water pump, London 1854, drawn by Dr. John Snow (pictured). The small black dots represent cholera cases, the large green dots represent water pumps, and the red dot is the Broad Street Water Pump. The red circle is the concentration of the greatest number of cholera cases, proximate to the Broad Street pump. [8]

To persuade the medical community that cholera was a waterborne rather than an airborne disease, Dr. Snow created the map shown above to demonstrate the relationship between cholera cases and the Broad Street pump. Dr. Snow’s map was one of the earliest examples of proximity analysis conducted to understand spatial disease vectors, and it paved the way for significant expansions of disease mapping. This information led to the reform of the entire public health system in Victorian England, which included the Nuisances Removal and Diseases Prevention Act of 1846, and the expansion of the London sewer system in 1849.

Proximity analysis can also help us to think about different ways of measuring distance. Most of the time we are interested in Euclidean distance, which is the straight-line distance between two points. However, unless you are able to scale and hop across buildings, the distance between point A and point B will be affected by the natural and built environment. Imagine that you are an ambulance driver, and you need to get a critically injured person to the nearest hospital. The “closest” hospital based on time (speed of the roads, traffic) or distance (miles of road) might not be the same as the hospital that is closest in Euclidean distance. Manhattan distance, or “taxicab geometry,” is a measurement of distance that takes into account the grid like pattern of city streets (as on the island of Manhattan), and is better suited to understanding navigational proximity. Manhattan distance is not only useful for thinking about proximity in urban environments. It more generally denotes the distance you have to travel over a transportation network in order to reach someplace, or network distance.

Imagine that you are planning a trip to the beautiful Southern Alps of New Zealand. Fox Glacier and Mt. Cook are two of the most breathtaking sites, and only about 35 km from one another, which is very proximate using a Euclidean measure of distance. Still, to travel between them by car takes over 5 hours because there are few places to pass through the mountains. Your travel plans must take into account network distance, or you will be in for a very long and potentially frustrating day of travel!

Kinds of distance. Google Maps driving directions from Mt. Cook to the Fox Glacier, New Zealand. Travel between them by car takes over 5 hours even though they are close in terms of Euclidian distance. [9]

6.4 Correlation Analysis

Correlation analysis involves analyzing the spatial relationship between multiple attributes or themes. In other words, correlation analysis attempts to measure the degree or extent to which two or more different attributes are spatially related. Although correlation is generally a good method for looking at multiple attributes aggregated over an area, it can also be used to talk about the relationship between an aggregated attribute and a specific point. In this way, sometimes there are overlaps between proximity and correlation analysis. We will deal with these overlaps more comprehensively later. For now, let’s look at a typical example of correlation analysis: the spatial relationship between income and education.

Across the world, many governments, NGOs, and media outlets herald the relationship between higher education and increased income. The figure below shows average incomes of those earning $75,000 or more and the percentage of people with a bachelor’s degree or higher for census tracts in the city of Los Angeles. Looking at these two maps side by side, we can see a general correlation between certain areas with relatively high incomes and higher levels of educational attainment. The places where the correlations between the two attributes are strong are highlighted with a circle or oval. Correlation analysis. Income and educational attainment are correlated in the Los Angeles Area. [10] There are other areas on this map where the correlation is not so clear, as noted by the squares and rectangles. For instance, in downtown Los Angeles (the square to the upper left of the circle in roughly the center of the map), we see little correlation between education and income. It’s important to read your maps carefully when assessing correlation: take a look at the long diagonal census tract in the upper right of each map. This is Los Angeles County Census Tract 9301.01, and we can see from the maps that though there is a large population of educated residents, there is insufficient data as to how many of them are making$75,000 or more per year. This may be because only 119 people live in this whole area of the San Gabriel mountains. Although a number of areas on the map seem to support the proposition that higher education correlates to higher incomes, the map also demonstrates that there are areas that do not adhere to this pattern, and that the situation is likely more complicated in reality.

6.4.1 Potential mistakes

When performing a correlation analysis, you need to be careful to avoid two common pitfalls: 1) correlation does not necessarily mean causation, and 2) data are sometimes not interoperable.

Mistake One: Correlation ≠ Causation

As with the income and education example above, just because you see a correlation in the map does not mean that you have sufficient information to determine causation. Looking back at our maps of income and education in Los Angeles, we do not have adequate information to claim that higher education causes higher incomes, or that higher incomes cause higher education. All we can see from the map is that the two are correlated. If you wish to make an assertion about causation when doing a spatial correlation analysis, you must consult and cite other robust academic literature that supports your analysis. In short, you must develop a candidate theory or concept that explains the relation among your variables.

The correlation/causation fallacy is perpetuated throughout the popular media. For instance, in 2014, the New York Times Economy section posted an article with the headline: “A Simple Equation: More Education = More Income.” (Porter 2014). Now this proposition may be true in certain areas, and at certain levels of aggregation, but we know even from a simple glance at our map that in the Los Angeles area, there are almost as many places where there is little correlation between income and education. We simply do not have enough information to understand why some tracts do not display a correlation between education and income when other nearby tracts do. We cannot make a claim about causation for certain places or spatial scales without introducing additional peer-reviewed data, and even then we must be very careful about causal assertions.

Similarly, the proposition in the New York Times article, which argues that higher education does have a causal relationship to higher income, does not help us to explain these low correlation tracts either. Though there may be widespread correlations between educational attainment and income at the state and county level, a critical reading of our map of income and education demonstrates complexity and the fallacy of a simple correlation equals causation argument. For these reasons, academic literature is required to clearly state its research methodology and is more transparent about how data are collected and how conclusions are drawn than popular media sources. Generally, academic resources are more useful if you are interested in making causal arguments.

Mistake Two: Interoperability oversights

Make sure that the data you are correlating are actually comparable. You’ll want to verify that the maps you are comparing—and the data that are displayed—are based on similar aggregation units, categories, and temporalities. You can only draw effective correlations if your maps and data are interoperable. The figure below shows an example of how correlation analysis can be used to interpret attribute shifts over time. These figures, produced by the New York Times, look at correlations in demographic shifts between black and Hispanic populations in South Los Angeles between 1990 and 2010.

Correlation and interoperability. Spatially interoperable maps that show correlations in demographic shifts over time. [11]

Using what you know about census data, take a look at the temporalities, aggregation units, spatial extent, and attribute categories. Are these maps properly interoperable? In this case, the answer is yes. These maps cover the same time frames (1990 & 2010), in years when racial categories remained consistent (Black and Hispanic have the same meanings in the 1990 and 2010 censuses), both maps aggregate data by census tract (across years when the spatial boundaries of census tracts remained consistent), and focus on the same spatial extent (South Central Los Angeles). These are maps that cover all of the interoperability bases and can be effectively analyzed using correlation. If you have additional questions about interoperability, refer back to the chapter on Data for a more in-depth discussion.

6.5 Combining Analyses

As mentioned above, there are occasionally overlaps between these different analytical methods, and the distinctions are not always so clear. Sometimes you can use these multiple methods to analyze one map. The figure below was published by the New York Times in 2012 as part of a series on fatal police shootings in Anaheim, California.

Mixing kinds of analysis. Locations of fatal police shootings, percent Hispanic demographic data, and Median household income, Anaheim, California, United States. [12]

Using this figure, we can perform a point pattern analysis by observing the distribution of fatal police shooting sites. We could conduct an autocorrelation analysis on the top map by looking at the relationship between the density of Hispanic residents across neighboring tracts (relatively positive levels of autocorrelation), or the bottom map by comparing median household incomes across tracts. We can perform a correlation analysis on the two maps side by side, by attempting to determine whether there is a relationship between the percentage of Hispanic residents and the median household income. Finally, we might use these figures to understand the proximity between fatal shooting locations, percentage of Hispanic residents, and/or median household incomes. Because there is often overlap between correlation and proximity, you could use either or both analytical methods to understand the spatial relationships between fatal shootings and the aggregated attributes of percent Hispanic population and median household income.

Let’s look at another example. In the figure below, proximity analysis is called for, since the focus of inquiry is the location of universities and Fortune 500 companies to one another in the Los Angeles area. In general, we see a relatively high degree of proximity between universities and Fortune 500 companies. Of course, there are some exceptions. For instance, Pepperdine University, nestled in the Santa Monica Mountains on the left side of your map, seems relatively isolated at this scale; however, in Manhattan or network distance terms, it is less than 20 miles, and around a 30-minute drive, from either Dole Foods or Health Net, the corporations located respectively to the northwest and northeast of campus.

Proximity analysis. Proximity analysis between corporations and universities in the Los Angeles area. [13]

6.6 Conclusion

In this chapter, we examined a few methods for analyzing maps. We have narrowed our focus to four general categories of analysis: point pattern, autocorrelation, proximity, and correlation. These categories differ in key ways, particularly in terms of whether they look at location alone, or location and attributes at the same time, and whether they are looking at just one theme or more than one theme at a time. They also sometimes differ in whether they look at only points, only areas, or both points and areas. Regardless of approach, it is important to not lose sight of the bigger picture, to remember that you can sometimes use multiple forms of analysis with the same map, and to remain critical of causal claims based only upon correlation.

Resources

1. CC BY-NC-SA 4.0. Steven M. Manson, 2015
2. Map generated with map interface at The London Telegraph (2015)http://www.telegraph.co.uk/finance/newsbysector/constructionandproperty/11328658/Crime-map-Is-your-home-in-one-of-Londons-burglary-hot-spots.html
3. CC BY-NC-SA 4.0. Steven M. Manson, 2015
4. CC BY-NC-SA 4.0. Laura Matson, 2015
5. Public Domain. Metropolitan Police Service (2015) http://news.met.police.uk
6. CC BY-NC-SA 4.0. Steven M. Manson, 2005
7. Public domain. Metropolitan Police Service (2015) http://news.met.police.uk
8. Public Domain. John Snow; Published by C.F. Cheffins, Lith, Southampton Buildings, London, England, 1854 in Snow, John. On the Mode of Communication of Cholera, 2nd Ed, John Churchill, New Burlington Street, London, England, 1855. https://commons.wikimedia.org/w/index.php?curid=2278605
9. CC BY-NC-SA 4.0. Laura Matson, 2015. Google maps.
10. CC BY-NC-SA 4.0. Sara Nelson 2015. Data from SocialExplorer and US Census
11. Fair use. New York Times (April 24, 2012). In Years Since the Riots, a Changed Complexion in South Central
12. Fair use. New York Times (August 2, 2012). A Divided City.
13. CC BY-NC-SA 4.0. Laura Matson, 2015. Google maps