Sample and Data
Two hundred and thirty-nine cities were redlined. As
part of the Mapping Inequality project, the University of Richmond’s
Digital Scholarship Lab georectified and digitized more than 150 HOLC
maps, in which HOLC-defined neighborhoods are represented as polygons 1.
Shapefiles were downloaded for the areas with available land cover
data, described below.
The heterogeneity of urban environments necessitates high-resolution
and high-accuracy measures of tree canopy. Datasets at 30 m
resolution, such as Landsat scenes or derivative products such as the
National Land Cover Database (NLCD), are insufficient for mapping
trees in a way that effectively operationalizes lived experience in
cities 2,3. For
consistency, high-resolution tree canopy data were obtained from
eleven sources.
Land cover data for twenty-three areas were downloaded from the
Spatial Analysis Lab (SAL, http://gis.w3.uvm.edu/utc/, Table S2)
at the University of Vermont. The SAL routinely maps large spatial
extents such as counties, and their methods are detailed elsewhere 4–6.
Next, statewide tree canopy data for Pennsylvania were obtained from
SAL and used for all HOLC-mapped cities in the state (Altoona,
Johnstown, New Castle, Philadelphia, and Pittsburgh;
http://letters-sal.blogspot.com/2015/09/pennslyvania-statewide-high-resolution.html).
Tree canopy data for eight cities (Baltimore, MD; Johnson
City-Binghamton, Syracuse, and Utica, NY; Lynchburg, Norfolk,
Richmond, and Roanoke, VA) were obtained from the Chesapeake Bay
Program
(https://chesapeakeconservancy.org/conservation-innovation-center/high-resolution-data/).
Data for New Jersey (Atlantic City, Camden, and Trenton) were obtained
from Pennsylvania Spatial Data Access
(http://www.pasda.psu.edu/uci/DataSummary.aspx?dataset=3193). Finally,
a literature review was used to identify eight sources of
additional land cover data overlapping HOLC-graded areas, and
corresponding authors were contacted for data access (Los Angeles and
Sacramento, CA; Denver, CO; Miami and Tampa, FL; Holyoke-Chicopee,
MA; Toledo, OH; and Seattle, WA). In total, there were 3,188
HOLC-defined neighborhoods from 37 cities in 16 states, drawn from 11
sources (Table S2). Statistical analyses were conducted in R v. 3.6.1
7 using the tidyverse 8, simple features 9, ggpubr 10, lme4 11, sjPlot
12, and sjstats 13 packages.
Dependent variables
The dependent variable was the percentage of tree canopy cover within
each HOLC zone. Consistent with previously published literature 14,15,
we define and operationalize tree canopy as “the layer of leaves,
branches, and stems of trees that cover the ground when viewed from
above” 16. After projecting the HOLC polygons obtained from the
Mapping Inequality Project to match the land cover data, the Tabulate
Area tool was used in ArcMap Version 10.2.2 (ESRI, 2014) to calculate
the percent of tree canopy cover for each polygon. In seven cities
(Boston, Denver, Detroit, New Haven, New York City, Seattle, and
Toledo), tree canopy data were not available for the entire extent of
the HOLC-defined neighborhoods, which occasionally extended into
suburban areas surrounding the municipalities of interest. As a
result, 156 polygons had to be omitted. This omission represents 4.67%
of the dataset and was unavoidable. As a robustness check, described
below, our main regression model was re-fit with those seven cities
entirely removed.
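The percentage calculation described above can be sketched in a few lines. This is only an illustration in Python, not the ArcMap Tabulate Area workflow the study used; the polygon identifiers and areas are hypothetical, and the omitted-share check assumes that the 3,188 neighborhoods reported earlier are the polygons retained after the 156 were dropped.

```python
def percent_canopy(canopy_area_m2: float, polygon_area_m2: float) -> float:
    """Return tree canopy as a percentage of a polygon's land area."""
    if polygon_area_m2 <= 0:
        raise ValueError("polygon area must be positive")
    return 100.0 * canopy_area_m2 / polygon_area_m2


# Hypothetical tabulated areas (square metres) for three HOLC polygons.
tabulated = {
    "A1": {"canopy": 42_000.0, "total": 100_000.0},
    "C7": {"canopy": 30_000.0, "total": 150_000.0},
    "D3": {"canopy": 9_000.0, "total": 60_000.0},
}

percent = {
    polygon_id: percent_canopy(areas["canopy"], areas["total"])
    for polygon_id, areas in tabulated.items()
}

# Assumption: 3,188 polygons were retained and 156 were omitted.
omitted, retained = 156, 3188
omitted_share = 100.0 * omitted / (omitted + retained)
```

Under that assumption the omitted share rounds to the 4.67% reported in the text.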
Empirical strategy
We conducted two analyses of variance (ANOVA) with tree canopy as the
dependent variable. In the first ANOVA, the independent variable was
HOLC grade, which tested our main hypothesis that mean canopy cover
varied by grade. A post-hoc Tukey HSD test was then used to examine
which pairs of grades differed from each other. This initial ANOVA was
re-fit as a linear regression model so that Grade A would be the base
case for comparison, and grades B, C, and D would be estimated as
differences in means from A. This is Model 1.
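To make the link between the ANOVA and the re-fit regression concrete, the sketch below computes a one-way ANOVA F statistic and the coefficients a regression with Grade A as the base case would return. With a single categorical predictor, those coefficients are the A mean (intercept) and each grade's difference in means from A. The canopy values are hypothetical, and the function names are illustrative rather than taken from the study's R code.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a dict mapping group -> list of values."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    ss_within = sum(
        (x - mean(values)) ** 2 for values in groups.values() for x in values
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def reference_coded_coefficients(groups, reference="A"):
    """Intercept = reference-group mean; other terms = difference from it."""
    reference_mean = mean(groups[reference])
    coefficients = {"Intercept": reference_mean}
    for grade, values in groups.items():
        if grade != reference:
            coefficients[grade] = mean(values) - reference_mean
    return coefficients


# Hypothetical percent tree canopy values by HOLC grade.
canopy = {
    "A": [40.0, 44.0, 42.0],
    "B": [30.0, 34.0, 32.0],
    "C": [24.0, 26.0, 25.0],
    "D": [20.0, 22.0, 21.0],
}

f_statistic = anova_f(canopy)
coefficients = reference_coded_coefficients(canopy)
```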
In the second ANOVA, the independent variable was the city in which
each neighborhood was located (hereafter Model 2). This analysis was
conducted because we were concerned that unobserved city-specific
characteristics pertaining to such things as land use policy, urban
form, climate, and other factors may have influenced tree canopy
cover. The purpose of Model 2 was to test whether tree canopy cover
varied among the study cities.
As anticipated, tree canopy varies significantly by city. We therefore
fit a mixed effects model with the four-category HOLC grades as the
fixed effects, with random intercepts for city, as shown in Eq. 1 and
termed Model 3.
$$y_{ij} = \beta_0 + \beta_1 B_{ij} + \beta_2 C_{ij} + \beta_3 D_{ij} + u_j + \varepsilon_{ij} \qquad \text{(Eq. 1)}$$
where $y_{ij}$ is tree canopy as a percentage of land area for HOLC
polygon $i$ in city $j$, and $B_{ij}$, $C_{ij}$, and $D_{ij}$ are
indicator variables for HOLC grades B, C, and D. HOLC grade A is the
reference, and $\beta_0$ is the intercept and mean value of percent
tree canopy cover in formerly A-graded neighborhoods. $\beta_1$,
$\beta_2$, and $\beta_3$ are the coefficients of interest, which
represent the differences in mean tree canopy from A by HOLC grades B,
C, and D, respectively. $u_j$ represents the city-specific random
intercept, which was included to capture unobserved aspects of each
city; $\varepsilon_{ij}$ is the observation-level residual;
$\sigma^2$ is the within-city variance; and $\tau_{00}$ represents the
variance across cities. The variance partitioning coefficient, also
known as the intraclass correlation coefficient (ICC), is “a
population estimate of the variance explained by the grouping
structure” 17. It was calculated as the between-group variance
($\tau_{00}$, the random intercept variance) divided by the total
variance (i.e., the sum of the between-group variance $\tau_{00}$ and
the within-group residual variance $\sigma^2$), as shown in Eq. 2.
$$\mathrm{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^2} \qquad \text{(Eq. 2)}$$
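As a small worked illustration of Eq. 2, the sketch below computes the ICC from two variance components. The variance values are hypothetical and are not estimates from the study; in practice they would be read from the fitted mixed model's variance components.

```python
def intraclass_correlation(tau00: float, sigma2: float) -> float:
    """ICC = between-city variance / (between-city + within-city variance)."""
    if tau00 < 0 or sigma2 < 0:
        raise ValueError("variance components must be non-negative")
    total_variance = tau00 + sigma2
    if total_variance == 0:
        raise ValueError("total variance must be positive")
    return tau00 / total_variance


# Hypothetical variance components (squared percentage points).
tau00_example = 30.0   # random-intercept (between-city) variance
sigma2_example = 90.0  # residual (within-city) variance

icc_example = intraclass_correlation(tau00_example, sigma2_example)
```

With these illustrative values, one quarter of the total variance in canopy cover would be attributable to differences between cities.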
T-statistics were treated as Wald Z-statistics for calculating the
confidence intervals and p-values, assuming a normal distribution. An
approximate R² was computed as the proportion of variance explained in
the random effect after adding the categorical HOLC fixed effect to
the model. This is computed as the correlation between fitted and
observed values 18. AIC minimization was used to compare Models 1, 2,
and 3 and to determine the best-fitting model 19.
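The Wald approach and the AIC comparison can be sketched as follows. The estimate, standard error, log-likelihoods, and parameter counts are hypothetical placeholders rather than values from the study, and the helper names are illustrative. AIC is computed with the standard definition, 2k minus twice the log-likelihood.

```python
from statistics import NormalDist


def wald_summary(estimate: float, std_error: float, level: float = 0.95):
    """Treat estimate / SE as a standard normal Z statistic."""
    z = estimate / std_error
    critical = NormalDist().inv_cdf(0.5 + level / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    interval = (estimate - critical * std_error, estimate + critical * std_error)
    return {"z": z, "ci": interval, "p": p_value}


def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical coefficient for grade D relative to A, with its standard error.
summary_d = wald_summary(estimate=-20.0, std_error=2.0)

# Hypothetical (log-likelihood, number of parameters) for three models.
candidates = {
    "Model 1": (-1250.0, 5),
    "Model 2": (-1200.0, 38),
    "Model 3": (-1150.0, 6),
}
aic_values = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best_model = min(aic_values, key=aic_values.get)
```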
Cities with enough A- and D-graded neighborhoods were examined to
determine whether the patterns from the cross-city, pooled analyses
hold within individual cities. D-graded areas are common, but A-graded
areas were the limiting factor. For each city with at least 10
HOLC-defined A-graded neighborhoods (n = 8: Los Angeles, Chicago,
Cleveland, New York City, Lynchburg, Seattle, Pittsburgh,
Philadelphia), Wilcoxon rank-sum tests were used to compare tree
canopy cover between A- and D-graded neighborhoods. All other pairwise
tests were omitted for parsimony (Figure 2).
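For readers unfamiliar with the rank-sum test, a minimal sketch follows. It uses mid-ranks for ties and a normal approximation without continuity correction, so its p-values may differ slightly from those of R's `wilcox.test`, which can use an exact distribution or a continuity correction. The canopy values are hypothetical.

```python
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (U statistic for x, z score, p-value).
    """
    combined = sorted((value, group) for group, values in enumerate((x, y))
                      for value in values)
    values = [value for value, _ in combined]
    n = len(values)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[j + 1] == values[i]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        tie_size = j - i + 1
        tie_term += tie_size ** 3 - tie_size
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    expected = n1 * n2 / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - expected) / variance ** 0.5
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p_value


# Hypothetical percent canopy in one city's A- and D-graded neighborhoods.
grade_a = [40.0, 45.0, 50.0, 55.0]
grade_d = [10.0, 15.0, 20.0, 25.0]
u_stat, z_score, p_value = rank_sum_test(grade_a, grade_d)
```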
Methods for further tests and robustness checks
Four types of checks were conducted: the first to assess the
potentially undue influence of cities with many HOLC-defined
neighborhoods, the second to assess the influence of metropolitan
areas with partially missing data, the third to examine the
sensitivity of the results to grouping the five boroughs of New York
City, and Chelsea and Cambridge with Boston, and the fourth to examine
the influence of data from different sources.
Two strategies were used in order to evaluate whether the results of
Models 1, 2, and 3 were driven by the metropolitan areas with the most
HOLC-defined neighborhoods. First, the boxplots for all cities are
provided in Figure S1 so that the within-city patterns can be examined
visually. Second, as a robustness check, Model 3 was re-fit without
data from the metropolitan areas with ≥ 50 neighborhoods to see
whether the patterns would still hold (Table S1). The inferences from
this smaller model remain unchanged; however, the confidence intervals
are wider, as expected with fewer observations.
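The subsetting step behind this check can be sketched as below. The record layout and city names are hypothetical, and the sketch covers only the filtering, not the model re-fit, which the study performed in R.

```python
from collections import Counter


def drop_large_cities(records, threshold=50):
    """Keep only neighborhoods in cities with fewer than `threshold` polygons."""
    counts = Counter(record["city"] for record in records)
    return [record for record in records if counts[record["city"]] < threshold]


# Hypothetical records: one large metropolitan area and one small city.
records = (
    [{"city": "Big Metro", "grade": "C", "canopy": 20.0} for _ in range(60)]
    + [{"city": "Small City", "grade": "A", "canopy": 40.0} for _ in range(5)]
)

subset = drop_large_cities(records, threshold=50)
remaining_cities = {record["city"] for record in subset}
```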
Tree canopy data were not available for the entire extent of the
HOLC-defined neighborhoods in seven metropolitan areas: Boston,
Denver, Detroit, New Haven, New York City, Seattle, and Toledo. The
missing data are usually at the edges of the geographic extent and are
therefore non-random. The omitted polygons collectively represent
4.67% of the total dataset’s observations. To address these
non-random, partially missing data at the edges of the metropolitan
regions, Model 3 was re-fit with these cities removed entirely (Table
S1, Model 5). Model 5 provides substantively similar results and
interpretation to the main Model 3, and its point estimates remain
within the bounds of Model 3’s confidence intervals.
The sensitivity of the analytical decision to group the five boroughs
of New York City, and Chelsea and Cambridge with Boston was also
examined. A version of Model 3 (Table S1, Model 5) was fit without
grouping, which adds six random intercepts. Again, no substantive
changes were observed.
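The count of six additional random intercepts follows from the grouping itself, as the sketch below shows. The map labels are hypothetical stand-ins for however the HOLC maps are named in the data, and Seattle is included only as an example of a city unaffected by grouping.

```python
# Hypothetical map labels mapped to the city used in the grouped analysis.
GROUPING = {
    "Bronx": "New York City",
    "Brooklyn": "New York City",
    "Manhattan": "New York City",
    "Queens": "New York City",
    "Staten Island": "New York City",
    "Chelsea": "Boston",
    "Cambridge": "Boston",
}


def analysis_city(map_label: str, grouped: bool = True) -> str:
    """Return the level used for the city random intercept."""
    return GROUPING.get(map_label, map_label) if grouped else map_label


labels = list(GROUPING) + ["Boston", "Seattle"]
grouped_levels = {analysis_city(label) for label in labels}
ungrouped_levels = {analysis_city(label, grouped=False) for label in labels}

# Ungrouping splits New York City into 5 levels and Boston into 3.
extra_intercepts = len(ungrouped_levels) - len(grouped_levels)
```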
Finally, land cover data for Sacramento, Denver, Miami, Tampa,
Holyoke-Chicopee, Toledo, and Seattle each came from a different
source (Table S1, Model 6). Data from those cities could have
influenced the results if their land cover data were not comparable to
those produced by SAL. Based on Model 6, no substantive changes were
observed. All robustness check models supported the inferences of the
main results: formerly D-graded areas had roughly half as much tree
canopy as formerly A-graded areas.
1. Nelson, R. K., Winling, L., Marciano, R., Connolly, N. et al.
Mapping Inequality. in American Panorama (eds. Nelson, R. K. &
Ayers, E. L.) (2019).
2. Smith, M. L., Zhou, W., Cadenasso, M. L., Grove, J. M. & Band,
L. E. Evaluation of the National Land Cover Database for Hydrologic
Applications in Urban and Suburban Baltimore, Maryland. JAWRA J. Am.
Water Resour. Assoc. 46, 429–442 (2010).
3. Grove, J. M., Locke, D. H. & O’Neil-Dunne, J. P. M. An Ecology of
Prestige in New York City: Examining the Relationships Among
Population Density, Socio-economic Status, Group Identity, and
Residential Canopy Cover. Environ. Manage. 54, 402–419 (2014).
4. O’Neil-Dunne, J. P. M., MacFaden, S. W. & Royar, A. A
Versatile, Production-Oriented Approach to High-Resolution Tree-Canopy
Mapping in Urban and Suburban Landscapes Using GEOBIA and Data Fusion.
Remote Sens. 6, 12837–12865 (2014).
5. MacFaden, S. W., O’Neil-Dunne, J. P. M., Royar, A. R., Lu, J. W. T.
& Rundle, A. G. High-resolution tree canopy mapping for New York
City using LIDAR and object-based image analysis. J. Appl. Remote
Sens. 6 (2012).
6. Taylor, P. et al. An object-based system for LiDAR data fusion and
feature extraction. Geocarto Int. 28, 1–16 (2012).
7. R Core Team. R: A language and environment for statistical
computing. (2019).
8. Wickham, H. tidyverse: Easily Install and Load the ‘Tidyverse’.
(2017).
9. Pebesma, E. Simple features for R: Standardized support for spatial
vector data. R J. 10, 439–446 (2018).
10. Kassambara, A. ggpubr: ‘ggplot2’ Based Publication Ready Plots.
(2018).
11. Bates, D. M., Maechler, M., Bolker, B. & Walker, S. Fitting
linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48
(2015).
12. Lüdecke, D. sjPlot: Data Visualization for Statistics in Social
Science. (2018). doi:10.5281/zenodo.1308157
13. Lüdecke, D. sjstats: Statistical Functions for Regression Models.
(2019). doi:10.5281/zenodo.1284472
14. Locke, D. H., Landry, S. M., Grove, J. M. & Roy Chowdhury, R.
What’s scale got to do with it? Models for urban tree canopy. J. Urban
Ecol. 2, juw006 (2016).
15. Schwarz, K. et al. Trees Grow on Money: Urban Tree Canopy Cover
and Environmental Justice. PLoS One 10, e0122051 (2015).
16. O’Neil-Dunne, J. P. M. A Report on the City of Baltimore’s
Existing and Possible Urban Tree Canopy. (2009).
17. Hox, J. J. Applied Multilevel Analysis. (1995).
doi:10.1017/cbo9780511610806
18. Nakagawa, S. & Schielzeth, H. Coefficient of determination R²
and intra-class correlation coefficient ICC from generalized linear
mixed-effects models. Ecol. Evol. 14, 20170213 (2017).
19. Burnham, K. P. & Anderson, D. R. Model Selection and Multimodel
Inference: A Practical Information-Theoretic Approach (2nd ed.)
(Springer, 2002).