Sample and Data
Two hundred and thirty-nine cities were redlined. As part of the
Mapping Inequality project, the University of Richmond’s Digital
Scholarship Lab georectified and digitized more than 150 HOLC maps
where HOLC-defined neighborhoods are represented as polygons [1].
Shapefiles for areas with available land cover data, described below,
were downloaded.
The heterogeneity of urban environments necessitates high-resolution
and high-accuracy measures of tree canopy. Datasets with 30 m
resolution, such as Landsat scenes or derivative products such as the
National Land Cover Database (NLCD), are insufficient for mapping
trees in a way that effectively operationalizes lived experience in
cities [2,3].
For consistency, high-resolution tree canopy data were obtained from
eleven sources.
Land cover data for twenty-three areas were downloaded from the
Spatial Analysis Lab at the University of Vermont (the SAL,
https://www.uvm.edu/rsenr/sal/). The SAL routinely maps large spatial
extents such as counties, and their methods are detailed elsewhere
[4–6].
Next, tree canopy data covering the entire state of Pennsylvania were
obtained from the SAL and used for all HOLC-mapped cities in
Pennsylvania (Altoona, Johnstown, New Castle, Philadelphia, and
Pittsburgh;
http://letters-sal.blogspot.com/2015/09/pennslyvania-statewide-high-resolution.html).
Tree canopy data for eight cities (Baltimore, MD; Johnson
City-Binghamton, Syracuse, and Utica, NY; Lynchburg, Norfolk,
Richmond, and Roanoke, VA) were obtained from the Chesapeake Bay
Program
(https://chesapeakeconservancy.org/conservation-innovation-center/high-resolution-data/).
Data for New Jersey (Atlantic City, Camden, and Trenton) were obtained
from Pennsylvania Spatial Data Access
(http://www.pasda.psu.edu/uci/DataSummary.aspx?dataset=3193).
Finally, a literature review was used to identify sources (n = 8) of
additional land cover data overlapping HOLC-graded areas, and the
corresponding authors were contacted for data access (Los Angeles and
Sacramento, CA; Denver, CO; Miami and Tampa, FL; Holyoke-Chicopee,
MA; Toledo, OH; and Seattle, WA). In total, there were 3,188
HOLC-defined neighborhoods in 37 cities across 16 states, drawn from
11 sources (Table S2). Statistical analyses were conducted in R
v. 3.6.1 [7] using the tidyverse [8], simple features [9], ggpubr
[10], lme4 [11], sjPlot [12], and sjstats [13] packages.
Dependent variables
The dependent variable was the percentage of tree canopy cover within
each HOLC zone. Consistent with previously published literature
[14,15], we define and operationalize tree canopy as “the layer of
leaves, branches, and stems of trees that cover the ground when viewed
from above” [16]. After projecting the HOLC polygons obtained from the
Mapping Inequality project to match the land cover data, the Tabulate
Area tool in ArcMap Version 10.2.2 (ESRI, 2014) was used to calculate
the percentage of tree canopy cover for each polygon. In seven cities
(Boston, Denver, Detroit, New Haven, New York City, Seattle, and
Toledo), tree canopy data were not available for the entire extent of
the HOLC-defined neighborhoods, which occasionally extended into
suburban areas surrounding the municipalities of interest; as a
result, 156 polygons had to be omitted. This represents 4.67% of the
dataset and was unavoidable. As
a robustness check, described below, our main regression model was
re-fit with those seven cities entirely removed.
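The per-polygon calculation described above can be sketched as follows. This is a minimal illustration in Python rather than the ArcMap Tabulate Area workflow used in the study; the function name `percent_canopy`, the polygon identifiers, and the areas are hypothetical. The last lines check the reported omission share under the assumption that the 3,188 retained polygons exclude the 156 omitted ones.

```python
def percent_canopy(canopy_area, total_area):
    """Return tree canopy as a percentage of a polygon's land area."""
    if total_area <= 0:
        raise ValueError("total_area must be positive")
    return 100.0 * canopy_area / total_area


# Hypothetical tabulated areas (square metres) for three HOLC polygons.
tabulated = {
    "A1": {"canopy": 42_000.0, "total": 100_000.0},
    "C7": {"canopy": 15_000.0, "total": 75_000.0},
    "D3": {"canopy": 9_000.0, "total": 60_000.0},
}
canopy_pct = {
    pid: percent_canopy(a["canopy"], a["total"]) for pid, a in tabulated.items()
}

# Share of polygons omitted. Assumption: the 3,188 neighborhoods reported
# in the text are those retained after the 156 polygons were dropped.
omitted, retained = 156, 3188
omitted_share = 100.0 * omitted / (omitted + retained)
```

Under that assumption the omitted share rounds to the 4.67% stated in the text.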
Empirical strategy
We conducted two analyses of variance (ANOVA) with tree canopy as the
dependent variable. In the first ANOVA, the independent variable was
HOLC grade, to test our main hypothesis that mean canopy cover
varied by grade. A post-hoc Tukey HSD test was then used to
examine which pairs of grades differed from each other. This initial
ANOVA was re-fit as a linear regression model so that Grade A would be
the base-case for comparison, and letters B, C, and D would be
estimated as differences in means from A. This is Model 1.
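The equivalence between the one-way ANOVA and a regression with Grade A as the base case can be illustrated with a small sketch. It is written in Python with only the standard library, not the R code used in the study; the helper names `solve` and `ols` and the canopy values are hypothetical. With an intercept and indicator columns for B, C, and D, the fitted intercept equals the Grade A mean and each coefficient equals that grade's mean minus the Grade A mean.

```python
from statistics import mean


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x_rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical canopy percentages by HOLC grade (illustrative values only).
canopy = {
    "A": [40.0, 44.0, 45.0],
    "B": [33.0, 35.0],
    "C": [25.0, 27.0, 29.0],
    "D": [20.0, 22.0],
}

# Treatment coding: intercept plus indicators for B, C, D (A is the reference).
x_rows, y = [], []
for grade, values in canopy.items():
    for v in values:
        x_rows.append([1.0] + [1.0 if grade == g else 0.0 for g in "BCD"])
        y.append(v)

beta = ols(x_rows, y)  # [intercept, B - A, C - A, D - A]
mean_a = mean(canopy["A"])
diffs = [mean(canopy[g]) - mean_a for g in "BCD"]
```

Here `beta[0]` matches `mean_a` and `beta[1:]` matches `diffs` up to floating-point error.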
In the second ANOVA, the independent variable was the city in which
each neighborhood was located (hereafter Model 2). This analysis was
conducted because we were concerned that unobserved city-specific
characteristics pertaining to such things as land use policy, urban
form, climate, and other factors may have influenced tree canopy
cover. The purpose of Model 2 was to test whether tree canopy cover
varied among the study cities.
As anticipated, tree canopy varies significantly by city. We therefore
fit a mixed effects model with the four-category HOLC grades as the
fixed effects, with random intercepts for city, as shown in Eq. 1 and
termed Model 3.
$$\eta_{ij} = \gamma_{00} + \gamma_{10}\,\mathrm{HOLC}_{\mathrm{grade\,B}} + \gamma_{20}\,\mathrm{HOLC}_{\mathrm{grade\,C}} + \gamma_{30}\,\mathrm{HOLC}_{\mathrm{grade\,D}} + \mu_{0j} + e_{ij} \qquad \text{(Eq. 1)}$$

where $\eta_{ij}$ is tree canopy as a percentage of land area for HOLC
polygon $i$ in city $j$. HOLC grade A is the reference, and
$\gamma_{00}$ is the intercept and mean value of percent tree canopy
cover in formerly A-graded neighborhoods. $\gamma_{10}$,
$\gamma_{20}$, and $\gamma_{30}$ are the coefficients of interest,
which represent the differences in mean tree canopy from A for HOLC
grades B, C, and D, respectively. $\mu_{0j}$ represents the
city-specific random intercept, which was included to capture
unobserved aspects of each city; $e_{ij}$ is the observation-level
residual; $\sigma^2$ is the within-city variance; and $\tau_{00}$
represents the variance across cities.
The variance partitioning coefficient, also known as the intraclass
correlation coefficient (ICC), is “a population estimate of the
variance explained by the grouping structure” [17]. It was calculated
as the between-group variance ($\tau_{00}$, the random intercept
variance) divided by the total variance (i.e., the sum of the
between-group variance $\tau_{00}$ and the within-group residual
variance $\sigma^2$), shown in Eq. 2.

$$\mathrm{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^2} \qquad \text{(Eq. 2)}$$
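Eq. 2 translates directly into code. The sketch below is a minimal Python illustration; the function name `icc` and the variance components are hypothetical and are not estimates from the study.

```python
def icc(tau00, sigma2):
    """Intraclass correlation: between-group variance over total variance."""
    if tau00 < 0 or sigma2 < 0:
        raise ValueError("variance components must be non-negative")
    total = tau00 + sigma2
    if total == 0:
        raise ValueError("total variance must be positive")
    return tau00 / total


# Hypothetical variance components (illustrative values only).
example_icc = icc(tau00=60.0, sigma2=140.0)
```

With these illustrative inputs, 30% of the total variance would lie between cities.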
T-statistics were treated as Wald Z-statistics for calculating the
confidence intervals and p-values, assuming a normal distribution. An
approximate $R^2$ was computed as the proportion of variance explained
in the random effect after adding the categorical HOLC fixed effect to
the model. This is computed as the correlation between fitted and
observed values [18]. AIC minimization was used to compare Models 1,
2, and 3 and to determine the best-fitting model [19].
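The two inferential steps in this paragraph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lme4/sjstats code used in the study: `wald_summary` and `aic` are hypothetical helper names, the estimate and standard error are invented, and the candidate log-likelihoods and parameter counts are placeholders rather than results for Models 1 to 3. AIC is computed with the standard definition, 2k minus twice the log-likelihood.

```python
from math import erfc, sqrt
from statistics import NormalDist


def wald_summary(estimate, std_error, level=0.95):
    """Treat estimate / SE as a standard normal Z-statistic."""
    z = estimate / std_error
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    p_value = erfc(abs(z) / sqrt(2.0))  # two-sided normal p-value
    ci = (estimate - crit * std_error, estimate + crit * std_error)
    return z, ci, p_value


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Hypothetical coefficient and standard error (illustrative values only).
z, ci, p_value = wald_summary(estimate=-6.0, std_error=2.0)

# Hypothetical (log-likelihood, number of parameters) pairs for three
# candidate models; these are placeholders, not the study's results.
candidates = {
    "candidate_a": (-1325.0, 5),
    "candidate_b": (-1290.0, 38),
    "candidate_c": (-1270.0, 6),
}
aics = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(aics, key=aics.get)  # lowest AIC is preferred
```

The candidate with the lowest AIC is preferred; the penalty term 2k discourages models that improve fit only by adding parameters.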
Cities with enough A- and D-graded neighborhoods were examined to
determine whether the patterns from the cross-city, pooled analyses
hold within individual cities. D-graded areas are common, but A-graded
areas were the limiting factor. For each city with ≥10 HOLC-defined
A-graded neighborhoods (n = 8: Los Angeles, Chicago, Cleveland, New
York City, Lynchburg, Seattle, Pittsburgh, Philadelphia), Wilcoxon
rank-sum tests were used to compare tree canopy cover between A- and
D-graded neighborhoods. All other pairwise tests were omitted for
parsimony (Figure 2).
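A within-city A-versus-D comparison of this kind can be sketched as below. This is a simplified Python version of the Wilcoxon rank-sum (Mann-Whitney U) test that uses midranks for ties and a normal approximation without continuity correction; it is an assumption-laden illustration, and statistical software may instead use exact p-values or a continuity correction, so results can differ slightly. The function name `rank_sum_test` and the canopy values are hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided rank-sum test via the normal approximation.

    Returns (U statistic for x, z, p-value). Ties receive midranks and the
    variance is tie-corrected; no continuity correction is applied.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x, tie_term = 0.0, 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                     # size of this group of tied values
        midrank = (i + 1 + j) / 2.0   # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t**3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all observations are tied")
    z = (u - n1 * n2 / 2.0) / sqrt(var_u)
    return u, z, erfc(abs(z) / sqrt(2.0))


# Hypothetical canopy percentages for one city (illustrative values only).
grade_a = [45.0, 50.0, 38.0, 41.0, 47.0]
grade_d = [20.0, 25.0, 18.0, 30.0, 22.0]
u, z, p_value = rank_sum_test(grade_a, grade_d)
```

In this invented example every A-graded value exceeds every D-graded value, so U takes its maximum possible value (the product of the two sample sizes).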
Methods for further tests and robustness checks
Four types of checks were conducted: the first to assess the
potentially undue influence of cities with many HOLC-defined
neighborhoods, the second to assess the influence of metropolitan
areas with partially missing data, the third to examine the
sensitivity of the results to grouping the five boroughs of New York
City, and Chelsea and Cambridge with Boston, and the fourth to examine
data from different sources.
Two strategies were used in order to evaluate whether the results of
Models 1, 2, and 3 were driven by the metropolitan areas with the most
HOLC-defined neighborhoods. First, the boxplots for all cities are
provided in Figure S1 so that the within city patterns can be examined
visually. Secondly, as a robustness check, Model 3 was re-fit without
data from the metropolitan areas with ≥ 50 neighborhoods to see if the
patterns would still hold (Table S1). The inferences from this smaller
model remain unchanged; however, the confidence intervals are larger
by construction.
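The subsetting step behind this robustness check can be sketched as follows. This is a minimal Python illustration, not the study's R code; the function name `drop_large_cities`, the record layout, and the example data are hypothetical. Records from any city with 50 or more neighborhoods are removed before the model is re-fit.

```python
from collections import Counter


def drop_large_cities(records, threshold=50):
    """Keep records from cities with fewer than `threshold` neighborhoods."""
    counts = Counter(r["city"] for r in records)
    return [r for r in records if counts[r["city"]] < threshold]


# Hypothetical records: one city with 60 polygons, one with 10.
records = [{"city": "Large City", "grade": "C"} for _ in range(60)]
records += [{"city": "Small City", "grade": "B"} for _ in range(10)]

subset = drop_large_cities(records)
remaining_cities = {r["city"] for r in subset}
```

Because fewer observations remain after subsetting, wider confidence intervals in the re-fit model are expected.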
Tree canopy data were not available for the entire extent of the
HOLC-defined areas in seven metropolitan areas. The missing data are
usually at the edges of the geographic extent, and therefore
non-random. Specifically, tree canopy data were not available for the
entire extent of HOLC-defined neighborhoods in Boston, Denver,
Detroit, New Haven, New York City, Seattle, and Toledo; the omitted
polygons represent 4.67% of the total dataset’s observations. To address
non-random, partially missing data at the edges of these metropolitan
regions, Model 3 was re-fit with these cities removed entirely (Table
S1, Model 5). Model 5 provides substantively similar results and
interpretation to the main Model 3 and the point estimates remain
within the bounds of Model 3’s confidence intervals.
The sensitivity of the analytical decision to group the five boroughs
of New York City, and Chelsea and Cambridge with Boston was also
examined. A version of Model 3 (Table S1, Model 5) was fit without
grouping, which adds six random intercepts. Again, no
substantive changes were observed.
Finally, land cover data for Sacramento, Denver, Miami, Tampa,
Holyoke-Chicopee, Toledo, and Seattle all came from different sources
(Table S1, Model 6). It is possible that data from those cities may
have influenced the results if the land cover data were not comparable
to those produced by SAL. Based on Model 6, no substantive changes
were observed. All robustness check models supported the inferences of
the main results: formerly D-graded areas had roughly half as much
tree canopy as formerly A-graded areas.
References
1. Nelson, R. K., Winling, L., Marciano, R., Connolly, N. et al.
Mapping Inequality. in American Panorama (eds. Nelson, R. K. &
Ayers, E. L.) (2019).
2. Smith, M. L., Zhou, W., Cadenasso, M. L., Grove, J. M. & Band,
L. E. Evaluation of the National Land Cover Database for Hydrologic
Applications in Urban and Suburban Baltimore, Maryland. JAWRA J. Am.
Water Resour. Assoc. 46, 429–442 (2010).
3. Grove, J. M., Locke, D. H. & O’Neil-Dunne, J. P. M. An Ecology of
Prestige in New York City: Examining the Relationships Among
Population Density, Socio-economic Status, Group Identity, and
Residential Canopy Cover. Environ. Manage. 54, 402–419 (2014).
4. O’Neil-Dunne, J. P. M., MacFaden, S. W. & Royar, A. A
Versatile, Production-Oriented Approach to High-Resolution Tree-Canopy
Mapping in Urban and Suburban Landscapes Using GEOBIA and Data Fusion.
Remote Sens. 6, 12837–12865 (2014).
5. MacFaden, S. W., O’Neil-Dunne, J. P. M., Royar, A. R., Lu, J. W. T.
& Rundle, A. G. High-resolution tree canopy mapping for New York
City using LIDAR and object-based image analysis. J. Appl. Remote
Sens. 6, (2012).
6. Taylor, P. et al. An object-based system for LiDAR data fusion and
feature extraction. Geocarto Int. 28, 1–16 (2012).
7. R Core Team. R: A language and environment for statistical
computing. (2019).
8. Wickham, H. tidyverse: Easily Install and Load the ‘Tidyverse’.
(2017).
9. Pebesma, E. Simple features for R: Standardized support for spatial
vector data. R J. 10, 439–446 (2018).
10. Kassambara, A. ggpubr: ‘ggplot2’ Based Publication Ready Plots.
(2018).
11. Bates, D. M., Maechler, M., Bolker, B. & Walker, S. Fitting
linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48
(2015).
12. Lüdecke, D. sjPlot: Data Visualization for Statistics in Social
Science. (2018). doi:10.5281/zenodo.1308157
13. Lüdecke, D. sjstats: Statistical Functions for Regression Models.
(2019). doi:10.5281/zenodo.1284472
14. Locke, D. H., Landry, S. M., Grove, J. M. & Roy Chowdhury, R.
What’s scale got to do with it? Models for urban tree canopy. J. Urban
Ecol. 2, juw006 (2016).
15. Schwarz, K. et al. Trees Grow on Money: Urban Tree Canopy Cover
and Environmental Justice. PLoS One 10, e0122051 (2015).
16. O’Neil-Dunne, J. P. M. A Report on the City of Baltimore’s
Existing and Possible Urban Tree Canopy. (2009).
17. Hox, J. J. Applied Multilevel Analysis. (1995).
doi:10.1017/cbo9780511610806
18. Nakagawa, S. & Schielzeth, H. Coefficient of determination R²
and intra-class correlation coefficient ICC from generalized linear
mixed-effects models. Ecol. Evol. 14, 20170213 (2017).
19. Burnham, K. P. & Anderson, D. R. Model Selection and
Multimodel Inference: A Practical Information-Theoretic Approach (2nd
ed). Springer (2002).