The LTS-US dataset is constructed in a three-part pipeline: (1) aggregate training
data, (2) create classification models, and (3) apply predictions to lakes outside of
the training data. Individual steps within the pipeline are described below.
Step 1: Identify Parent Datasets
United States Environmental Protection Agency National Lakes Assessment
In situ measurements of total phosphorus and true color were compiled from the U.S.
National Lakes Assessment (NLA), a synoptic sampling campaign of lakes, ponds, and
reservoirs, hereafter collectively referred to as “lakes”, conducted in the contiguous
U.S. every five years. Lakes used in this analysis were sampled in the summer
(June–September) of 2007 (n = 1028), 2012 (n = 1038), or 2017 (n = 1005). Lakes were selected
from the National Hydrography Dataset (NHD,
https://www.usgs.gov/national-hydrography/national-hydrography-dataset) using a
randomized design stratified on aggregated Omernik level III ecoregion and lake surface
area. The minimum surface area for inclusion in the 2007 assessment was 4 ha, but, owing
to increasing resolution in the NHD, was reduced to 1 ha for the 2012 and 2017
assessments. Natural lakes and reservoirs were treated equally for site selection
purposes. A wide range of measurements was collected at each sampled lake, but we
provide details only on the variables used in this analysis. Additional details, protocols,
and data are available online
(https://www.epa.gov/national-aquatic-resource-surveys/nla).
Total phosphorus and true color were collected and processed in the 2007, 2012, and
2017 field campaigns (USEPA 2007, 2011, 2017a). Water samples were collected from a
deep, open water (up to 50 m deep) location in natural lakes and at a midpoint in
reservoirs. Water was collected from the photic zone using a vertical, depth integrated
sampling device. True color was estimated by visual comparison of filtered water samples
to a calibrated glass color disk (USEPA 1987). Total phosphorus concentrations were
measured with manual alkaline persulfate digestion, followed by automated colorimetric
analysis (ammonium molybdate and antimony potassium tartrate under acidic conditions,
with absorbance at 880 nm) using a flow injection analyzer following standard method
4500-P-E (APHA 1999). Detailed descriptions of all water quality analyses are available
in the NLA Laboratory Operations Manuals (USEPA 2007, 2012, 2017b).
HydroLAKES
HydroLAKES (Messager et al. 2016) is a compendium of more than 1.4 million lake and
reservoir shapefiles globally, with surface area of at least 10 ha. For an individual
waterbody, HydroLAKES contains its spatial extent and location (using georeferenced
polygons), a unique identifier (ranging from 1 to 1,427,688), and its morphological
(e.g., area, mean depth, elevation, shoreline length), hydrological (e.g., residence
time, discharge, watershed area), and geographical (e.g., name, country,
continent) properties. HydroLAKES is a compilation of existing lake databases, with
sources from government agencies (e.g., Natural Resources Canada, U.S. Geological
Survey, European Environment Agency) and from remote sensing studies (for example,
Shuttle Radar Topographic Mission Water Body Data, Global Lakes and Wetlands Database,
and Global Reservoir and Dam Database). Most of the lake polygons are sourced from the
Shuttle Radar Topographic Mission Water Body Data for regions between 60°S and 60°N
(Robinson et al. 2014), supplemented by other datasets for higher latitudes and for
underrepresented regions. More detailed information on the creation and validation of
the HydroLAKES dataset can be found in Messager et al. (2016).
LimnoSat
The LimnoSat-US (Topp et al. 2020) dataset comprises over 22 million remotely sensed
observations of lake surface reflectance from 1984 to 2020. Observations cover over
50,000 lakes greater than 10 ha (Messager et al. 2016) aggregated from Landsat 5, 7, and
8 Collection 1 imagery. Each observation was calculated by taking the median surface
reflectance within 120 meters of each lake’s Chebyshev center, defined as the point
farthest from shore, which is usually located at the lake’s deepest point (Shen et al.
2015). Extracting reflectance values from the Chebyshev center minimizes signals due to
bottom reflectance and adjacent land pixels. For each observation, non-high confidence
water pixels were masked using the Dynamic Surface Water Extent algorithm (Jones 2019).
Observations were removed if the scene cloud cover was greater than 75%, if any snow,
ice, cloud, cloud shadow (Foga et al. 2017), or hillshade was detected over the lake’s
Chebyshev center, or if there were fewer than eight high confidence water pixels within
the 120 meter buffer of the lake’s Chebyshev center. For certain lakes, these filters
lead to extended periods with limited observations. Data in LimnoSat-US are presented in
a tabular format, where each row reflects a Landsat overpass for a given waterbody, and
columns include median Collection 1 surface reflectance values by band extracted from
pixels within 120 m of the Chebyshev center, scene wide cloud cover, date of imagery
acquisition, and number of water pixels within 120 m of the Chebyshev center.
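The three removal criteria above can be sketched as a simple predicate. This is an illustrative Python sketch only (the pipeline itself is scripted in R), and the field names `scene_cloud_cover`, `center_flagged`, and `n_water_pixels` are hypothetical stand-ins rather than the actual LimnoSat-US column names.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    scene_cloud_cover: float  # scene-wide cloud cover in percent (hypothetical name)
    center_flagged: bool      # snow, ice, cloud, cloud shadow, or hillshade at the center
    n_water_pixels: int       # high confidence water pixels within the 120 m buffer


def keep_observation(obs: Observation) -> bool:
    """Return True only if the observation passes all three filters in the text."""
    if obs.scene_cloud_cover > 75:
        return False
    if obs.center_flagged:
        return False
    if obs.n_water_pixels < 8:
        return False
    return True


# Made-up observations: only the first one passes every filter.
observations = [
    Observation(10.0, False, 25),
    Observation(80.0, False, 25),
    Observation(10.0, True, 25),
    Observation(10.0, False, 7),
]
kept = [obs for obs in observations if keep_observation(obs)]
```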
Step 2: Define Lake Trophic State
Many lakes across the United States are experiencing changes in their
water clarity: some lakes are getting greener due to eutrophication, others are getting
browner from increasing terrestrially derived organic matter, and some are
simultaneously ‘greening’ and ‘browning’ (Leech et al. 2018). Given the need to
discriminate between lakes that may be browning and/or greening, the Nutrient Color
Paradigm (NCP) is a useful tool to assign LTS based on a lake’s characteristic
color.
The NCP was initially proposed in the early 20th century, emphasizing that both
autochthonous and allochthonous processes are important to understanding LTS (Naumann
1917; Thienemann 1921; Järnefelt 1925). Specifically, water color often affects algal
biomass and light transparency independent of nutrient availability. Rodhe (1969) first
assembled the four quadrants of the NCP, placing autochthony on the horizontal axis and
allochthony on the vertical axis. This second dimension distinguishes “oligotrophic”
(low nutrient, low color) lakes, “eutrophic” (high nutrient, low color) lakes,
“dystrophic” (low nutrient, high color) lakes, and “mixotrophic” (high nutrient, high
color) lakes.
While metrics such as Trophic State Index (Carlson 1977) gained popularity for
providing instantaneous assessments of a lake’s autotrophic production, Williamson et
al. (1999) encouraged a focus on NCP for lake classification given the importance of
both nutrients and dissolved organic matter to lake structure and function. The NCP’s
implementation is empirically supported by studies like Webster et al. (2008), where an
analysis of ~1,600 temperate lakes in North America demonstrated that within lakes
grouped by total phosphorus concentration (i.e., oligotrophic, mesotrophic, or
eutrophic), those with ‘browner’ color (indicative of dissolved organic matter) had
higher volumetric chlorophyll-a concentrations and shallower Secchi disk depths. A
similar pattern was observed by Nürnberg and Shaw (1998), who analyzed 600 lakes
spanning a latitude of 39°S to 82°N.
Here, we used the thresholds published in Webster et al. (2008) to classify lakes in
the NLA dataset. Lakes were described as oligotrophic or ‘blue’ if total phosphorus
concentration was less than 30 μg/L and true color was less than 20 platinum cobalt
units (PCU), eutrophic/mixotrophic or ‘green/murky’ if total phosphorus was greater than
30 μg/L, and dystrophic or ‘brown’ if total phosphorus was less than 30 μg/L and true
color was greater than 20 PCU. Thresholds for total phosphorus are based on long established and
widely accepted ranges affecting primary productivity (Wetzel 2001). True color
thresholds are derived from Nürnberg and Shaw (1998).
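The threshold rules above can be written as a small function. This is a minimal Python sketch, not the authors' R code. The text does not say how values exactly equal to a threshold are handled, so assigning them to the higher class is an assumption made here for illustration.

```python
TP_THRESHOLD_UG_L = 30.0    # total phosphorus threshold (μg/L)
COLOR_THRESHOLD_PCU = 20.0  # true color threshold (PCU)


def classify_trophic_state(tp_ug_l: float, color_pcu: float) -> str:
    """Assign a Nutrient Color Paradigm class from total phosphorus and true color."""
    # Assumption: a value exactly at a threshold falls into the higher class.
    if tp_ug_l >= TP_THRESHOLD_UG_L:
        # High-nutrient lakes are pooled regardless of color.
        return "eutrophic/mixotrophic"
    if color_pcu >= COLOR_THRESHOLD_PCU:
        return "dystrophic"
    return "oligotrophic"


# Made-up example values.
blue_lake = classify_trophic_state(tp_ug_l=8.0, color_pcu=5.0)
brown_lake = classify_trophic_state(tp_ug_l=12.0, color_pcu=45.0)
murky_lake = classify_trophic_state(tp_ug_l=60.0, color_pcu=45.0)
```

Pooling eutrophic and mixotrophic lakes means color is only consulted for low-phosphorus lakes, which is why the phosphorus check comes first.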
Step 3: Create a Training Dataset
First, to create a dataset of lakes with in situ LTS measurements, we aggregated all
total phosphorus and true color measurements from the US EPA NLA 2007, 2012, and 2017
data. While the NLA includes lakes smaller than 10 ha, we only used lakes of at least 10
ha in area, for consistency with the HydroLAKES database. We then assessed the extent to
which seasonally shifting total phosphorus concentrations may alter interpretation of
trophic state for a given lake using the subset of lakes that were sampled
intra-annually. We calculated the percentage of lakes that transitioned between trophic
states within a single year and found that lakes broadly remained in the same NCP
trophic state throughout a given summer (85.1% of lakes). Of the lakes that changed
trophic state during a sampling season (14.9%), the majority transitioned from
oligotrophic (61.5% of changing lakes; 8.7% of all lakes) or dystrophic (15.4% of
changing lakes; 2.2% of all lakes) to eutrophic/mixotrophic. Few lakes transitioned from
oligotrophic to dystrophic (15.4% of changing lakes; 2.2% of all lakes), and even fewer
transitioned to oligotrophic from either dystrophic (3.9% of changing lakes; 0.5% of all
lakes) or eutrophic/mixotrophic (3.9% of changing lakes; 0.5% of all lakes). No lakes
transitioned from eutrophic/mixotrophic to dystrophic in any of the three NLA campaigns.
Broadly, lakes transitioned between trophic states when they were near a
threshold for trophic state delineation (15-45 μg/L total phosphorus or 11-29 PCU).
These results mirror those in Leech et al. (2018) and suggest that, although some lakes
change trophic states, the majority do not, and those that do
usually fall along an edge of an NCP-determined trophic state. Thus, for lakes
sampled twice in one sampling campaign, we averaged total phosphorus and true color
estimates.
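Averaging repeat visits within a campaign amounts to a group-by on lake and year. The Python sketch below uses made-up values and a hypothetical record layout of (lake identifier, campaign year, total phosphorus, true color).

```python
from collections import defaultdict
from statistics import mean

# (lake_id, campaign year, total phosphorus in μg/L, true color in PCU); made-up values
visits = [
    ("lake_a", 2012, 12.0, 8.0),
    ("lake_a", 2012, 18.0, 12.0),  # second visit in the same campaign
    ("lake_b", 2012, 40.0, 25.0),
]

# Collect every visit under its lake-year key.
grouped = defaultdict(list)
for lake_id, year, tp, color in visits:
    grouped[(lake_id, year)].append((tp, color))

# One averaged (total phosphorus, true color) pair per lake-year.
lake_year_means = {
    key: (mean(tp for tp, _ in rows), mean(color for _, color in rows))
    for key, rows in grouped.items()
}
```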
Second, to match the in situ trophic states with remotely sensed imagery, we merged
the complete 2007, 2012, and 2017 NLA dataset with the LimnoSat-US dataset (Topp et al.
2020), where each NLA lake-year had corresponding Landsat spectral data. Because the NLA
is designed to describe lakes’ summertime conditions, we filtered LimnoSat-US
observations to retain only those from June, July, and August, which we defined a priori
as the summertime season for the contiguous U.S.; then, to create a
characteristic reflectance for a given lake-year, we computed each lake-year’s median
summertime reflectance for red, blue, green, and near-infrared bands. Because
LimnoSat-US compiles reflectance values from Landsat 5, 7, and 8, there are differences
in the number of images per lake and year. In particular, images from 1984 through 1998
were solely collected from Landsat 5, when lakes averaged 3.04 images per summer (min:
2.43 images; max: 3.64 images). From 1999 through 2012, summertime imagery was gathered
from Landsat 5 and 7, when lakes averaged 5.64 images per summer (min: 3.37 images; max:
6.42 images). From 2013 through 2019, summertime imagery was collected from Landsat 7
and 8, when lakes averaged 5.42 images per summer (min: 4.87 images; max: 5.87 images).
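The summertime filter and per-band median described above can be sketched as follows. This is an illustrative Python example with made-up reflectance values. The record layout and band names are assumptions, not the LimnoSat-US schema.

```python
from collections import defaultdict
from datetime import date
from statistics import median

SUMMER_MONTHS = {6, 7, 8}  # June, July, and August
BANDS = ("blue", "green", "red", "nir")

# (lake_id, acquisition date, reflectance by band); made-up values
observations = [
    ("lake_a", date(2012, 6, 14), {"blue": 0.030, "green": 0.040, "red": 0.020, "nir": 0.010}),
    ("lake_a", date(2012, 7, 16), {"blue": 0.050, "green": 0.060, "red": 0.040, "nir": 0.030}),
    ("lake_a", date(2012, 8, 17), {"blue": 0.040, "green": 0.050, "red": 0.030, "nir": 0.020}),
    ("lake_a", date(2012, 10, 4), {"blue": 0.900, "green": 0.900, "red": 0.900, "nir": 0.900}),
]

# Keep summer overpasses only, grouped by lake-year.
by_lake_year = defaultdict(list)
for lake_id, acquired, reflectance in observations:
    if acquired.month in SUMMER_MONTHS:
        by_lake_year[(lake_id, acquired.year)].append(reflectance)

# Band-wise median gives one characteristic reflectance per lake-year.
summer_median = {
    key: {band: median(row[band] for row in rows) for band in BANDS}
    for key, rows in by_lake_year.items()
}
```

Because a median is used, a lake-year with only two or three usable overpasses still yields a value, although it is then more sensitive to any single image.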
Third, to better characterize spectral bands’ relative reflectance, we normalized
each lake’s summertime median red, green, blue, and near-infrared band by the sum of all
four bands. This normalization allowed us to differentiate lakes by trophic state based
on their most prominent reflectances. For example, we anticipated that oligotrophic
lakes would be dominated by blue and green reflectances relative to the red and
near-infrared bands. In contrast, dystrophic lakes would be dominated by the
near-infrared and blue bands relative to green and red bands. These relative
reflectances were ultimately intended to discriminate among lakes that were optically
similar in the visible spectrum (i.e., oligotrophic and dystrophic lakes). Notably, the
decision to use median summertime relative reflectances differed from previous work
(e.g., Topp et al. 2020) that focused on the dominant wavelength, which was an
aggregation of wavelengths detected in the visible spectrum and has been used to
discriminate autotrophic production (i.e., blue vs green lakes), but not dystrophic
states. Thus, our methods are better suited to discriminating between oligotrophic
and dystrophic lakes, whereas the dominant wavelength approach would consider both of
these lake types to be “blue”.
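The normalization step divides each band by the sum of the four bands, so the resulting relative reflectances sum to one. A minimal Python sketch with made-up values:

```python
def relative_reflectance(bands: dict) -> dict:
    """Divide each band's median reflectance by the sum across all four bands."""
    total = sum(bands.values())
    if total <= 0:
        raise ValueError("band sum must be positive to normalize")
    return {name: value / total for name, value in bands.items()}


# Made-up summertime medians for one lake-year.
example = relative_reflectance(
    {"blue": 0.04, "green": 0.05, "red": 0.03, "nir": 0.02}
)
most_prominent_band = max(example, key=example.get)
```

Two lakes with different overall brightness but the same spectral shape receive identical relative reflectances, which is what lets the classifier focus on which bands dominate.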
Step 4: Create Classification Models
To identify the best-performing classifier for lakes with unknown LTS, we employed
three classification methods to predict trophic state: multinomial logistic regression
(Venables and Ripley 2002), extreme gradient boosting regression (Friedman 2001), and a
neural network using multilayer perceptrons (Rosenblatt 1958). Logistic regression is a
parametric classification method, whereas gradient boosted regression and multilayer
perceptrons are machine learning methods. The methods differ in how they make
classifications. Using trophic state as a categorical response variable, logistic
regression models the log-odds of each trophic state as a linear function of the
predictors to estimate the probability of a given trophic state for each lake. In
contrast, gradient boosted regression builds an ensemble of decision trees, each fitted
to reduce the errors of the trees before it. Multilayer perceptrons are a
type of feedforward artificial neural network in which a backpropagation algorithm is
used to iteratively update the weights of each neuron by comparing
modeled predictions to the training data.
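To make the log-odds formulation concrete, the Python sketch below converts linear scores into class probabilities for a three-class multinomial logistic model. The coefficients are made up, and treating oligotrophic as the reference class is an assumption for illustration. Neither is taken from the fitted models.

```python
import math

CLASSES = ("oligotrophic", "dystrophic", "eutrophic/mixotrophic")
REFERENCE = "oligotrophic"  # assumed reference class; its log-odds are fixed at 0


def multinomial_probabilities(x, intercepts, coefs):
    """Turn per-class linear scores (log-odds vs. the reference) into probabilities."""
    scores = {REFERENCE: 0.0}
    for cls in CLASSES:
        if cls == REFERENCE:
            continue
        scores[cls] = intercepts[cls] + sum(b * xi for b, xi in zip(coefs[cls], x))
    top = max(scores.values())  # subtract the maximum for numerical stability
    exponentiated = {cls: math.exp(s - top) for cls, s in scores.items()}
    total = sum(exponentiated.values())
    return {cls: e / total for cls, e in exponentiated.items()}


# Made-up coefficients for z-scored relative blue, green, red, and near-infrared.
intercepts = {"dystrophic": -1.0, "eutrophic/mixotrophic": 0.5}
coefs = {
    "dystrophic": [0.2, -0.8, 0.1, 0.9],
    "eutrophic/mixotrophic": [-1.1, 0.7, 0.6, 0.3],
}
probs = multinomial_probabilities([0.0, 0.0, 0.0, 0.0], intercepts, coefs)
```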
For each modeling method, we used z-scored, relative red, green, blue, and
near-infrared reflectances for input data. Model performance and potential for
overfitting were assessed using a 90:10 train:test data split with spatial-holdout
cross-validation.
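Z-scoring centers each input on its mean and scales it by its standard deviation. The Python sketch below assumes the sample standard deviation (n - 1 denominator). The text does not state which variant was used.

```python
from statistics import mean, stdev


def z_score(values):
    """Center on the mean and scale by the sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample SD with an n - 1 denominator
    return [(v - mu) / sigma for v in values]


# Made-up relative blue reflectances for five lake-years.
relative_blue = [0.20, 0.25, 0.30, 0.35, 0.40]
scaled = z_score(relative_blue)
```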
Initial hyperparameters for the gradient boosted regression and multilayer
perceptron models were tuned by holding out 20% of each trophic class from the training
observations to use for validation and conducting a coarse grid-search across the
hyperparameter space. For each combination of hyperparameters, models were trained until
validation performance did not increase for 20 consecutive epochs using categorical
cross entropy as the objective function. During the multilayer perceptron hyperparameter
tuning, we iterated through model fits using all combinations of 5, 10, and 20 hidden
units as well as learning rates of 0.01, 0.001, and 0.0005. Multilayer perceptron
hyperparameter tuning metrics were optimal for models with 20 hidden units and a
learning rate of 0.001. During the gradient boosted regression hyperparameter tuning, we
iterated through model fits using all combinations of maximum tree depths of 2, 3, and 4,
subsample and column-sample fractions of 0.5 and 0.8, step sizes of 0.01 and 0.1, and
minimum child weights of 1 and 3. Gradient boosted regression hyperparameter tuning
metrics were optimal for models with a max depth of 4, subsample of 0.5, column sample
of 0.5, step size of 0.01, and minimum child weight of 1. For both multilayer perceptron
and gradient boosted regression models, the best-performing hyperparameter combination
was the one with the lowest validation loss.
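The coarse grid search can be sketched by enumerating every combination of the candidate values listed above and keeping the one with the lowest validation loss. In this Python sketch the dictionary keys are hypothetical labels, subsample and column-sample fractions are assumed to vary independently, and `toy_loss` is an arbitrary placeholder. A real search would train a model for each combination and record its validation loss.

```python
from itertools import product

# Candidate values as listed in the text; key names are illustrative labels.
mlp_grid = {
    "hidden_units": [5, 10, 20],
    "learning_rate": [0.01, 0.001, 0.0005],
}
xgb_grid = {
    "max_depth": [2, 3, 4],
    "subsample": [0.5, 0.8],
    "column_sample": [0.5, 0.8],
    "step_size": [0.01, 0.1],
    "min_child_weight": [1, 3],
}


def expand_grid(grid):
    """Return one dict per combination of candidate values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def select_best(candidates, validation_loss):
    """Pick the candidate with the lowest validation loss."""
    return min(candidates, key=validation_loss)


mlp_candidates = expand_grid(mlp_grid)
xgb_candidates = expand_grid(xgb_grid)


def toy_loss(params):
    # Arbitrary placeholder, NOT a real validation loss.
    return params["learning_rate"]


best_toy = select_best(mlp_candidates, toy_loss)
```

Under these assumptions the grids contain 9 multilayer perceptron combinations and 48 gradient boosting combinations.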
These hyperparameters were then used in a spatial cross-validation routine (sensu
Willard et al. 2021), in which no lake appeared in both the training and test data.
During the spatial cross-validation routine, training data were
divided into five folds, such that lakes within each test partition were not present in
remaining training partitions (i.e., test metrics represent performance on unseen
lakes). Training data within each fold were then partitioned into a 90:10 split with 10%
of each trophic class set aside for an inner-loop fold validation. Models within each
fold were trained using an early stopping criterion of 20 epochs to avoid overfitting on
the training data. This inner-fold validation was additionally used to tune the
number of epochs for the final models. Finally, overall error metrics were
calculated based on the mean prediction accuracy of the test partitions withheld from
the inner-loop training of each fold. All reported metrics are based on the test
partitions from the spatial cross-validation routine while final models were trained on
the full dataset using the hyperparameters identified from the grid-search and
inner-loop validation routines.
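The key property of the spatial holdout is that all observations from a given lake fall in the same fold. The Python sketch below is a simplified stand-in for that idea, not the routine used in the pipeline. It assigns whole lakes to folds at random and does not reproduce the class-stratified inner-loop split.

```python
import random


def lake_grouped_folds(lake_ids, n_folds=5, seed=0):
    """Build (train, test) index lists so that no lake spans both partitions."""
    unique_lakes = sorted(set(lake_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_lakes)
    # Deal lakes out to folds in round-robin order after shuffling.
    fold_of_lake = {lake: i % n_folds for i, lake in enumerate(unique_lakes)}
    folds = []
    for k in range(n_folds):
        test_idx = [i for i, lake in enumerate(lake_ids) if fold_of_lake[lake] == k]
        train_idx = [i for i, lake in enumerate(lake_ids) if fold_of_lake[lake] != k]
        folds.append((train_idx, test_idx))
    return folds


# Made-up example: 50 lake-year rows drawn from 20 lakes.
lake_ids = [f"lake_{i % 20}" for i in range(50)]
folds = lake_grouped_folds(lake_ids)
```

Splitting on rows instead of lakes would let the same lake appear in training and test data, and test metrics would then overstate performance on unseen lakes.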
Model diagnostics and performance were calculated using test data, but the final
models used to create the final dataset were trained on the full training dataset.
Once final models were validated for performance, we applied them
to make predictions for all 56,000 lakes in the LimnoSat-US dataset.
Step 5: Assess and Compare Model Performance
To evaluate the final fitted models, we used test data predictions from the
spatial-holdout routine to calculate each model’s overall and balanced accuracy,
receiver-operator-characteristic (ROC) curves, as well as the area under the curve (AUC)
of the ROC curve. Overall accuracy was calculated as the sum of true positives and true
negatives divided by the total number of LTS predictions. Balanced accuracy was
calculated for each trophic state as the mean of the true positive rate and the true
negative rate for that state. Whereas overall accuracy can be biased towards more prevalent trophic
states (i.e., eutrophic and oligotrophic lakes), balanced accuracy is useful to assess a
model’s capacity to predict more rare trophic states (i.e., dystrophic lakes). As an
additional metric of model performance, we calculated the AUC of each model’s ROC curve.
The ROC curve plots the true positive rate against the false positive rate across
classification thresholds. An AUC of 0.5 indicates that the true positive rate
increases 1:1 with the false positive rate, as expected for a random classifier. AUCs
greater than 0.5 imply a model performing better than random, even when a false positive rate is
artificially inflated. Thus, comparing overall and balanced accuracy as well as ROC
curves and AUCs allowed us to assess how models performed broadly as well as how
robustly models predicted trophic state correctly.
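The metrics above can be illustrated on a handful of made-up predictions. This Python sketch assumes the common per-class definition of balanced accuracy (the mean of sensitivity and specificity) and computes a one-vs-rest AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. It is not the multiROC implementation.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of predictions that match the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred, cls):
    """Mean of sensitivity and specificity for one class versus the rest."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    tn = sum(t != cls and p != cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def one_vs_rest_auc(y_true, scores, cls):
    """Probability that a positive outranks a negative; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == cls]
    neg = [s for t, s in zip(y_true, scores) if t != cls]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels, predictions, and predicted probabilities of "dys".
y_true = ["oligo", "oligo", "eu", "eu", "eu", "dys"]
y_pred = ["oligo", "eu", "eu", "eu", "eu", "oligo"]
dys_scores = [0.10, 0.20, 0.10, 0.05, 0.10, 0.30]

acc = overall_accuracy(y_true, y_pred)
dys_balanced = balanced_accuracy(y_true, y_pred, "dys")
dys_auc = one_vs_rest_auc(y_true, dys_scores, "dys")
```

In this toy case the single dystrophic lake is misclassified, so its balanced accuracy is only 0.5 even though overall accuracy is 4 of 6, which shows why both metrics are worth reporting for rare classes.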
Beyond model performance, we also evaluated whether model coefficients and variable
importance for trophic state discrimination reflected NCP groupings. For increased
interpretability across all three models, we employed SHAP (SHapley Additive
exPlanation) analysis (Shapley 1953; Štrumbelj and Kononenko 2014; Lundberg and Lee
2017) to better understand individual feature importance and influence in model
predictions. SHAP analysis yields insight into the marginal contribution of a given
feature (e.g., near-infrared spectra) on model output - in this case trophic state - and
helps decode ‘black box’ results. Understanding the relative contribution of individual
features in trophic state prediction not only helps explain feature roles in model
accuracy and misclassification but also quantitatively connects features, such as
remotely sensed data, to the biophysical parameters in which LTS prediction is grounded.
SHAP feature contribution was calculated for blue, green, red, and near-infrared Landsat
spectra. SHAP feature contribution was scored for oligotrophic, dystrophic, and
eutrophic classifications and across each of the three models. This scoring illuminates
the relationship among feature values and SHAP contribution for a given trophic state
classification and for a given model. Specifically, for classification problems, a
positive SHAP value indicates that a given input contributed to a positive
classification while a negative value indicates the input contributed to a low
probability for a given classification.
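To illustrate what a SHAP contribution represents, the Python sketch below computes exact Shapley values for a toy four-feature function by enumerating every feature subset. The pipeline cites an approximate method (fastshap); exact enumeration is swapped in here only because it is feasible with four inputs. The toy model, input values, and baseline are all made up.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a subset are set to the baseline."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                # Marginal effect of adding feature i to this subset.
                total += weight * (value(members | {i}) - value(members))
        contributions.append(total)
    return contributions


def toy_model(z):
    # Made-up score with an interaction between red and near-infrared.
    blue, green, red, nir = z
    return 2.0 * blue - 1.0 * green + 0.5 * red * nir


x = [0.4, 0.3, 0.2, 0.1]               # relative blue, green, red, near-infrared
baseline = [0.25, 0.25, 0.25, 0.25]    # reference input
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the model output at `x` and at the baseline. A positive entry pushes the score up and a negative entry pushes it down, matching the sign interpretation given above.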
Code Availability
All data harmonization, modeling, and validation procedures for the LTS-US dataset
were scripted in the R Statistical Environment (R Core Team 2022), using the tidyverse
(Wickham et al. 2019), lubridate (Grolemund and Wickham 2011), data.table (Dowle and
Srinivasan 2021), sf (Pebesma 2018), keras (Allaire and Chollet 2022), tensorflow
(Allaire and Tang 2022), caret (Kuhn 2022), CAST (Meyer et al. 2022), yaml (Garbett et
al. 2022), reticulate (Ushey et al. 2022), xgboost (Chen et al. 2022), nnet (Venables
and Ripley 2002), viridis (Garnier et al. 2021), trend (Pohlert 2020), multiROC (Wei and
Wang 2018), ggpubr (Kassambara 2020), fastshap (Greenwell 2021), maps (Becker et al.
2021), ggtext (Wilke and Wiernik 2022), and ggforce (Pedersen 2022) packages.
To enhance reproducibility, all scripts are designed to work within a single
pipeline that uses the targets package (Landau 2021). The targets pipeline is divided
into four main components: “1_aggregate”, “2_train”, “3_predict”, and “4_qc”. Each
component corresponds to one of the steps presented above and can be customized by
future users to fit their specific needs. The associated pipeline setup and user guide
can be found on the dataset’s companion Git repository, where the main ReadMe file
details directory architecture and how to execute the pipeline.
To ensure reproducibility across operating platforms, all scripts for the pipeline
can be executed within a container. Running the pipeline within the container allows
users to execute the entire pipeline without the need to make small, yet important,
edits to the code, or to configure their own operating environment to conform to the
pipeline’s requirements. For example, recent versions of the sf package default to using
the s2 spherical geometry engine instead of GEOS, which assumes planar coordinates. End
users on a system with one version of the sf library might need to adjust the code to
use the correct geometry engine, whereas users with another version might be able to run
the pipeline without any adjustments. The container crystallizes a known-working set of
libraries, both at the system level (e.g. GEOS, GDAL, PROJ) and at the R level (e.g.
sf), so that anybody can run the code without reconfiguring their own environment. This
also provides future-proofing by ensuring that the inevitable changes to other libraries
over time do not lead to errors. To help end users who are less familiar with running
containerized code, a tutorial for installing and executing the pipeline within the
container is located in the Environmental Data Initiative repository as a compressed
entity (see README.pdf).
References
Allaire, J. J., and F. Chollet. 2022. keras: R Interface to “Keras.”
Allaire, J. J., and Y. Tang. 2022. tensorflow: R Interface to “TensorFlow.”
APHA. 1999. Standard Methods for the Examination of Water and Wastewater. American
Public Health Association, Washington, DC.
Becker, R. A., A. R. Wilks, R. Brownrigg, T. P. Minka, and A. Deckmyn. 2021. maps: Draw
Geographical Maps.
Carlson, R. E. 1977. A trophic state index for lakes. Limnol. Oceanogr. 22: 361–369.
doi:10.4319/lo.1977.22.2.0361
Chen, T., T. He, M. Benesty, and others. 2022. xgboost: Extreme Gradient
Boosting.
Dowle, M., and A. Srinivasan. 2021. data.table: Extension of `data.frame`.
Foga, S., P. L. Scaramuzza, S. Guo, and others. 2017. Cloud detection algorithm
comparison and validation for operational Landsat data products. Remote Sens. Environ.
194: 379–390. doi:10.1016/j.rse.2017.03.026
Friedman, J. H. 2001. Greedy function approximation: A gradient boosting machine.
Ann. Stat. 29: 1189–1232. doi:10.1214/aos/1013203451
Garbett, S. P., J. Stephens, K. Simonov, and others. 2022. yaml: Methods to Convert
R Data to YAML and Back.
Garnier, S., N. Ross, and others. 2021. viridis: Colorblind-Friendly Color Maps for
R.
Greenwell, B. 2021. fastshap: Fast Approximate Shapley Values.
Grolemund, G., and H. Wickham. 2011. Dates and Times Made Easy with lubridate. J.
Stat. Softw. 40: 1–25.
Jones, J. W. 2019. Improved Automated Detection of Subpixel-Scale Inundation—Revised
Dynamic Surface Water Extent (DSWE) Partial Surface Water Tests. Remote Sens. 11: 374.
doi:10.3390/rs11040374
Järnefelt, H. 1925. Zur Limnologie einiger Gewässer Finnlands. Soc Zool Bot Fenn.
Vanamo 2: 185–352.
Kassambara, A. 2020. ggpubr: “ggplot2” Based Publication Ready Plots.
Kuhn, M. 2022. caret: Classification and Regression Training.
Landau, W. M. 2021. The targets R package: a dynamic Make-like function-oriented
pipeline toolkit for reproducibility and high-performance computing. J. Open Source
Softw. 6: 2959.
Leech, D. M., A. I. Pollard, S. G. Labou, and S. E. Hampton. 2018. Fewer blue lakes
and more murky lakes across the continental U.S.: Implications for planktonic food webs.
Limnol. Oceanogr. 63: 2661–2680. doi:10.1002/lno.10967
Lundberg, S. M., and S.-I. Lee. 2017. A Unified Approach to Interpreting Model
Predictions. Advances in Neural Information Processing Systems. Curran Associates,
Inc.
Messager, M. L., B. Lehner, G. Grill, I. Nedeva, and O. Schmitt. 2016. Estimating
the volume and age of water stored in global lakes using a geo-statistical approach.
Nat. Commun. 7: 13603. doi:10.1038/ncomms13603
Meyer, H., C. Milà, and M. Ludwig. 2022. CAST: “caret” Applications for
Spatial-Temporal Models.
Naumann, E. 1917. Undersökningar över fytoplankton och under den pelagiska regionen
försiggående gyttje- och dybildningar inom vissa syd- och mellansvenska urbergsvatten. K
Sv Vetensk Akad Handl 56: 1–165.
Nürnberg, G. K., and M. Shaw. 1998. Productivity of clear and humic lakes:
nutrients, phytoplankton, bacteria. Hydrobiologia 382: 97–112.
doi:10.1023/A:1003445406964
Pebesma, E. 2018. Simple Features for R: Standardized Support for Spatial Vector
Data. R J. 10: 439–446. doi:10.32614/RJ-2018-009
Pedersen, T. L. 2022. ggforce: Accelerating “ggplot2.”
Pohlert, T. 2020. trend: Non-Parametric Trend Tests and Change-Point
Detection.
R Core Team. 2022. R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing.
Robinson, N., J. Regetz, and R. P. Guralnick. 2014. EarthEnv-DEM90: A nearly-global,
void-free, multi-scale smoothed, 90m digital elevation model from fused ASTER and SRTM
data. ISPRS J. Photogramm. Remote Sens. 87: 57–67.
doi:10.1016/j.isprsjprs.2013.11.002
Rodhe, W. 1969. Crystallization of Eutrophication Concepts in Northern Europe, p.
20256. In Eutrophication: Causes, Consequences, Correctives. National Academies
Press.
Rosenblatt, F. 1958. The perceptron: A probabilistic model for information storage
and organization in the brain. Psychol. Rev. 65: 386–408. doi:10.1037/h0042519
Shapley, L. S. 1953. A Value for n-Person Games, p. 307–318. In H.W. Kuhn and
A.W. Tucker [eds.], Contributions to the Theory of Games (AM-28), Volume II. Princeton
University Press.
Shen, Z., X. Yu, Y. Sheng, J. Li, and J. Luo. 2015. A Fast Algorithm to Estimate the
Deepest Points of Lakes for Regional Lake Registration. PLOS ONE 10: e0144700.
doi:10.1371/journal.pone.0144700
Štrumbelj, E., and I. Kononenko. 2014. Explaining prediction models and individual
predictions with feature contributions. Knowl. Inf. Syst. 41: 647–665.
doi:10.1007/s10115-013-0679-x
Thienemann, A. 1921. Seetypen. Naturwissenschaften 9.
Topp, S., T. Pavelsky, X. Yang, J. Gardner, and M. R. V. Ross. 2020. LimnoSat-US: A
Remote Sensing Dataset for U.S. Lakes from 1984-2020. doi:10.5281/zenodo.4139695
USEPA. 1987. Handbook of Methods for Acid Deposition Studies: Laboratory Analyses
for Surface Water Chemistry, U.S. Environmental Protection Agency, Office of Research
and Development.
USEPA. 2007. Survey of the Nation’s Lakes. Field Operations Manual. EPA 841-B-07-004.
U.S. Environmental Protection Agency.
USEPA. 2011. 2012 National Lakes Assessment. Field Operations Manual. EPA
841-B-11-003. U.S. Environmental Protection Agency.
USEPA. 2012. National Lakes Assessment. Laboratory Operations Manual.
EPA 841-B-11-004. U.S. Environmental Protection Agency.
USEPA. 2017a. National Lakes Assessment 2017. Field Operations Manual. EPA
841-B-16-002. U.S. Environmental Protection Agency.
USEPA. 2017b. National Lakes Assessment 2017. Laboratory Operations Manual. V.1.1.
EPA 841-B-16-004. U.S. Environmental Protection Agency.
Ushey, K., J. J. Allaire, and Y. Tang. 2022. reticulate: Interface to
“Python.”
Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S, 4th edition.
Springer.
Webster, K. E., P. A. Soranno, K. S. Cheruvelil, and others. 2008. An empirical
evaluation of the nutrient-color paradigm for lakes. Limnol. Oceanogr. 53: 1137–1148.
doi:10.4319/lo.2008.53.3.1137
Wei, R., and J. Wang. 2018. multiROC: Calculating and Visualizing ROC and PR Curves
Across Multi-Class Classifications.
Wetzel, R. G. 2001. Limnology: Lake and River Ecosystems, 3rd edition. Academic
Press.
Wickham, H., M. Averick, J. Bryan, and others. 2019. Welcome to the tidyverse. J.
Open Source Softw. 4: 1686. doi:10.21105/joss.01686
Wilke, C. O., and B. M. Wiernik. 2022. ggtext: Improved Text Rendering Support for
“ggplot2.”
Willard, J. D., J. S. Read, A. P. Appling, S. K. Oliver, X. Jia, and V. Kumar. 2021.
Predicting Water Temperature Dynamics of Unmonitored Lakes With Meta-Transfer Learning.
Water Resour. Res. 57: e2021WR029579. doi:10.1029/2021WR029579
Williamson, C. E., D. P. Morris, M. L. Pace, and O. G. Olson. 1999. Dissolved
organic carbon and nutrients as regulators of lake ecosystems: Resurrection of a more
integrated paradigm. Limnol. Oceanogr. 44: 795–803.
doi:10.4319/lo.1999.44.3_part_2.0795