We compiled data from 822 geographically widespread, stratified temperate lakes. Data were collated from Jane et al. (2021), the Wisconsin Department of Natural Resources (DNR), the New Hampshire Volunteer Lake Assessment Program (VLAP), the Lake Stewards of Maine (LSM) Volunteer Lake Monitoring Program, and the Adirondack lakes database (Winslow et al. 2018; Leach et al. 2018), all in the U.S., and were also solicited from members of the Global Lake Ecological Observatory Network (GLEON). For each site, we compiled the available data for dissolved oxygen (DO), water temperature, chlorophyll-a (chl-a), total phosphorus (TP), total nitrogen (TN), and dissolved organic carbon (DOC), as well as lake metadata including depth (mean and maximum), surface area, and elevation. We collated Area:Depth relationships from 19 lakes, and modeled bathymetric profiles for the remaining lakes using mean depth, maximum depth, and surface area. Data availability and collection methods differed among sites in this database (documented in methods.csv).
In sum, the complete dataset consists of 140,813 distinct sampling events at 822 lakes. The median record length per lake is 29 years (range: 5–102 years). Lakes in the dataset have a median depth of 12.5 m (range: 1.5–480 m), a median surface area of 85.4 ha (range: 0.5–237,000 ha), and a median elevation of 264 m (range: -215 to 2804 m). The lakes are located in 18 countries across 5 continents, with latitudes ranging from -42.6° to 68.3°.
QUALITY CONTROL
We performed basic quality control on the collated data. First, we added flag columns for DO, TP, TN, chl-a, and DOC. In these flag columns, we marked missing data for all variables (Flag = 2). Next, we set slightly negative values (between -1 and 0 for all variables) to 0, as these values likely indicate marginal calibration error, and we added a flag to indicate this modification (Flag = 3). Then, we removed impossible values for each parameter (Flag = 4), which we define as follows:
Dissolved oxygen: less than -1 or greater than 40, following Jane et al. (2021)
Temperature: less than -1 or greater than 40, following Jane et al. (2021)
DOC: less than -1
TP: less than -1
TN: less than -1
Chlorophyll-a: less than -1
We removed all measurements with negative depth values, as this dataset is limited to in-water lake data.
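The value-level flagging rules in this section can be sketched in Python as follows. This is an illustrative sketch, not the code used to build the dataset. It assumes that "slightly negative" means values from -1 up to (but not including) 0, that removed values are stored as None, and that unflagged values carry a code of 0; the function name, the variable keys, and the 0 code are hypothetical and do not come from the data package.

```python
import math

# Upper limits taken from the list above; variables not listed here
# (DOC, TP, TN, chl-a) have no upper limit in this sketch.
UPPER_LIMIT = {"DO": 40.0, "Temperature": 40.0}

FLAG_OK = 0           # hypothetical "no issue" code, not defined in the text
FLAG_MISSING = 2      # value was missing
FLAG_SET_TO_ZERO = 3  # slightly negative value replaced with 0
FLAG_IMPOSSIBLE = 4   # impossible value removed


def qc_value(variable, value):
    """Return (cleaned_value, flag) for a single measurement."""
    # Missing data: None or NaN.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None, FLAG_MISSING
    # Impossible values: below -1, or above the variable's upper limit.
    if value < -1 or value > UPPER_LIMIT.get(variable, math.inf):
        return None, FLAG_IMPOSSIBLE
    # Slightly negative values are treated as calibration error and set to 0.
    if value < 0:
        return 0.0, FLAG_SET_TO_ZERO
    return value, FLAG_OK
```

For example, under these assumptions qc_value("DO", -0.3) returns (0.0, 3), while qc_value("DO", 41.0) returns (None, 4).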
HYDROLAKES
HydroLAKES is a global database of 1.4 million lakes with surface area ≥10 ha (Messager et al., 2016). HydroLAKES provides metadata including mean and maximum depths, catchment area, average residence time, and lake type (lake vs. reservoir), and interoperates with multiple other databases (e.g., HydroBASINS, the Global Reservoir and Dam database) using unique lake IDs. We manually matched each lake in our dataset to its HydroLAKES lake ID. By linking HydroLAKES IDs with our published data, we aimed to increase the interoperability of these two large-scale datasets.
For lakes with missing mean or maximum depth (i.e., the depths were not reported with the data; n = 57), we used HydroLAKES data to fill in these values.
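A gap-filling step of this kind might look like the sketch below, which copies a depth from a HydroLAKES record only when the contributed value is missing. All field names ("hylak_id", "mean_depth_m", "max_depth_m") and the "_source" bookkeeping are hypothetical placeholders chosen for illustration; they are not the column names of this dataset or of HydroLAKES.

```python
def fill_missing_depths(lakes, hydrolakes):
    """Fill missing mean/max depths from a HydroLAKES lookup keyed by lake ID.

    lakes: list of dicts, one per lake (hypothetical field names).
    hydrolakes: dict mapping a HydroLAKES ID to a dict of depth fields.
    Returns new dicts; the inputs are not modified.
    """
    filled = []
    for lake in lakes:
        record = dict(lake)  # copy so the original record stays untouched
        reference = hydrolakes.get(record.get("hylak_id"), {})
        for key in ("mean_depth_m", "max_depth_m"):
            # Only fill gaps; reported depths are never overwritten.
            if record.get(key) is None and reference.get(key) is not None:
                record[key] = reference[key]
                record[key + "_source"] = "HydroLAKES"
        filled.append(record)
    return filled
```

Recording the source of each filled value keeps the reported and gap-filled depths distinguishable downstream.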
BATHYMETRY
Our dataset includes measured hypsometry (area and depth intervals; e.g., using acoustic Doppler current profiler data) for a small subset of lakes (n = 20; contributed_bathymetry.csv). For these lakes, we interpolated from the measured bathymetric contours to 1 m intervals using a cubic spline function (Forsythe et al., 1977). The volume of each interval was calculated assuming a cylinder of 1 m height with a cross-sectional area equal to the average of the lake areas at the top and bottom of the interval. For all other lakes, we modeled bathymetric contours using maximum depth, mean depth, and surface area following Håkanson (2005). Briefly, a “shape factor” was calculated from the ratio of mean to maximum depth, and this shape factor was used to estimate lake area in 1 m intervals from the surface to the bottom of the lake. The volume of each interval was then calculated as above. Code for these calculations is presented in model_bathymetries.R and modeled bathymetry outputs are presented in bathymetry_EDI.csv.
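The interval-volume calculation described above can be sketched in Python as follows; the actual implementation is the R code in model_bathymetries.R, so this is only an illustration. The function names are hypothetical. The second helper shows the volume development ratio (3 × mean depth / maximum depth), a commonly used lake shape factor based on the ratio of mean to maximum depth; whether model_bathymetries.R uses exactly this form is an assumption.

```python
def interval_volumes(areas_m2, interval_m=1.0):
    """Volume (m^3) of each depth interval from an area-at-depth profile.

    areas_m2[i] is the lake area at depth i * interval_m, with index 0 at
    the surface. Each interval is treated as a cylinder whose area is the
    average of the areas at its top and bottom.
    """
    return [
        0.5 * (top + bottom) * interval_m
        for top, bottom in zip(areas_m2, areas_m2[1:])
    ]


def volume_development(mean_depth_m, max_depth_m):
    """Shape factor 3 * mean / max; equals 1 for a perfect cone (assumed form)."""
    return 3.0 * mean_depth_m / max_depth_m


# Made-up profile for illustration only: areas at 0, 1, 2, and 3 m depth.
volumes = interval_volumes([100.0, 60.0, 20.0, 0.0])
total_volume = sum(volumes)
```

With this made-up profile, the three intervals have volumes of 80, 40, and 10 m^3, for a total of 130 m^3.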
LAKE IDS
Our LakeID designations interface with the Maine, Wisconsin, and New Hampshire lake monitoring datasets, as well as Stetler et al. (2021). Here, Maine lake IDs are prefixed with “MIDAS-”, Wisconsin lake IDs are prefixed with “WI-”, and New Hampshire lakes retain their original alphanumeric lake IDs (which begin with “NH”). Numeric lake IDs are from Stetler et al. (2021). All other lakes were contributed directly to this study and are prefixed with “aba”.
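When working with the data, the ID conventions above can be used to recover the source of each lake. The sketch below is one way to do this in Python; the function name, the returned labels, and the example IDs in the usage note are hypothetical.

```python
def lake_id_source(lake_id):
    """Classify a LakeID by the prefix conventions described above."""
    lake_id = str(lake_id).strip()
    if lake_id.startswith("MIDAS-"):
        return "Maine"
    if lake_id.startswith("WI-"):
        return "Wisconsin"
    if lake_id.startswith("NH"):
        return "New Hampshire"
    if lake_id.isdigit():
        # Purely numeric IDs come from Stetler et al. (2021).
        return "Stetler et al. (2021)"
    if lake_id.startswith("aba"):
        return "Contributed to this study"
    return "Unknown"
```

For instance, a made-up ID such as "MIDAS-1234" would be classified as "Maine", and a purely numeric ID such as "42" as "Stetler et al. (2021)".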
REFERENCES
Forsythe, G. E., M. A. Malcolm, and C. B. Moler. 1977. Computer methods for mathematical computations. Prentice Hall.
Håkanson, L. 2005. The importance of lake morphometry for the structure and function of lakes. Int Rev Hydrobiol 90: 433–461. doi:10.1002/iroh.200410775
Jane, S. F., G. Hansen, B. Kraemer, and others. 2021. Widespread deoxygenation of temperate lakes. Nature 594. doi:10.1038/s41586-021-03550-y
Leach, T. H., L. A. Winslow, F. W. Acker, and others. 2018. Long-term dataset on aquatic responses to concurrent climate change and recovery from acidification. Sci Data 5: 180059. doi:10.1038/sdata.2018.59
Messager, M. L., B. Lehner, G. Grill, I. Nedeva, and O. Schmitt. 2016. Estimating the volume and age of water stored in global lakes using a geo-statistical approach. Nat Commun 7: 13603. doi:10.1038/ncomms13603
Stetler, J. T., S. F. Jane, J. L. Mincer, M. N. Sanders, and K. C. Rose. 2021. Long-term lake dissolved oxygen and temperature data, 1941-2018. doi:10.6073/PASTA/C45EFE4826B5F615023B857DC59856F3
Winslow, L., T. Leach, and T. Hahn. 2018. adklakedata: Adirondack Long-Term Lake Data.