We synthesized data from a total of 827 widespread stratified temperate lakes. Data were collated from Jane et al. (2021), the U.S. Wisconsin Department of Natural Resources (DNR), the U.S. New Hampshire Volunteer Lake Assessment Program (VLAP), the U.S. Lake Stewards of Maine (LSM) Volunteer Lake Monitoring Program, the U.S. Adirondack lakes database (Winslow et al., 2018), and solicited from members of the Global Lake Ecological Observatory Network (GLEON). For each site, we synthesized available data for dissolved oxygen (DO), water temperature, chlorophyll-a (chl-a), total phosphorus (TP), total nitrogen (TN), and dissolved organic carbon (DOC), as well as lake metadata including depth (mean and maximum), surface area, and elevation. We collated Area:Depth relationships from 19 lakes, and modeled bathymetric profiles from the remaining lakes using mean and maximum depth and surface area. Data availability and collection methods differed among sites in this database (documented in methods.csv).
In sum, the complete datasets consist of 140,813 distinct sampling events at 822 lakes. Median data duration at each lake is 29 years (range 5–102 years). Lakes in the dataset have a median depth of 12.5 m (range 1.5–480 m), median surface area of 85.4 ha (range 0.5–237,000 ha), and median elevation of 264 m (range -215 to 2804 m). The lakes are located in 18 countries across 5 continents, with latitudes ranging from -42.6° to 68.3°.
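The bathymetric model used for lakes without Area:Depth relationships is not specified here. As an illustration only, the sketch below shows one simple way to derive an area-at-depth curve from surface area, mean depth, and maximum depth: a power-law hypsographic curve whose exponent is chosen so that the modeled volume divided by surface area equals the reported mean depth. The function name, argument names, and the power-law form itself are assumptions for this example and should not be read as the method applied to this dataset.

```python
def modeled_area_at_depth(surface_area, mean_depth, max_depth, depth):
    """Area at a given depth under an assumed power-law hypsographic curve.

    Assumed form (not taken from the dataset methods):
        A(z) = A0 * (1 - z / z_max) ** p,  with  p = z_max / z_mean - 1
    Integrating A(z) from 0 to z_max gives A0 * z_max / (p + 1), so this
    choice of p makes volume / surface area equal to the mean depth.
    """
    if not (0 < mean_depth <= max_depth):
        raise ValueError("require 0 < mean_depth <= max_depth")
    if depth < 0 or depth > max_depth:
        raise ValueError("depth must lie between 0 and max_depth")
    p = max_depth / mean_depth - 1
    return surface_area * (1 - depth / max_depth) ** p


# Hypothetical lake: 100 ha surface area, 5 m mean depth, 15 m maximum depth.
profile = [(z, modeled_area_at_depth(100.0, 5.0, 15.0, z)) for z in (0, 5, 10, 15)]
```

When mean depth equals maximum depth the exponent is zero and the modeled basin is a cylinder; a ratio of maximum to mean depth of 3 gives a cone-like basin.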
Quality control
We performed basic quality control on the collated data. First, we added flag columns for DO, TP, TN, chl-a, and DOC. In the flag columns, we indicated whether data were missing (Flag = 2) for all variables. Next, we set very slightly negative values (between -1 and 0 for all variables) to 0, as these values likely indicate marginal calibration error, and we added a flag to indicate this modification (Flag = 3). Then, we removed impossible values for each parameter (Flag = 4), which we define as follows:
Dissolved oxygen: less than -1 or greater than 40, following Jane et al. (2021)
Temperature: less than -1 or greater than 40, following Jane et al. (2021)
DOC: less than -1
TP: less than -1
TN: less than -1
Chlorophyll-a: less than -1
We removed all measurements with negative depth values, as this dataset is limited to in-water lake data.
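The quality control rules above can be sketched in a few lines of Python. This is a minimal illustration, not the processing script used for the dataset: the variable keys, the record layout, and the `FLAG_OK` code are hypothetical, and "very slightly negative" is interpreted here as values from -1 up to (but not including) 0.

```python
import math

FLAG_OK = 0           # hypothetical placeholder; no "no issue" code is defined in the text
FLAG_MISSING = 2      # value was missing
FLAG_SET_TO_ZERO = 3  # slightly negative value replaced with 0
FLAG_IMPOSSIBLE = 4   # impossible value removed

# Upper limits from the list above; variables not listed have no upper limit.
UPPER_LIMITS = {"DO": 40.0, "Temp": 40.0}


def qc_value(variable, value):
    """Return (cleaned_value, flag) for a single measurement."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None, FLAG_MISSING
    upper = UPPER_LIMITS.get(variable, math.inf)
    if value < -1 or value > upper:
        return None, FLAG_IMPOSSIBLE
    if value < 0:
        return 0.0, FLAG_SET_TO_ZERO
    return value, FLAG_OK


def drop_negative_depths(records):
    """Keep only in-water measurements (depth of 0 or greater)."""
    return [r for r in records if r["depth_m"] >= 0]


# Hypothetical usage
cleaned, flag = qc_value("DO", -0.3)
in_water = drop_negative_depths([{"depth_m": -0.5}, {"depth_m": 2.0}])
```

Returning the flag alongside the cleaned value keeps each modification traceable, which mirrors the purpose of the flag columns described above.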
HydroLAKES
HydroLAKES is a global database of 1.4 million lakes with surface area ≥10 ha (Messager et al., 2016). HydroLAKES provides metadata including mean and maximum depths, catchment area, average residence time, and lake type (lake vs. reservoir), and interoperates with multiple other databases (e.g., HydroBASINS, the Global Reservoir and Dam database) using unique lake IDs. We manually matched each lake in our dataset to its HydroLAKES lake ID. By linking HydroLAKES IDs with our published data, we aimed to increase interoperability of these two large-scale datasets.
For lakes with missing mean or maximum depth (i.e., the depths were not reported with the data; n = 57), we used HydroLAKES data to fill in these values.
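A minimal sketch of this gap-filling step follows, assuming each lake record already carries its matched HydroLAKES ID. The dictionary keys (`hylak_id`, `mean_depth_m`, `max_depth_m`) and the example values are hypothetical and do not correspond to column names in the published files.

```python
def fill_missing_depths(lake, hydrolakes_by_id):
    """Return a copy of `lake` with missing depths filled from HydroLAKES.

    Reported depths are never overwritten; only fields that are None
    are filled, and only when the matched HydroLAKES record has a value.
    """
    filled = dict(lake)
    reference = hydrolakes_by_id.get(lake.get("hylak_id"))
    if reference is None:
        return filled  # no matched HydroLAKES record, nothing to fill
    for field in ("mean_depth_m", "max_depth_m"):
        if filled.get(field) is None and reference.get(field) is not None:
            filled[field] = reference[field]
    return filled


# Hypothetical records for illustration
hydrolakes = {101: {"mean_depth_m": 4.2, "max_depth_m": 11.0}}
lake = {"LakeID": "aba1", "hylak_id": 101, "mean_depth_m": None, "max_depth_m": 9.5}
result = fill_missing_depths(lake, hydrolakes)
```

In this example only the missing mean depth is filled; the reported maximum depth is left as it was.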
Lake IDs
Our LakeID designations interface with the Maine, Wisconsin, and New Hampshire lake monitoring datasets, as well as Stetler et al. (2021). Here, Maine lake IDs are prefixed with “MIDAS-”, Wisconsin lake IDs are prefixed with “WI-”, and New Hampshire lake IDs are kept with their original alphanumeric lake ID (begins with “NH”). Numeric lake IDs are from Stetler et al. (2021). All other lakes were contributed directly to this study and are prefixed with “aba”.
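For users who want to group lakes by origin, the prefix conventions above can be sketched as a small lookup. The function name, the returned labels, and the example IDs are hypothetical; only the prefixes themselves come from the description above.

```python
def lake_id_source(lake_id):
    """Classify a LakeID by the prefix conventions described above."""
    if lake_id.startswith("MIDAS-"):
        return "Maine"
    if lake_id.startswith("WI-"):
        return "Wisconsin"
    if lake_id.startswith("NH"):
        return "New Hampshire"
    if lake_id.isdigit():
        return "Stetler et al. (2021)"
    if lake_id.startswith("aba"):
        return "contributed to this study"
    return "unknown"


# Hypothetical IDs for illustration
sources = [lake_id_source(i) for i in ("MIDAS-1234", "WI-5678", "NHX01", "42", "aba7")]
```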