We synthesized data from 827 widely distributed, stratified temperate lakes. Data were collated from Jane et al. (2021), the U.S. Wisconsin Department of Natural Resources (DNR), the U.S. New Hampshire Volunteer Lake Assessment Program (VLAP), the U.S. Lake Stewards of Maine (LSM) Volunteer Lake Monitoring Program, and the U.S. Adirondack lakes database (Winslow et al., 2018), and were solicited from members of the Global Lake Ecological Observatory Network (GLEON). For each site, we synthesized available data for dissolved oxygen (DO), water temperature, chlorophyll-a (chl-a), total phosphorus (TP), total nitrogen (TN), and dissolved organic carbon (DOC), as well as lake metadata including depth (mean and maximum), surface area, and elevation. We collated Area:Depth relationships from 19 lakes and modeled bathymetric profiles for the remaining lakes using mean depth, maximum depth, and surface area. Data availability and collection methods differed among sites in this database (documented in methods.csv).
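The text does not specify which bathymetric model was used for lakes without measured Area:Depth relationships. As an illustration only, the sketch below assumes a simple power-law hypsographic curve, A(z) = A0 * (1 - z / z_max)^p, with the exponent p chosen so that the implied mean depth (volume divided by surface area) matches the reported mean depth. The function name and argument names are hypothetical.

```python
def hypsographic_area(depth_m, surface_area_ha, mean_depth_m, max_depth_m):
    """Approximate planar area (ha) at a given depth below the surface.

    Assumed model (not confirmed by the source):
        A(z) = A0 * (1 - z / z_max) ** p
    """
    if not 0 < mean_depth_m <= max_depth_m:
        raise ValueError("need 0 < mean depth <= maximum depth")
    if not 0 <= depth_m <= max_depth_m:
        raise ValueError("depth must lie between 0 and maximum depth")
    # Integrating A(z) from 0 to z_max gives V = A0 * z_max / (p + 1),
    # so mean depth = V / A0 = z_max / (p + 1), i.e. p = z_max / z_mean - 1.
    p = max_depth_m / mean_depth_m - 1.0
    return surface_area_ha * (1.0 - depth_m / max_depth_m) ** p


# Example: a cone-shaped basin has mean depth = max depth / 3, so p = 2
# and the area at half the maximum depth is one quarter of the surface area.
area_mid = hypsographic_area(5.0, 100.0, 10.0 / 3.0, 10.0)
```

Under this assumption, a lake whose mean depth equals its maximum depth (p = 0) is treated as a cylinder, and shallower mean depths relative to maximum depth give increasingly concave basins.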
In total, the complete dataset consists of 111,903 distinct sampling events at 671 lakes. The median data duration at each lake is 29 years (range: 2–80). Lakes in the dataset have a median depth of 14.0 m (range: 6.4–370 m), a median surface area of 95.1 ha (range: 1.1–126,909 ha), and a median elevation of 242 m (range: -215–2,804 m). The lakes are located in 20 countries across 5 continents, with latitudes ranging from -42.6° to 68.3°.
Quality control
We performed basic quality control on the collated data. First, we added flag columns for DO, TP, TN, chl-a, and DOC. In the flag columns, we indicated whether data were missing (Flag = 2) for all variables. Next, we set slightly negative values (between -1 and 0 for all variables) to 0, as these values likely indicate marginal calibration error, and we added a flag to indicate this modification (Flag = 3). Then, we removed impossible values for each parameter (Flag = 4), which we define as follows:
Dissolved oxygen: less than -1 or greater than 40, following Jane et al. (2021)
Temperature: less than -1 or greater than 40, following Jane et al. (2021)
DOC: less than -1
TP: less than -1
TN: less than -1
Chlorophyll-a: less than -1
We removed all measurements with negative depth values, as this dataset is limited to in-water lake data.
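The flagging rules above can be sketched as a small function. This is a minimal illustration rather than the processing code used for the dataset: it reads "slightly negative" as values from -1 up to (but not including) 0, uses Flag = 0 for unmodified values (an assumption, since the text does not name that code), and uses hypothetical variable keys and a hypothetical `depth_m` field.

```python
import math

# Upper plausibility limits stated in the text; the other variables
# (DOC, TP, TN, chl-a) have no stated upper limit.
UPPER_LIMIT = {"DO": 40.0, "Temp": 40.0}
LOWER_LIMIT = -1.0  # values below this are treated as impossible


def qc_value(variable, value):
    """Return (cleaned_value, flag) for a single measurement."""
    if value is None or math.isnan(value):
        return None, 2  # missing
    if value < LOWER_LIMIT or value > UPPER_LIMIT.get(variable, math.inf):
        return None, 4  # impossible value, removed
    if value < 0:
        return 0.0, 3  # slightly negative, set to 0
    return value, 0  # unmodified (flag code assumed)


def drop_negative_depths(records):
    """Keep only measurements taken at or below the water surface."""
    return [r for r in records if r["depth_m"] >= 0]


# Example: a DO reading of -0.3 is treated as calibration error.
cleaned, flag = qc_value("DO", -0.3)
```

Because the "slightly negative" and "impossible" ranges do not overlap, the order in which the two checks run does not change the result.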
HydroLAKES
HydroLAKES provides metadata including mean and maximum depths, catchment area, average residence time, and lake type (lake vs. reservoir; REF), and the HydroLAKES database interoperates with multiple other databases (e.g., HydroBASINS, the Global Reservoir and Dam database) using unique lake IDs. We manually matched each lake in our dataset to a HydroLAKES lake ID by visualizing the geographic location of each lake in our dataset alongside the geographic locations of HydroLAKES lakes (overlaid on satellite imagery). By linking HydroLAKES IDs with our published data, we aimed to increase the interoperability of these two large-scale datasets. To add additional metadata to this dataset, we linked our lake
Mean depth was unknown for 90 of the lakes in this dataset. For these lakes, we filled in our lake metadata with mean depth values from the HydroLAKES database.
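Filling missing mean depths from HydroLAKES amounts to a lookup on the matched lake ID. The sketch below is illustrative only; the field names (`mean_depth_m`, `hylak_id`, `mean_depth_source`) and the shape of the lookup table are assumptions, not the structure of the published files.

```python
def fill_mean_depth(lakes, hydrolakes_mean_depth):
    """Fill missing mean depths using a {HydroLAKES ID: mean depth (m)} lookup.

    Returns new records and leaves the input list unmodified.
    """
    filled = []
    for lake in lakes:
        lake = dict(lake)  # copy so the caller's record is not changed
        if lake.get("mean_depth_m") is None:
            fallback = hydrolakes_mean_depth.get(lake.get("hylak_id"))
            if fallback is not None:
                lake["mean_depth_m"] = fallback
                # Record provenance so filled values stay distinguishable.
                lake["mean_depth_source"] = "HydroLAKES"
        filled.append(lake)
    return filled


# Example with made-up IDs and depths.
lakes = [
    {"LakeID": "aba-example", "hylak_id": 101, "mean_depth_m": None},
    {"LakeID": "WI-example", "hylak_id": 102, "mean_depth_m": 7.5},
]
result = fill_mean_depth(lakes, {101: 12.3, 102: 9.9})
```

Keeping a source field alongside the filled value is a design choice of this sketch; it lets users separate reported mean depths from those taken from HydroLAKES.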
Lake IDs
Our LakeID designations interface with the Maine, Wisconsin, and New Hampshire lake monitoring datasets, as well as with Stetler et al. (2021). Maine lake IDs are prefixed with “MIDAS-”, Wisconsin lake IDs are prefixed with “WI-”, and New Hampshire lake IDs retain their original alphanumeric form, which begins with “NH”. Numeric lake IDs are from Stetler et al. (2021). All other lakes were contributed directly to this study and are prefixed with “aba”.
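The prefix conventions above make it possible to infer the origin of a record from its LakeID alone. The following is a minimal sketch of such a lookup; the function name and the example IDs are hypothetical, and the "unknown" fallback is an addition for IDs that match none of the documented patterns.

```python
def lake_id_source(lake_id):
    """Infer the contributing dataset from a LakeID prefix."""
    lake_id = str(lake_id).strip()
    if lake_id.startswith("MIDAS-"):
        return "Maine"
    if lake_id.startswith("WI-"):
        return "Wisconsin"
    if lake_id.startswith("NH"):
        return "New Hampshire"
    if lake_id.startswith("aba"):
        return "Contributed directly to this study"
    if lake_id.isdigit():
        # Purely numeric IDs come from Stetler et al. (2021).
        return "Stetler et al. (2021)"
    return "unknown"


# Example with made-up IDs.
sources = [lake_id_source(i) for i in ["MIDAS-0001", "WI-0001", "NHEXAMPLE", "42", "aba01"]]
```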