GENERATION OF NEAR-TERM G. ECHINULATA DENSITY HINDCASTS
Detailed methods can be found in Lofton et al. 20XX. Briefly, fourteen Bayesian state-space models of varying complexity and including different environmental covariates were calibrated using environmental driver data and G. echinulata density data at a nearshore site (South Herrick Cove) in Lake Sunapee from May-October in 2009-2014, and then validated by generating one-week-ahead to four-week-ahead hindcasts of G. echinulata density from May-October in 2015-2016. For hindcasts, G. echinulata and environmental driver data were assimilated weekly by re-running the model calibration each week to obtain updated estimates of model parameters and latent states. These updated posteriors were then used to run the model forward in time four weeks to generate G. echinulata density hindcasts. Environmental driver data were hindcasted using draws from an ensemble of historical values during 2009-2014 for the 2015 hindcasts, and from 2009-2015 for the 2016 hindcasts.
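The driver ensemble described above draws from historical years that precede the hindcast year: 2009-2014 for 2015 and 2009-2015 for 2016. The sketch below shows one way to build such a pool and sample from it. It assumes that draws come from the same week-of-year in earlier years and that sampling is with replacement. Both assumptions are not stated in the source, and the `history` layout and `driver_ensemble` name are hypothetical.

```python
import random


def driver_ensemble(history, target_year, week, n_draws, first_year=2009, seed=None):
    """Draw hindcast driver values from historical observations.

    history: dict mapping (year, week) -> observed driver value
             (hypothetical layout chosen for illustration).
    Assumes draws come from the same week in every year from first_year
    up to, but not including, the hindcast year.
    """
    pool = [
        history[(year, week)]
        for year in range(first_year, target_year)
        if (year, week) in history
    ]
    if not pool:
        raise ValueError(f"no historical values for week {week}")
    rng = random.Random(seed)
    # Sample with replacement so the ensemble can be larger than the pool.
    return [rng.choice(pool) for _ in range(n_draws)]
```

For a 2015 hindcast this pool contains only 2009-2014 values. For 2016 it also includes 2015, which matches the expanding historical window described above.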
Hindcasts were generated under several different conditions so that the total hindcast variance could subsequently be partitioned among uncertainty sources and credible and predictive intervals could be calculated. The following sources of uncertainty were considered: initial conditions uncertainty, parameter uncertainty, driver data uncertainty, process uncertainty, and observation uncertainty. First, hindcasts were generated including only initial conditions uncertainty, or uncertainty in the latent state of G. echinulata. Next, we added parameter uncertainty, or uncertainty in the values of model parameters. After that, we added driver uncertainty, or uncertainty in the hindcasted values of environmental covariates. Finally, we added process uncertainty, or uncertainty due to stochasticity, error in model structure, or numerical rounding error during the course of a model run. Together, these four sources of uncertainty (initial conditions, parameter, driver, and process) make up the hindcast credible interval. We also generated hindcasts that included all of the aforementioned uncertainty sources plus observation error, to produce predictive intervals for comparing model hindcasts to observational data.
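Because the runs are nested, with each one adding a single source to the previous one, one simple way to partition variance is to attribute each increase in ensemble variance to the newly added source. The total is taken as the variance of the run that includes all four credible-interval sources. The sketch below applies this to a single forecast date and horizon. It is illustrative only and not necessarily the procedure used in Lofton et al. 20XX. The `partition_variance` name and input layout are hypothetical.

```python
from statistics import variance


def partition_variance(nested_runs):
    """Attribute hindcast variance to uncertainty sources.

    nested_runs: list of (label, samples) ordered from fewest to most
    sources, e.g. IC; IC+Pa; IC+Pa+D; IC+Pa+D+P. Each label names the
    source added in that run. Returns each source's share of the
    variance of the final (most complete) run.
    """
    total = variance(nested_runs[-1][1])
    shares = {}
    previous = 0.0
    for label, samples in nested_runs:
        current = variance(samples)
        # Increase in variance attributed to the newly added source; clipped
        # at zero because sampling noise can make the increment negative.
        shares[label] = max(current - previous, 0.0) / total
        previous = current
    return shares
```

When no increment is clipped, the shares sum to one. Models without covariates simply omit the driver step.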
NAMING CONVENTION FOR HINDCAST FILES
Within the provided .zip file (Gechinulata_hindcasts.zip), there are 60 .nc files, each of which corresponds to a hindcast for 40 weeks during the sampling seasons of 2015-2016 generated by one model and including a specified subset of uncertainty sources. There are also 60 Ecological Metadata Language (EML) files, one for each hindcast NetCDF file, which provide metadata for each hindcast according to standards recommended by the Ecological Forecasting Initiative (EFI; ecoforecast.org) that were current as of the date of this data publication (21MAR21; see https://github.com/eco4cast/EFIstandards). We note these standards are still under development.
The following naming convention for .nc files was used:
ModelName_uncertainty.sources.nc
ModelName indicates one of the following models used to generate the hindcast:
RW, OffsetRW, AC, BaseLM, MinWaterTemp, MinWaterTempLag, WaterTempMA, SchmidtLag, WindDir, GDD, SchmidtAndWind, TempAndWind, WindAndGDD.
Descriptions of the structure for each model can be found in Table 2 of Lofton et al. 20XX.
uncertainty.sources indicates the combination of uncertainty sources that are incorporated in the hindcast file, according to the following codes:
IC = initial conditions; Pa = parameter; D = driver; P = process; O = observation
Note that not all model structures contain all sources of uncertainty. For example, a random walk model does not have driver uncertainty because it does not include any environmental covariates.
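The naming convention above can be parsed programmatically to select hindcast files by model and uncertainty sources. The sketch below assumes that the uncertainty codes in a file name are separated by periods (for example, a hypothetical `MinWaterTemp_IC.Pa.D.nc`), as the `uncertainty.sources` placeholder suggests. Check this against the actual files in Gechinulata_hindcasts.zip. The function name is hypothetical.

```python
from pathlib import Path

# Model names as listed in this document.
MODELS = {
    "RW", "OffsetRW", "AC", "BaseLM", "MinWaterTemp", "MinWaterTempLag",
    "WaterTempMA", "SchmidtLag", "WindDir", "GDD", "SchmidtAndWind",
    "TempAndWind", "WindAndGDD",
}
# Uncertainty source codes as defined in this document.
SOURCES = {
    "IC": "initial conditions",
    "Pa": "parameter",
    "D": "driver",
    "P": "process",
    "O": "observation",
}


def parse_hindcast_name(filename):
    """Split a hindcast file name into (model name, list of uncertainty codes).

    Assumes codes are separated by periods, e.g. 'MinWaterTemp_IC.Pa.D.nc'.
    """
    name = Path(filename).name
    if not name.endswith(".nc"):
        raise ValueError(f"not a NetCDF file name: {name}")
    # Model names contain no underscores, so split at the first one.
    model, sep, codes = name[: -len(".nc")].partition("_")
    if not sep or model not in MODELS:
        raise ValueError(f"unrecognized model in: {name}")
    sources = codes.split(".")
    unknown = [code for code in sources if code not in SOURCES]
    if unknown:
        raise ValueError(f"unrecognized uncertainty codes {unknown} in: {name}")
    return model, sources
```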
Hindcasts run from one to four weeks into the future from the date for which they were generated. For example, a hindcast generated for the week of 2015-05-25 includes a one-week-ahead hindcast for the week of 2015-06-01, a two-week-ahead hindcast for the week of 2015-06-08, a three-week-ahead hindcast for the week of 2015-06-15, and a four-week-ahead hindcast for the week of 2015-06-22.
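The mapping from the week a hindcast was generated to its target weeks is plain date arithmetic in 7-day steps. The sketch below reproduces the example above. The function name is hypothetical.

```python
from datetime import date, timedelta


def hindcast_target_weeks(issue_week: date, max_horizon: int = 4) -> dict[int, date]:
    """Map each forecast horizon (in weeks) to the week it targets."""
    return {h: issue_week + timedelta(weeks=h) for h in range(1, max_horizon + 1)}
```

For `date(2015, 5, 25)` this returns the weeks of 2015-06-01, 2015-06-08, 2015-06-15, and 2015-06-22 for horizons 1 to 4.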
DOWNLOAD AND PROCESSING OF NLDAS-2 DATA
The NLDAS-2 database (https://ldas.gsfc.nasa.gov/nldas/v2/forcing) was accessed in February 2017 and data were downloaded from January 1, 1979 through December 31, 2016 at the hourly scale, including shortwave radiation. Detailed definitions and descriptions of the NLDAS-2 forcing variables can be found at https://ldas.gsfc.nasa.gov/nldas/v2/forcing. Lake Sunapee spans four 1/8th-degree grid cells within the NLDAS grid system and these grid cells were queried for download using a Lake Sunapee shapefile. Observations for each meteorological variable were subsequently averaged across grid cells to provide a single value for the lake at each hourly timestep.
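Averaging the four grid cells into one lake-wide value per hour amounts to grouping records by timestamp and taking the mean. The sketch below assumes a hypothetical flat input of `(timestamp, grid_cell_id, value)` tuples. It does not reflect the layout of the files NLDAS-2 actually distributes.

```python
from collections import defaultdict
from statistics import fmean


def lake_hourly_mean(records):
    """Average a meteorological variable across NLDAS-2 grid cells.

    records: iterable of (timestamp, grid_cell_id, value) tuples, one per
    grid cell per hour (hypothetical input layout).
    Returns a dict mapping each timestamp to the mean across grid cells,
    in chronological order.
    """
    by_hour = defaultdict(list)
    for timestamp, _cell_id, value in records:
        by_hour[timestamp].append(value)
    return {ts: fmean(values) for ts, values in sorted(by_hour.items())}
```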
Hourly solar radiation values were subsequently summarized to daily mean, median, maximum, minimum, standard deviation, and sum for each day of G. echinulata sampling from 2009-2016.
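The daily summaries can be computed by grouping hourly values by calendar day and keeping only sampling days. The sketch below makes two assumptions. First, day boundaries are taken from the timestamps as given, so whether these are UTC or local time is left to the user. Second, the standard deviation is the sample (n - 1) standard deviation. Neither detail is specified in the source, and the function name is hypothetical.

```python
import statistics
from collections import defaultdict
from datetime import date, datetime


def daily_summaries(hourly: dict[datetime, float],
                    sampling_days: set[date]) -> dict[date, dict[str, float]]:
    """Summarize hourly radiation values for each sampling day."""
    # Group hourly values by calendar day, keeping only sampling days.
    by_day: dict[date, list[float]] = defaultdict(list)
    for ts, value in hourly.items():
        if ts.date() in sampling_days:
            by_day[ts.date()].append(value)
    summaries = {}
    for day, values in sorted(by_day.items()):
        summaries[day] = {
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
            "max": max(values),
            "min": min(values),
            # Sample standard deviation (assumption); needs >= 2 values.
            "sd": statistics.stdev(values),
            "sum": sum(values),
        }
    return summaries
```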
DOWNLOAD AND PROCESSING OF PRISM DATA
The PRISM database (http://www.prism.oregonstate.edu/documents/PRISM_datasets.pdf) was accessed on October 4, 2018, and data from the AN81d dataset were downloaded from January 1, 1981 through December 31, 2017, including precipitation, with grid cell interpolation on. Detailed definitions and descriptions of PRISM datasets can be found at http://www.prism.oregonstate.edu/documents/PRISM_datasets.pdf. Data were downloaded for a location corresponding to a Gloeotrichia echinulata monitoring site on Lake Sunapee (Lat: 43.4098, Lon: -72.0367, Elev: 361 m). Data were downloaded at the daily timestep, which PRISM defines as the 24-hour period ending at 1200 UTC on the day entered in the Date column of the dataframe.
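Because a PRISM day ends at 1200 UTC, a timestamp maps to a PRISM date by shifting it forward 12 hours and taking the calendar date in UTC. The sketch below assumes each period includes 1200 UTC on the previous day and excludes 1200 UTC on the labeled day. The source does not specify how the boundary is handled, and the function name is hypothetical.

```python
from datetime import date, datetime, timedelta, timezone


def prism_day(ts: datetime) -> date:
    """Return the PRISM date whose daily period contains timestamp ts.

    Assumes each period is [12:00 UTC previous day, 12:00 UTC labeled day).
    """
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = ts.astimezone(timezone.utc)
    # Shifting forward 12 hours maps each period onto its labeled calendar date.
    return (utc + timedelta(hours=12)).date()
```

Under this convention, a sample collected at 14:00 EDT (18:00 UTC) falls in the PRISM period labeled with the following calendar date.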
Precipitation data were subsequently summarized to include the daily sum of precipitation on each day of G. echinulata sampling from 2009-2016, as well as the daily sum of precipitation on the day before each sampling day (precip_mm_1daylag) and on the previous G. echinulata sampling day (precip_mm_1weeklag).
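The three precipitation variables can be derived from a daily precipitation series keyed by PRISM date and the list of sampling dates. The sketch below assumes that precip_mm_1weeklag does not carry over between sampling seasons, so the first sampling day of each year gets no value. That assumption is not stated in the source. Missing days also yield `None`. The function name is hypothetical.

```python
from datetime import date, timedelta


def precip_features(daily_precip: dict[date, float],
                    sampling_days: list[date]) -> list[dict]:
    """Build precip_mm, precip_mm_1daylag, and precip_mm_1weeklag per sampling day."""
    days = sorted(sampling_days)
    rows = []
    for i, day in enumerate(days):
        # Previous sampling day within the same year (assumption: the lag
        # does not reach back into the previous season).
        prev = days[i - 1] if i > 0 and days[i - 1].year == day.year else None
        rows.append({
            "date": day,
            "precip_mm": daily_precip.get(day),
            "precip_mm_1daylag": daily_precip.get(day - timedelta(days=1)),
            "precip_mm_1weeklag": daily_precip.get(prev) if prev is not None else None,
        })
    return rows
```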
CITATIONS
Lofton, M.E., Brentrup, J.A., Beck, W.S., Zwart, J.A., Bhattacharya, R., Brighenti, L.S., Burnet, S.H., McCullough, I.M., Steele, B.G., Carey, C.C., Cottingham, K.L., Dietze, M.C., Ewing, H.A., Weathers, K.D., LaDeau, S.L. 20XX. Using near-term forecasts and uncertainty partitioning to improve prediction of oligotrophic lake cyanobacterial density. Journal, Volume, Issue, Pages.