Data Package Metadata   View Summary

Global lake area, climate, and population dataset

General Information
Data Package:
Local Identifier:edi.394.4
Title:Global lake area, climate, and population dataset
Alternate Identifier:DOI PLACE HOLDER
Abstract:

An increasing population in conjunction with a changing climate necessitates a detailed understanding of water abundance at multiple spatial and temporal scales. While remote sensing has provided massive data volumes to track fluctuations in water quantity, contextualizing water abundance with other local, regional, and global trends remains challenging, because of requirements for large computational resources to combine multiple data sources into an analytically-friendly format. To bridge this gap and to facilitate future freshwater research opportunities, we present the Global lake area, climate, and population (GLCP) dataset. The GLCP is a compilation of lake surface area for over 1.42 million lakes and reservoirs from 1995 to 2015 with co-located basin-level temperature, precipitation, and population data. The GLCP was created with FAIR data principles in mind by retaining unique identifiers from parent datasets to expedite interoperability. The GLCP offers critical data for basic and applied investigations of lake surface area, and water quantity in general, at local, regional, and global scales.

Publication Date:2019-11-15

Time Period
Begin:
1995-01-01
End:
2015-10-31

People and Organizations
Contact:Labou, Stephanie G (Center for Environmental Research, Education, & Outreach, Washington State University) 
Contact:Meyer, Michael F (School of the Environment, Washington State University) [  email ]
Contact:Brousil, Matthew R (Center for Environmental Research, Education, & Outreach, Washington State University) [  email ]
Contact:Cramer, Alli N (School of the Environment, Washington State University) [  email ]
Contact:Luff, Bradley T (School of the Environment, Washington State University) [  email ]
Creator:Labou, Stephanie G (Center for Environmental Research, Education, & Outreach, Washington State University)
Creator:Meyer, Michael F (School of the Environment, Washington State University)
Creator:Brousil, Matthew R (Center for Environmental Research, Education, & Outreach, Washington State University)
Creator:Cramer, Alli N (School of the Environment, Washington State University)
Creator:Luff, Bradley T (School of the Environment, Washington State University)

Data Entities
Data Table Name:
lake area, climate, and population data
Description:
lake area, climate, and population data
Data Table Name:
data availability metrics
Description:
data availability metrics
Other Name:
Combines GLCP with data quality metrics
Description:
Combines GLCP with data quality metrics
Detailed Metadata

Data Entities


Data Table

Data:https://pasta-s.lternet.edu/package/data/eml/edi/394/4/2a8af5e32978d03d67e91fc3dbb72475
Name:lake area, climate, and population data
Description:lake area, climate, and population data
Number of Records:2000
Number of Columns:15

Table Structure
Object Name:glcp.csv
Size:381000 bytes
Authentication:1e3e27c1f21eec4492baf2f66e9956fc Calculated By MD5
Text Format:
Number of Header Lines:1
Record Delimiter:\r\n
Orientation:column
Simple Delimited:
Field Delimiter:,
Quote Character:"

Table Column Descriptions
 
Column Name:year  
hylak_id  
centr_lat  
centr_lon  
continent  
country  
bsn_lvl  
HYBAS_ID  
mean_monthly_precip_mm  
total_precip_mm  
mean_annual_temp_k  
pop_sum  
seasonal_km2  
permanent_km2  
total_km2  
Definition:Year, spans 1995-2015. Note that for the purposes of these data, 2015 ends on October 31.HydroLAKES unique identifier of lake. Preserved from HydroLAKES input data to enable future merge with HydroLAKES attributes.Lake centroid latitude.Lake centroid longitude.Continent on which lake is located (from HydroLAKES dataset).Country in which lake is located (from HydroLAKES dataset).Pfafstetter level of basin associated with lake.HydroBASINS unique identifier of basin associated with lake. Preserved from HydroBASINS input data to enable future merge with HydroBASINS attributes.Mean monthly basin-level precipitationAnnually accumulated basin-level precipitationMean annual basin-level temperatureTotal basin-level human population. Note that this column only has valid values for 1995, 2000, 2005, 2010, and 2015.Water area of seasonal water, as defined by Pekel et al. 2016.Water area of permanent water, as defined by Pekel et al. 2016Calculated total water as the sum of seasonal and permanent water.
Storage Type:float  
string  
float  
float  
string  
string  
string  
string  
float  
float  
float  
float  
float  
float  
float  
Measurement Type:rationominalratiorationominalnominalnominalnominalratioratioratioratioratioratioratio
Measurement Values Domain:
UnitnominalYear
Typenatural
Min1995 
Max2015 
DefinitionHydroLAKES unique identifier of lake. Preserved from HydroLAKES input data to enable future merge with HydroLAKES attributes.
Unitdegree
Typereal
Min-50.22 
Max74.54 
Unitdegree
Typereal
Min-160.69 
Max111.29 
DefinitionContinent on which lake is located (from HydroLAKES dataset).
DefinitionCountry in which lake is located (from HydroLAKES dataset).
DefinitionPfafstetter level of basin associated with lake.
DefinitionHydroBASINS unique identifier of basin associated with lake. Preserved from HydroBASINS input data to enable future merge with HydroBASINS attributes.
Unitmillimeter
Typereal
Min3.19 
Max230.82 
Unitmillimeter
Typereal
Min59798.49 
Max465717963.48 
Unitkelvin
Typereal
Min258.3 
Max301.73 
Unitnumber
Typereal
Min
Max123176226.41 
UnitsquareKilometers
Typereal
Min
Max4861.72 
UnitsquareKilometers
Typereal
Min
Max354482.3 
UnitsquareKilometers
Typereal
Min
Max359344.03 
Missing Value Code:
CodeNA
Explnot available
CodeNA
Explnot available
CodeNA
Explnot available
CodeNA
Explnot available
CodeNA
Explnot available
CodeNA
Explnot available
CodeNA
Explnot available
CodeNA
Explnot available
CodeNA
Explnot available
CodeNA
Explnot available
CodeNA
Explnot available
CodeNA
Explnot available
CodeNA
Explnot available
CodeNA
Explnot available
CodeNA
Explnot available
Accuracy Report:                              
Accuracy Assessment:                              
Coverage:                              
Methods:                              

Data Table

Data:https://pasta-s.lternet.edu/package/data/eml/edi/394/4/4edc9abfd42456fdb9ebbf57c0403488
Name:data availability metrics
Description:data availability metrics
Number of Records:2000
Number of Columns:8

Table Structure
Object Name:JRC_all_no_data_proportions_yearly_95thru15.csv
Size:204720 bytes
Authentication:d9d9256d1cf9858f58fa06b990c9d498 Calculated By MD5
Text Format:
Number of Header Lines:1
Record Delimiter:\r\n
Orientation:column
Simple Delimited:
Field Delimiter:,
Quote Character:"

Table Column Descriptions
 
Column Name:hylak_id  
year  
no_obvs_km2  
not_water_km2  
no_data_to_not_water  
no_data_to_seasonal  
no_data_to_permanent  
no_data_to_total  
Definition:HydroLAKES unique identifier of lake. Preserved from HydroLAKES input data to enable future merge with HydroLAKES attributes.Year, spans 1995-2015. Note that for the purposes of these data, 2015 ends on October 31.Area of no data, as defined by Pekel et al. 2016.Area of not water, as defined by Pekel et al. 2016.Calculated ratio of no_data_km2 derived area to not_water_km2 derived area.Calculated ratio of no_data_km2 derived area to seasonal_km2 derived area.Calculated ratio of no_data_km2 derived area to permanent_km2 derived area.Calculated ratio of no_data_km2 derived area to total_km2 calculated area.
Storage Type:string  
float  
float  
float  
float  
float  
float  
float  
Measurement Type:nominalratioratioratioratioratioratioratio
Measurement Values Domain:
DefinitionHydroLAKES unique identifier of lake. Preserved from HydroLAKES input data to enable future merge with HydroLAKES attributes.
UnitnominalYear
Typenatural
Min1995 
Max1995 
UnitsquareKilometers
Typereal
Min
Max172684.58 
UnitsquareKilometers
Typereal
Min
Max3472.51 
Unitnumber
Typereal
Min
Max375014.82 
Unitnumber
Typereal
Min
Max358456.45 
Unitnumber
Typereal
Min
Max958982.9 
Unitnumber
Typereal
Min
Max845491.83 
Missing Value Code:        
CodeInf
Explinfinitive
CodeNA
ExplNot Available
CodeInf
Explinfinitive
CodeNA
ExplNot Available
CodeInf
Explinfinitive
CodeNA
ExplNot Available
CodeInf
Explinfinitive
CodeNA
ExplNot Available
Accuracy Report:                
Accuracy Assessment:                
Coverage:                
Methods:                

Non-Categorized Data Resource

Name:Combines GLCP with data quality metrics
Entity Type:unknown
Description:Combines GLCP with data quality metrics
Physical Structure Description:
Object Name:combine_data_availability_metrics_with_glcp.R
Size:794 bytes
Authentication:a07e4b1d585e5a33ec2771404aa0bc77 Calculated By MD5
Externally Defined Format:
Format Name:unknown
Data:https://pasta-s.lternet.edu/package/data/eml/edi/394/4/8f67f4bb47b93928b3ac094fee69324a

Data Package Usage Rights

This information is released under the Creative Commons license - Attribution - CC BY (https://creativecommons.org/licenses/by/4.0/). The consumer of these data ("Data User" herein) is required to cite it appropriately in any publication that results from its use. The Data User should realize that these data may be actively used by others for ongoing research and that coordination may be necessary to prevent duplicate publication. The Data User is urged to contact the authors of these data if any questions about methodology or results occur. Where appropriate, the Data User is encouraged to consider collaboration or co-authorship with the authors. The Data User should realize that misinterpretation of data may occur if used out of context of the original study. While substantial efforts are made to ensure the accuracy of data and associated documentation, complete accuracy of data sets cannot be guaranteed. All data are made available "as is." The Data User should be aware, however, that data are updated periodically and it is the responsibility of the Data User to check for new versions of the data. The data authors and the repository where these data were obtained shall not be liable for damages resulting from any use or misinterpretation of the data. Thank you.

Keywords

By Thesaurus:
(No thesaurus)Hydrology, lentic systems, environmental synthesis
LTER Controlled Vocabularylake

Methods and Protocols

These methods, instrumentation and/or protocols apply to all data in this dataset:

Methods and protocols used in the collection of this data package
Description:

In order to harmonize several disparate datasets, we implemented the workflow described below. For a more complete description of our workflow and quality control measures, please see the manuscript Meyer and Labou, et al. (Under Review).

The authors recommend that future users implement similar packages used in this workflow for manipulating the GLCP. In particular, loading the GLCP into certain environments, such as R, often require the use of external packages. Most notably, the fread() function from the data.table package (Dowle and Srinivasan, 2019) can quickly read the GLCP into the R environment through the command:

glcp = fread(x = "glcp.csv", header = TRUE, integer64 = "character").

Data Sources

Lake locations and boundaries

For the locations of lakes, we used the HydroLAKES database version 1.0 (Messager et al., 2016), which incorporates multiple lake datasets (e.g., SRTM Water Body Data, Global Lakes and Wetlands Database) and includes 1,427,688 lakes of at least 10 hectares in surface area. The majority of HydroLAKES lakes are defined as uncontrolled lakes (99.5%), with the remainder identified as reservoirs (0.47%) and controlled lakes (0.03%). HydroLAKES, which is available in the form of shapefiles, includes an extensive number of attributes for lake polygons including: lake surface area (polygon area),elevation, shoreline development, total volume, average depth, residence time, latitude and longitude of pour point, lake type, and others. The HydroLAKES v1.0 identifier ("Hylak_id") is retained in the GLCP to facilitate future work making use of other attributes in the HydroLAKES data, which are not included in the GLCP. Additionally, the GLCP also contains lake centroids, which were calculated within ArcGIS version 3.1 (ESRI, 2015).

Hereafter, HydroLAKES lake polygons are referred to as "lakes."

Basins

Because lakes are products of the landscapes in which they reside, we calculated climate and population values relative to each lake's basin. To identify basin boundaries, we used the HydroBASINS dataset, a basin-level analog to the HydroLAKES dataset. The HydroBASINS version 1.c format 1 database (Lehner and Grill, 2013) is derived from the HydroSHEDS database (Lehner et al., 2008), which uses 15 arc-second resolution data to identify river basins, watersheds, and sub-basins globally. In HydroBASINS, basins are identified using the Pfafstetter coding system, with Level 1 as the highest level (i.e., continent level) and Level 12 as the smallest available sub-basin. Table 1 details the number and median size of basins within each Pfafstetter level for basins used within the GLCP. We retain the original HydroBASINS version 1.c identifier ("HYBAS_ID") for each basin in the GLCP, for ease of future integration with existing HydroBASINS attributes, such as distance from basin outlet to next downstream sink and indicators of endorheic basins.

Hereafter, HydroBASINS polygons are referred to as "basins."

Surface water extent

For changes in lake surface water area over time, we used the Joint Research Centre (JRC) Global Surface Water Dataset described in Pekel et al. (2016), which used LANDSAT imagery (30 meter resolution) from March 1984 through October 2015 to identify changes in surface water area for lakes, rivers, streams, and wetlands. The data are hosted by the European Commission JRC and are formally referred to as the Global Surface Water Dataset. Hereafter, we use the abbreviation "JRC" to refer to this dataset.

The JRC data subsetted for Yearly Water Classification History v1.0 (1984-2015) are publicly available through Google Earth Engine as annually aggregated raster images. Each image contains a "waterClass" band with the following values: 0 = no observations, 1 = not water, 2 = seasonal water (defined as water that is present for at least one month but not an entire year), 3 = permanent water (defined as water that is present for all twelve months). For more detailed information on the complete LANDSAT processing workflow used to create the JRC, Pekel et al. (2016) provides a methodology of how waterClasses were assigned based on raw LANDSAT data.

While the JRC dataset is the most extensive global surface water dataset available to date, it is limited by the LANDSAT data from which it is derived. Even though LANDSAT coverage began in 1984, portions of northeastern Siberia (Kolyma region and Central Siberian plateau) as well as central Greenland were not included in totality until 1999.

Additionally, the JRC is limited by the water identification algorithm used by Pekel et al. (2016), which divided pixels into water, land, or non-valid observations, where non-valid observations may include snow and ice. This system therefore does not classify permanently frozen lakes as water. Seasonally frozen lakes, however, would be coded as entirely seasonal water.

Population estimates

We used the Gridded Population of the World (GPW) version 3 (CIESIN) for 1995 and GPW version 4 (CIESIN) un-adjusted population count data for 2000, 2005, 2010, and 2015 population estimates. Resolution for the GPW version 3 is 2.5 arc-minutes and is available for download from NASA's Socioeconomic Data and Applications Center (SEDAC). Resolution for GPW version 4 is 30 arc-seconds and is currently hosted on Google Earth Engine. Detailed methodology for the development of these datasets is available in Doxsey-Whitfield et al. (2015)

Climate data

We used the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) (Gelaro et al., 2017) as the source for climate data. Both precipitation and temperature datasets (GMAO, 2015) were hourly aggregates with original spatial resolution of 0.5 x 0.625 decimal degrees. From these broader datasets, we extracted the variables PRECTOTCORRLAND (total precipitation land; bias corrected; in kg m-2 s-1, or volumetrically, mm s-1) for precipitation and T2MMEAN (2-meter air temperature in K) for temperature. These subsets were exported from NASA Goddard Earth Sciences Data (GES) and Information Services Center (DISC) in a netCDF format for local analysis.

Data harmonization process

As this project involved harmonizing multiple global datasets at different resolutions, the workflow used required multiple steps, each of which resulted in a cleaned data subset. Here we detail the exact steps taken to integrate the HydroLAKES, HydroBASINS, JRC, climate, and human population datasets described above.

Step #1: Calculate lake surface area

Lake surface area for each lake from 1995 to 2015 was calculated using Google Earth Engine (Gorelick et al., 2017). Lake polygons were uploaded and imported into Earth Engine as shapefiles. These lake polygons, which represent typical shape and area for individual lakes, were buffered by a specified distance to allow water area calculations to account for increases in lake area beyond the HydroLAKES polygon borders as specified in HydroLAKES. Buffered lake polygons were then used as boundaries within which to summarize pixels from annual JRC data for each waterClass category (i.e., no data, not water, seasonal water, or permanent water). We calculated "total water" as the sum of seasonal and permanent water pixels. Resulting area values were exported in .csv format to Google Drive, then downloaded for local analysis using the R statistical environment (R Core Team, 2018). Commented Google Earth Engine code for lake area calculations ("jrc_water_class_sum.txt") and the R script for formatting Google Earth Engine output ("01_import_format_JRC.R") (Dowle and Srinivasan, 2019; Wickham, et al., 2018; Wickham and Henry, 2018) are available in the Environmental Data Initiative (EDI) GLCP repository (Labou and Meyer et al.) within the entity "glcp.tar.gz".

To evaluate how lake waterClass areas fluctuated with various buffer sizes, we calculated lake waterClass areas for 1995, 2000, 2005, 2010, and 2015 with 30 m, 60 m, 90 m, and 120 m buffers. Preliminary tests indicated smaller buffers were insufficient to capture large area increases, while larger buffers increased risk of overlapping neighboring lakes (especially in dense lake areas), smaller ponds, or input/output rivers and erroneously increasing lake area totals. Our analysis of waterClass areas between buffer sizes and years indicated 90 m as the most appropriate distance.

We identified a minority of lakes that were unable to be included in the final data product. One lake in North America was identified as having a broken geometry (Hylak_id = 109424), making it incompatible with Earth Engine-based analyses. Rather than attempt to repair the lake shapefile boundaries and potentially change the size and shape, we chose not to include this lake. Additionally, a small number of lakes were identified to be outside the range of reliable LANDSAT data. The available JRC data has a maximum extent of 80 decimal degrees north and Pekel et al. (2016) note that LANDSAT images above 78 decimal degrees north are sparse, partially due to the short LANDSAT observation season in high northern latitudes. As such, we limited further processing to lakes whose entire extent is below 78 decimal degrees north, which excluded 3,220 lakes (0.23% of original 1.4 million lakes).

Given the potential for lakes in this area to have inaccurate area measurements prior to 1999, we calculated ratios of "no data" pixel areas to "not water", "seasonal water", "permanent water", and "total water" pixel areas. This ratio will enable future users to set desired thresholds of "no data" coverage that are specific to their research questions. Because of the computational demand to produce these ratios for each lake and year, the ratios are provided within a secondary .csv file ("JRC_all_no_data_proportions_yearly_95thru15.csv") and can be merged efficiently with the full GLCP using the provided R script ("combine_data_availability_metrics_with_glcp.R").

Step #2: Match lakes with basins

Because HydroBASINS is derived from river networks, rather than lake pour points, the HydroLAKES and HydroBASINS data do not come with a pre-existing 1:1 matching scheme for lake and basin. To match lakes with their equivalent basins, we performed spatial joins for the lake shapefiles and basin shapefiles to identify the smallest basin that enclosed a lake in its entirety. All lakes that fell within a Pfafstetter Level 12 basin (85% of lakes) were tagged with the Level 12 basin identifier, because no smaller sub-basins were available. The highest level Pfafstetter basin used was Level 2 (Level 1 being near continent-level), which was sufficient to capture the watersheds of very large lakes, such as the Laurentian Great Lakes and the Caspian Sea. This basin matching procedure was performed within Google Earth Engine ("hylak_hybasin_matching.txt") and outputs were formatted locally using R ("07_lake_basin_matching.R") (Dowle and Srinivasan, 2019; Wickham, et al., 2018).

Using this lake/basin matching procedure, 1,949 lakes (0.14% of the original 1.4 million HydroLAKES lakes) were unable to be properly associated with a basin. Manual investigation indicated that these lakes were either located on islands (645 lakes, 0.05% of the original HydroLAKES) or would be associated with only a Level 1 basin (1,304 lakes, 0.09% of the original HydroLAKES). Lakes located on islands are excluded from the GLCP because their natural basins are not included in the continental basin schema that HydroBASINs employs. Similarly, the 1,304 lakes associated with Level 1 basins were consistently located on the boundary between neighboring basins and therefore never completely enclosed in a single basin. This peculiarity is largely because HydroBASINS is constructed for river networks, as opposed to lakes. Because it is unrealistic for these 1,304 lakes (average total area: 2.39 km2) to be influenced by near continental-scale climate and human population forcings, we excluded these lakes from further processing.

Step #3: Calculate basin-level precipitation and temperature estimates

Once basins were associated with lakes, basin-level climate values were calculated. Within the R environment, precipitation values from MERRA-2 were converted to annually accumulated precipitation by aggregating hourly data for each gridcell for each year (Reichle et al., 2017a;b). We also derived the average monthly volume of precipitation for each grid cell for each year (1995-2015) by taking the mean of each year's total monthly precipitation volumes ("summing_hourly_data_precip_mm.R") (Pierce, 2019; Hijmans, 2019; Bivand et al., 2019; Revolution Analytics and Weston, 2019). Temperature values were similarly used to derive an average annual temperature for each year ("summing_hourly_data_temp_K.R") (Pierce, 2019; Hijmans, 2019; Bivand et al., 2019; Revolution Analytics and Weston, 2019; Tierney et al., 2018). The resulting yearly data were saved as rasters. Yearly total precipitation, average monthly total precipitation, and temperature rasters (1995-2015) were then resampled at 1/10th cell size through a bilinear interpolation resampling. The original rectangular grids were converted to squares, with spatial resolution of 0.05 x 0.05 decimal degrees. Because MERRA-2 grid cells were originally sized at 0.5 x 0.625 decimal degree resolution, the initial conversion from a netCDF to a raster format induced extra space (e.g., 90.25 decimal degrees north in a raster). As such, resampled rasters were clipped to 90 decimal degrees north/south and 180 decimal degrees west/east and converted to geotiff format for upload to Google Earth Engine ("manipulate_climate_rasters.R") (Pierce, 2019; Hijmans, 2019; Bivand et al., 2019; Revolution Analytics and Weston, 2019).

For each basin associated with a lake, basin-level average and total precipitation, as well as average temperature, were calculated for each year of interest in Earth Engine. The process was similar to the one described for lake polygons and JRC data, whereby the basins were used as boundaries from which to extract and aggregate pixels. Results were exported as .csv files to Google Drive, then downloaded for local analysis using R. R scripts for data aggregation of climate variables ("04_post_gee_processing_temp.R ", "05_post_gee_processing_precip_sum.R", "06_post_gee_processing_precip_average.R") (Dowle and Srinivasan, 2019; Wickham, et al., 2018; Wickham, 2019) are available on the EDI GLCP repository (Labou and Meyer et al.) within the entity "glcp.tar.gz".

This process resulted in 10 matched basins (associated with 19 lakes) with missing values for climate variables. These basins and lakes were removed from the dataset. Manual assessment showed that these basins were located at higher northern latitudes in the United States and the Russian Federation. We note that other temperature and precipitation datasets are available; subsequent analyses can incorporate alternative climate data sources to match with these basins through the scripts and workflow provided in the EDI entity "glcp.tar.gz".

Step #4: Calculate basin-level population estimates

While other data sources in this project are annual, the global population data we used, which was the current best available at the global scale, was for 5-year increments (1995, 2000, 2005, 2010, 2015). Rather than interpolate the intervening years' values, we chose to leave these blank so that future researchers can personalize statistical methodology to best address these data gaps in context of a specific question.

For each 5-year increment, human population calculations were performed with a technique similar to the climate data aggregations. GPWv3 (1995 data) was converted to a geotiff and imported into Earth Engine. GPWv4 (2000, 2005, 2010, and 2015) rasters were available through the Earth Engine interface. Basin-level population totals were calculated from GPWv3 and GPWv4 data, with basin polygons as spatial boundaries. Results were exported as .csv files to Google Drive, and downloaded for local analysis within the R environment. R scripts for data aggregation of population counts ("02_load_shp_GPWv3.R", "03_load_shp_GPWv4.R") (Dowle and Srinivasan, 2019; Wickham, et al., 2018; Wickham, 2019) are available on the EDI GLCP repository within the entity "glcp.tar.gz" .

Step #5: Merge lake- and basin-level data

Lake- and basin-level output were merged within the R environment. The R script for GLCP production ("08_cleaning_glcp_production.R") (Dowle and Srinivasan, 2019; Wickham, et al., 2018; Wickham and Henry, 2018) is available in the EDI GLCP repository within the entity "glcp.tar.gz".

Synopsis of data harmonization procedures:

Code availability

All Google Earth Engine and R scripts are available from the Environmental Data Initiative GLCP repository within the entity "glcp.tar.gz". The entity "glcp.tar.gz" also includes a standardized file structure that can be downloaded and run locally with little alteration of the original R scripts if users wish to reproduce the GLCP.

References

Bivand, R., T. Keitt, and B. Rowlingson. 2019. rgdal: Bindings for the “Geospatial” Data Abstraction Library

Center for International Earth Science Information Network - CIESIN - Columbia University. Gridded Population of the World, Version 4 (GPWv4): Population Count, Revision 11.

Center for International Earth Science Information Network - CIESIN - Columbia University, United Nations Food and Agriculture Programme - FAO, and Centro Internacional de Agricultura Tropical - CIAT. 2005. Gridded Population of the World, Version 3 (GPWv3): Population Count Grid.

Dowle, M., and A. Srinivasan. 2019. data.table: Extension of `data.frame`,

Doxsey-Whitfield, E., K. MacManus, S. B. Adamo, L. Pistolesi, J. Squires, O. Borkovska, and S. R. Baptista. 2015. Taking advantage of the improved availability of census data: a first look at the gridded population of the world, version 4. Pap. Appl. Geogr. 1: 226–234.

ESRI. 2015. ArcGIS Desktop: Release 10.3.1, Environmental Systems Research Institute.

Gelaro, R., W. McCarty, M. J. Suárez, and others. 2017. The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). J. Clim. 30: 5419–5454. doi:10.1175/JCLI-D-16-0758.1

Global Modeling and Assimilation Office (GMAO). 2015. MERRA-2 tavg1_2d_flx_Nx: 2d,1-Hourly, Time-Averaged,Single-Level,Assimilation,Surface Flux Diagnostics V5.12.4. Goddard Earth Sci. Data Inf. Serv. Cent. GES DISC. doi:10.5067/7MCPBJ41Y0K6

Gorelick, N., M. Hancher, M. Dixon, S. Ilyushchenko, D. Thau, and R. Moore. 2017. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 202: 18–27.

Hijmans, R. J. 2019. raster: Geographic Data Analysis and Modeling.

Labou, S. G., M. F. Meyer, M. R. Brousil, A. N. Cramer, and B. T. Luff. The global lake area, climate, and population (GLCP) dataset.

Lehner, B., and G. Grill. 2013. Global river hydrography and network routing: baseline data and new approaches to study the world’s large river systems: GLOBAL RIVER HYDROGRAPHY AND NETWORK ROUTING. Hydrol. Process. 27: 2171–2186. doi:10.1002/hyp.9740

Lehner, B., K. Verdin, and A. Jarvis. 2008. New Global Hydrography Derived From Spaceborne Elevation Data. Eos Trans. Am. Geophys. Union 89: 93–94. doi:10.1029/2008EO100001

Messager, M. L., B. Lehner, G. Grill, I. Nedeva, and O. Schmitt. 2016. Estimating the volume and age of water stored in global lakes using a geo-statistical approach. Nat. Commun. 7: 13603. doi:10.1038/ncomms13603

Pekel, J.-F., A. Cottam, N. Gorelick, and A. S. Belward. 2016. High-resolution mapping of global surface water and its long-term changes. Nature 540: 418–422. doi:10.1038/nature20584

Pierce, D. 2019. ncdf4: Interface to Unidata netCDF (Version 4 or Earlier) Format Data Files.

R Core Team. 2018. R: A Language and Environment for Statistical Computing.

Reichle, R. H., C. S. Draper, Q. Liu, M. Girotto, S. P. P. Mahanama, R. D. Koster, and G. J. M. De Lannoy. 2017a. Assessment of MERRA-2 Land Surface Hydrology Estimates. J. Clim. 30: 2937–2960. doi:10.1175/JCLI-D-16-0720.1

Reichle, R. H., Q. Liu, R. D. Koster, C. S. Draper, S. P. Mahanama, and G. S. Partyka. 2017b. Land surface precipitation in MERRA-2. J. Clim. 30: 1643–1664.

Revolution Analytics, and S. Weston. 2019. doParallel: Foreach Parallel Adaptor for the “parallel” Package.

Tierney, L., A. J. Rossini, N. Li, and H. Sevcikova. 2018. snow: Simple Network of Workstations.

Wickham, H. 2019. stringr: Simple, Consistent Wrappers for Common String Operations.

Wickham, H., R. Francois, L. Henry, and K. Muller. 2018. dplyr: A Grammar of Data Manipulation.

Wickham, H., and L. Henry. 2018. tidyr: Easily Tidy Data with “spread()” and “gather()” Functions.

Description:

This method step describes provenance-based metadata as specified in the LTER EML Best Practices.

This provenance metadata does not contain entity specific information.

Data Source
HydroLAKES v 1.0
Description:

This method step describes provenance-based metadata as specified in the LTER EML Best Practices.

This provenance metadata does not contain entity specific information.

Data Source
JRC Global Surface Water Dataset
Description:

This method step describes provenance-based metadata as specified in the LTER EML Best Practices.

This provenance metadata does not contain entity specific information.

Data Source
Gridded Population of the World v3
Description:

This method step describes provenance-based metadata as specified in the LTER EML Best Practices.

This provenance metadata does not contain entity specific information.

Data Source
Gridded Population of the World v4
Description:

This method step describes provenance-based metadata as specified in the LTER EML Best Practices.

This provenance metadata does not contain entity specific information.

Data Source
MERRA-2 tavg1_2d_flx_Nx: 2d,1-Hourly,Time-Averaged,Single-Level,Assimilation,Surface Flux Diagnostics V5.12.4
Description:

This method step describes provenance-based metadata as specified in the LTER EML Best Practices.

This provenance metadata does not contain entity specific information.

Data Source
HydroBASINS v 1c

People and Organizations

Creators:
Individual: Stephanie G Labou
Organization:Center for Environmental Research, Education, & Outreach, Washington State University
Id:https://orcid.org/0000-0001-5633-5983
Individual: Michael F Meyer
Organization:School of the Environment, Washington State University
Email Address:
michael.f.meyer@wsu.edu
Id:https://orcid.org/0000-0002-8034-9434
Individual: Matthew R Brousil
Organization:Center for Environmental Research, Education, & Outreach, Washington State University
Email Address:
matthew.brousil@wsu.edu
Id:https://orcid.org/0000-0001-8229-9445
Individual: Alli N Cramer
Organization:School of the Environment, Washington State University
Email Address:
allison.cramer@wsu.edu
Id:https://orcid.org/0000-0002-0356-5782
Individual: Bradley T Luff
Organization:School of the Environment, Washington State University
Email Address:
bradley.luff@wsu.edu
Contacts:
Individual: Stephanie G Labou
Organization:Center for Environmental Research, Education, & Outreach, Washington State University
Id:https://orcid.org/0000-0001-5633-5983
Individual: Michael F Meyer
Organization:School of the Environment, Washington State University
Email Address:
michael.f.meyer@wsu.edu
Id:https://orcid.org/0000-0002-8034-9434
Individual: Matthew R Brousil
Organization:Center for Environmental Research, Education, & Outreach, Washington State University
Email Address:
matthew.brousil@wsu.edu
Id:https://orcid.org/0000-0001-8229-9445
Individual: Alli N Cramer
Organization:School of the Environment, Washington State University
Email Address:
allison.cramer@wsu.edu
Id:https://orcid.org/0000-0002-0356-5782
Individual: Bradley T Luff
Organization:School of the Environment, Washington State University
Email Address:
bradley.luff@wsu.edu

Temporal, Geographic and Taxonomic Coverage

Temporal, Geographic and/or Taxonomic information that applies to all data in this dataset:

Time Period
Begin:
1995-01-01
End:
2015-10-31
Geographic Region:
Description:Global
Bounding Coordinates:
Northern:  78Southern:  -78
Western:  -180Eastern:  180

Project

Parent Project Information:

Title:NSF Graduate Research Fellowship
Personnel:
Individual: Michael F Meyer
Id:https://orcid.org/0000-0002-8034-9434
Role:Principal Investigator
Funding: NSF: DGE-1347973

Maintenance

Maintenance:
Description:completed
Frequency:
Other Metadata

EDI is a collaboration between the University of New Mexico and the University of Wisconsin – Madison, Center for Limnology:

UNM logo UW-M logo