3. Methods for LAGOS-US NETWORKS
This section outlines the methods used to create the NETWORKS module. We explain how we derived the lake connectivity networks and their associated connectivity metrics, and how dams from the National Anthropogenic Barrier Dataset (NABD) were linked to our networks.
For further technical detail on this process, or to reproduce this effort, users can consult the published scripts (DOI for code).
3.1. Software used for NETWORKS creation
We used a combination of Python 2.7.8 (Van Rossum & Drake 1998), ArcGIS 10.3 Desktop (ESRI 2014), and R version 3.6 to create LAGOS-US NETWORKS.
Most of the methods are implemented in Python scripts, which can be found at (DOI for code). These scripts can be used to understand, reproduce, or adapt our methods.
ArcGIS was used for some of the dam classifications, mapping, and verification of some metrics.
We used the “nhdR” package (Stachelek 2019) to download NHDPlusV2 data and the “hydrolinks” package (Winslow et al. 2018) to verify metrics in R.
3.2. Methods for creating the lake connectivity networks
Lake connectivity networks across the conterminous U.S. were created using the flow table from the NHDPlusV2 database (USGS 2019).
This flow table lists every flowline (streams and artificial flowlines that pass through lakes; Figure 5) in either its FROM or its TO column, denoting the direction of flow from one line to the other, along with the distance of each connection between two flowlines.
Prior to creating a graph, we removed coastline connections (Fcode 56600; McKay et al. 2012), as well as IDs associated with the Great Lakes water bodies, so that the connectivity networks would not connect through the ocean, estuaries, or the Great Lakes.
Artificial flowlines (Figure 5) were linked to water bodies (nhdplusv2_comid) and these water bodies were linked to lagoslakeids using the lake_link table from the LAGOS_US_LOCUS module (Smith et al. 2020).
Our modified version of the NHDPlusV2 flow table, which records where artificial flowlines are matched to lakes from the LAGOS-US database, is provided as the nets_flow_medres data table.
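The filtering and lake matching described above can be sketched as follows. This is a minimal illustration, not the published NETWORKS script: the row layout (from_comid, to_comid, fcode, length_km), the function name prepare_flow_table, and the flowline_to_lake lookup (artificial-flowline comid to lagoslakeid) are hypothetical simplifications of the NHDPlusV2 flow table and the LOCUS lake_link table.

```python
# Sketch only: the row layout (from_comid, to_comid, fcode, length_km) is a hypothetical
# simplification of the NHDPlusV2 flow table, not its actual schema.
COASTLINE_FCODE = 56600


def prepare_flow_table(rows, great_lakes_ids, flowline_to_lake):
    """Drop coastline and Great Lakes connections and relabel flowlines as graph nodes.

    flowline_to_lake maps an artificial-flowline comid to its lagoslakeid (hypothetical input).
    Returns a list of (from_node, to_node, length_km) edges.
    """
    def node(comid):
        # Artificial flowlines matched to a LAGOS-US lake collapse into one lake node;
        # every other flowline stays a stream node.
        if comid in flowline_to_lake:
            return ("lake", flowline_to_lake[comid])
        return ("stream", comid)

    edges = []
    for from_id, to_id, fcode, length_km in rows:
        if fcode == COASTLINE_FCODE:
            continue  # coastline: would link networks through oceans, estuaries, Great Lakes
        if from_id in great_lakes_ids or to_id in great_lakes_ids:
            continue  # connection touching a Great Lakes water body
        u, v = node(from_id), node(to_id)
        if u == v:
            continue  # two artificial flowlines inside the same lake
        edges.append((u, v, length_km))
    return edges
```

Collapsing all artificial flowlines of one lake into a single lake node is what lets the later traversal treat a lake as one stopping point, regardless of how many artificial paths cross it.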
We applied a graph theory framework to create lake connectivity networks from this flow table. Graphs are mathematical structures used to model pairwise relations between objects, or nodes.
In our case, we are interested in modeling the pairs of lakes that are connected by streams. Two types of graphs can be used to model these connections: unidirectional graphs, which consider either downstream or upstream connections, and bidirectional graphs, which consider both downstream and upstream connections.
We created lake connectivity networks using bidirectional graphs with both lakes and streams as nodes (Figure 6). We used Dijkstra's algorithm (Cormen et al. 2001) to traverse the graph both upstream and downstream, starting at a given lake. During the traversal, whenever a node was a stream, we continued traversing the graph until we reached a lake node.
We then saved the distance from the starting lake to this lake and stopped traversing further along that path. If multiple paths connected the same two lakes, the algorithm chose and saved the shortest one. This process outputs all connections between the given lake and its neighbor lakes.
This process was repeated for every lake until the connections and stream course distances between all lakes were known.
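The traversal above can be sketched with a standard heap-based Dijkstra search that records a lake when it is first reached and does not expand past it. This is an illustrative sketch, assuming nodes are labelled ("lake", id) or ("stream", id) and each edge carries a length in km; the function names are hypothetical and not taken from the NETWORKS code.

```python
import heapq
import itertools
from collections import defaultdict


def build_bidirectional(edges):
    """Adjacency list in which every flow connection can be walked in either direction."""
    adj = defaultdict(list)
    for u, v, km in edges:
        adj[u].append((v, km))
        adj[v].append((u, km))
    return adj


def neighbor_lakes(adj, start):
    """Dijkstra from one lake; stop at the first lake met on each path.

    Nodes are ("lake", id) or ("stream", id). Returns {lake_node: shortest distance}.
    """
    dist = {start: 0.0}
    found = {}
    tie = itertools.count()  # tie-breaker so heap entries never compare node labels
    heap = [(0.0, next(tie), start)]
    while heap:
        d, _, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry; a shorter path was already processed
        if node != start and node[0] == "lake":
            found[node] = d  # neighbor lake reached: record it, do not traverse past it
            continue
        for nxt, km in adj[node]:
            if d + km < dist.get(nxt, float("inf")):
                dist[nxt] = d + km
                heapq.heappush(heap, (d + km, next(tie), nxt))
    return found


def all_lake_connections(adj):
    """Repeat the search from every lake node."""
    return {n: neighbor_lakes(adj, n) for n in list(adj) if n[0] == "lake"}
```

Because Dijkstra pops nodes in order of increasing distance, the first time a lake is popped its distance is already the shortest path, which matches the rule of keeping only the shortest of several connecting paths.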
All lakes linked to one another through upstream or downstream connections are considered part of one network. We assigned each of these networks a unique identification number (net_id).
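Grouping lakes into networks amounts to finding connected components of the lake-to-lake connections. The sketch below assumes a symmetric mapping {lake: {neighbor_lake: km}} such as the output of the bidirectional search; the function name and the sequential numbering of net_id values are illustrative assumptions, not the published numbering scheme.

```python
def assign_net_ids(connections):
    """Give every group of mutually reachable lakes one net_id (connected components via DFS)."""
    net_id = {}
    next_id = 1
    for lake in connections:
        if lake in net_id or not connections[lake]:
            continue  # already labelled, or an isolated lake that belongs to no network
        net_id[lake] = next_id
        stack = [lake]
        while stack:
            current = stack.pop()
            for other in connections.get(current, {}):
                if other not in net_id:
                    net_id[other] = next_id
                    stack.append(other)
        next_id += 1
    return net_id
```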
All stream course distances between pairs of lakes are provided in the nets_binetworkdistance_medres table. Distances along artificial flowlines through lakes were not included in these distances.
This table includes the upstream, downstream, and total distances between two lakes.
The total distance may be smaller than the sum of the upstream and downstream distances because the graph has no information on where stream reaches intersect each other. Therefore, an intersecting stream reach is counted only once in the total distance, but may be included in both the downstream and upstream distance columns (Figure 7).
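A small worked example may make the accounting above concrete. The reach names and lengths are hypothetical: lake X drains through reach r1 into reach r3, and lake Y drains through reach r2 into the same reach r3. Reach r3 enters both the downstream and the upstream leg, but it is counted once in the total.

```python
# Hypothetical reaches and lengths (km); r3 is the reach where the two flow paths meet.
reach_km = {"r1": 2.0, "r2": 1.5, "r3": 0.5}
downstream_path = ["r1", "r3"]  # reaches walked downstream from lake X
upstream_path = ["r3", "r2"]    # reaches walked upstream to lake Y

downstream_km = sum(reach_km[r] for r in downstream_path)  # 2.5
upstream_km = sum(reach_km[r] for r in upstream_path)      # 2.0
# The shared reach r3 appears once in the union, so the total (4.0) is less than 2.5 + 2.0.
total_km = sum(reach_km[r] for r in set(downstream_path) | set(upstream_path))
```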
3.3. Methods for linking LAGOS-US NETWORKS with NABD dams
The NABD is a dataset of large anthropogenic barriers that are spatially linked to the NHDPlusV1 data product to facilitate analyses based on the NHD and the National Inventory of Dams (NID) (Ostroff et al. 2013). Cooper et al. (2017) augmented this database with 170 additional dams from the USFWS Fish Passage Decision Support Tool and excluded ~250 dams that were identified as having been removed since the NABD was published (Rivers 2019).
The dams were linked to the NHDPlusV2 flowlines and incorporated into the networks. Dams were assigned to a lagoslakeid if they were less than 50 m from a lake. Dams that fell directly on a lake could not be classified as upstream or downstream because they were located on the lake node itself and therefore had no direction relative to that node.
These dams were therefore assigned a position upstream or downstream of the lake using two methods:
1) Using ArcGIS, lake inlets and outlets were identified from the start and end vertices of the artificial flowlines and extracted as points representing inlets and outlets. When multiple artificial paths were present, the upstream-most artificial flowline was used for inlet locations and the downstream-most artificial flowline for outlet locations.
For each dam location, the three nearest inlets or outlets (combined) were identified by Euclidean distance using the ArcGIS Generate Near Table tool. If the nearest inlet was less than 250 m away and no outlets or other lakes were also nearby, the dam was automatically designated as upstream of the associated lake.
An equivalent, symmetrical rule was applied for nearby outlets, designating the dam as downstream of the lake. If both an inlet and an outlet of the same lake were very near each other, or an inlet or outlet of another lake was very near, the dam position was flagged for manual review. "Very near" was defined as follows: the second-closest junction was within 50 m of the closest junction, or the second-closest junction was within 100 m of the closest junction and the closest junction was within 25 m of the dam.
Methods are available as Python code within the LAGOS GIS Toolbox (http://github.com/cont-limno/LAGOS_GIS_Toolbox; national_outlets_inlets.py, dams_link_lake_junctions.py). In total, 11,551 dams were assigned upstream or downstream of a lake using this method.
2) The remaining dams (n = 1,079) that could not be classified by the automated process described in (1) were classified manually: we visually inspected each dam location against the NHD polygons and flowlines and assigned the dam to either the upstream or the downstream side of the lake.
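The automated rule in (1) can be sketched as a small decision function. This is not the toolbox implementation (see dams_link_lake_junctions.py for that). It assumes the three nearest junctions per dam are already available, as a near table would provide, and it reads "within X m of the closest junction" as the difference between the dam-to-junction distances, because a near table stores only those distances. That reading, the Junction type, and the treatment of dams beyond 250 m as manual cases are assumptions.

```python
from dataclasses import dataclass

AUTO_MAX_M = 250.0  # nearest junction must be closer than this for automatic assignment


@dataclass
class Junction:
    kind: str       # "inlet" or "outlet"
    lake_id: str    # lagoslakeid of the lake the junction belongs to
    dist_m: float   # Euclidean distance from the dam to this junction


def very_near(first, second):
    # Assumption: distances are compared as the gap between dam-to-junction distances.
    gap = second.dist_m - first.dist_m
    return gap <= 50.0 or (gap <= 100.0 and first.dist_m <= 25.0)


def classify_dam(nearest):
    """Return "upstream", "downstream", or "manual" for one dam's nearest junctions."""
    if not nearest:
        return "manual"
    ranked = sorted(nearest, key=lambda j: j.dist_m)
    first = ranked[0]
    if first.dist_m >= AUTO_MAX_M:
        return "manual"  # assumption: too far for automatic assignment
    for other in ranked[1:]:
        # Another lake's junction, or the opposite junction type, makes the position ambiguous.
        conflicting = other.lake_id != first.lake_id or other.kind != first.kind
        if conflicting and very_near(first, other):
            return "manual"  # flagged for manual review
    # A dam at an inlet sits upstream of the lake; a dam at an outlet sits downstream.
    return "upstream" if first.kind == "inlet" else "downstream"
```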
Two data flags were created during the process of linking dams to lakes and streams/rivers. These flags are for cases when dams fall onto an artificial flowline contained within a lake or when multiple dams fall on the same lake (Figure 8; Table 5).
3.4. Methods for connectivity metrics
After creating the connectivity networks, we calculated several lake-scale metrics using a unidirectional graph, which considers either downstream or upstream connections but not both.
For example, in Figure 9 the downstream distance from lake A to lake B is the same as the upstream distance from lake B to lake A. The connection between lake B and lake C is not included because the unidirectional graph does not traverse downstream and then upstream along the same path.
We used Dijkstra's algorithm (Cormen et al. 2001) to traverse the graph downstream only, starting at a given lake. During the traversal, whenever a node was a stream, we continued traversing the graph until we reached a lake node. We saved the distance from the starting lake to this lake and stopped traversing further along that path.
If multiple paths connected the same two lakes, the algorithm chose and saved the shortest one. This outputs all connections between the given lake and its neighbor lakes. This process was repeated for every lake until the connections and stream course distances between all lakes were known.
The stream course distances between pairs of lakes from the unidirectional graph are provided in the nets_uninetworkdistance_medres table. Because the unidirectional graph is traversed only downstream, each record contains either a downstream or an upstream distance, and each distance has a mirror-image record in the opposite direction.
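A sketch of the downstream-only traversal and the mirrored records follows. It reuses the same node labelling as a hypothetical assumption (("lake", id) or ("stream", id), edges oriented in the flow direction), and the row layout (lake, other_lake, direction, km) only approximates the published table.

```python
import heapq
import itertools
from collections import defaultdict


def downstream_lakes(down_adj, start):
    """Dijkstra along the flow direction only; stop at the first lake reached on each path."""
    dist = {start: 0.0}
    found = {}
    tie = itertools.count()
    heap = [(0.0, next(tie), start)]
    while heap:
        d, _, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        if node != start and node[0] == "lake":
            found[node] = d
            continue
        for nxt, km in down_adj.get(node, ()):
            if d + km < dist.get(nxt, float("inf")):
                dist[nxt] = d + km
                heapq.heappush(heap, (d + km, next(tie), nxt))
    return found


def uni_distance_table(edges):
    """Rows of (lake, other_lake, direction, km); each downstream row gets an upstream mirror."""
    down_adj = defaultdict(list)
    lakes = set()
    for u, v, km in edges:
        down_adj[u].append((v, km))
        lakes.update(n for n in (u, v) if n[0] == "lake")
    rows = []
    for lake in sorted(lakes, key=str):
        for other, km in downstream_lakes(down_adj, lake).items():
            rows.append((lake, other, "downstream", km))
            rows.append((other, lake, "upstream", km))
    return rows
```

In this example pattern, two lakes that drain into a common stream below them (lakes A and C in the test) are never linked, because reaching one from the other would require walking downstream and then upstream.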
The nearest lake distance was determined by comparing the distance between each lake and all of its neighboring lakes and choosing the nearest distance upstream (Figure 10a) and the nearest distance downstream (Figure 10b) from the unidirectional graph.
Note that not all lakes have both an upstream and a downstream lake. The number of directly connected upstream lakes was computed as the indegree of a lake, i.e., the number of upstream lakes connected to it only through streams flowing into the lake.
Similarly, the number of directly connected downstream lakes was calculated as the outdegree of a lake, i.e., the number of lakes directly connected through streams flowing out of the lake. Some lakes (n = 7,617) have no directly connected upstream or downstream lakes because they are connected to the lake network only through the bidirectional graph (e.g., Figure 9, lake C).
Therefore, we also included the nearest bidirectional distance (Figure 10c). This distance is often the same as the nearest downstream or nearest upstream distance; however, it can differ if the nearest lake is connected only through the bidirectional graph (Figure 10c).
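The nearest-lake distances and degree counts can be derived directly from the distance tables. The sketch assumes unidirectional rows shaped as (lake, other_lake, "upstream" | "downstream", km) and bidirectional connections as {lake: {neighbor: km}}. The output field names are hypothetical and not the published column names. Since each traversal stops at the first lake reached, every unidirectional row is a direct connection, so counting rows per direction gives indegree and outdegree.

```python
def lake_metrics(uni_rows, bi_connections):
    """Nearest up/down/bidirectional distances plus indegree/outdegree for every lake."""
    metrics = {}

    def entry(lake):
        return metrics.setdefault(lake, {
            "nearest_up_km": None, "nearest_down_km": None, "nearest_bi_km": None,
            "indegree": 0, "outdegree": 0,
        })

    for lake, other, direction, km in uni_rows:
        m = entry(lake)
        key = "nearest_up_km" if direction == "upstream" else "nearest_down_km"
        if m[key] is None or km < m[key]:
            m[key] = km
        # Each row links the lake to a directly connected lake (no lake in between).
        m["indegree" if direction == "upstream" else "outdegree"] += 1
    for lake, neighbors in bi_connections.items():
        if neighbors:
            entry(lake)["nearest_bi_km"] = min(neighbors.values())
    return metrics
```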
Two metrics that describe the position of a lake within the network and landscape were derived from the unidirectional graph: Lake Network Number (LNN; Figure 10d) and Lake Order (LO; Figure 10e) (Riera et al. 2000; Martin and Soranno 2006).
LNN was computed by starting at the first lake in a network (i.e., a lake with no upstream lakes) and assigning that lake a “1”, then moving downstream in the network to the next lake and assigning that lake a “2”, and so on.
Therefore, multiple lakes in a network could be assigned a “1” if they did not have any upstream lakes. Lakes with multiple upstream lakes were assigned the larger of the sequential numbers reached along their upstream paths (Martin and Soranno 2006).
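One way to compute LNN is to process lakes in topological order over the direct downstream connections. This sketch assumes that "the larger sequential number" means one more than the largest LNN among a lake's immediate upstream lakes, and that two-lake loops have already been removed, since a topological order requires an acyclic graph. It uses graphlib from the Python standard library (Python 3.9 or later); the function name is hypothetical.

```python
from graphlib import TopologicalSorter


def lake_network_number(downstream_pairs):
    """LNN: 1 for lakes with no upstream lakes, else 1 + the largest LNN immediately upstream.

    downstream_pairs: (upstream_lake, downstream_lake) direct connections; must be loop-free.
    """
    upstream_of = {}
    for up, down in downstream_pairs:
        upstream_of.setdefault(down, set()).add(up)
        upstream_of.setdefault(up, set())
    lnn = {}
    # static_order() yields every lake after all of its upstream lakes.
    for lake in TopologicalSorter(upstream_of).static_order():
        ups = upstream_of[lake]
        lnn[lake] = 1 if not ups else 1 + max(lnn[u] for u in ups)
    return lnn
```

For instance, with A flowing to B, B and C both flowing to D, and D flowing to E, lakes A and C get 1, B gets 2, D gets 3, and E gets 4.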
LO was assigned using the Strahler stream order from the NHDPlusV2 attributes. LO follows the Strahler stream order of the outflowing stream; if more than one outlet is present, the higher-order stream is used (Riera et al. 2000; Martin and Soranno 2006).
There were two exceptions: headwater lakes were assigned a “0”, and terminal lakes received the order of the inflowing stream (Riera et al. 2000; Martin and Soranno 2006). To distinguish headwater lakes from lakes that had inflowing streams but no upstream lakes, we based the LO calculation on inflowing streams rather than on upstream lakes.
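The LO rules can be written as a short function of a lake's inflowing and outflowing Strahler orders. This is a sketch. Taking the maximum when a terminal lake has several inflowing streams, and returning None for lakes with no streams at all, are assumptions beyond what the text states.

```python
def lake_order(inflow_orders, outflow_orders):
    """Lake Order from the Strahler orders of a lake's inflowing and outflowing streams."""
    if not inflow_orders and not outflow_orders:
        return None                 # no streams: outside the scope of these metrics
    if not inflow_orders:
        return 0                    # headwater lake: no inflowing streams
    if not outflow_orders:
        return max(inflow_orders)   # terminal lake (max if several inflows: assumption)
    return max(outflow_orders)      # otherwise the higher-order outflowing stream
```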
In some instances (0.02% of all connections), two lakes formed a loop; for example, lake A flows to lake B and lake B flows back to lake A. In these instances, we randomly removed one of the two connections.
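A minimal sketch of the loop removal, assuming direct connections are stored as (upstream_lake, downstream_lake) pairs. The optional seed is a hypothetical parameter added so runs can be reproduced.

```python
import random


def drop_two_lake_loops(pairs, seed=None):
    """Where A->B and B->A both exist, randomly keep only one of the two connections."""
    rng = random.Random(seed)
    remaining = set(pairs)
    for a, b in sorted(remaining):  # iterate over a sorted copy while mutating the set
        if a != b and (a, b) in remaining and (b, a) in remaining:
            remaining.discard(rng.choice([(a, b), (b, a)]))
    return remaining
```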
Several dam metrics were derived to characterize connectivity barriers. The Depth-First Search (DFS; Cormen et al. 2001) algorithm was used to traverse the lake-stream network to find all upstream and downstream dams.
DFS is a standard graph-traversal technique that starts at one node and explores each branch as far as possible before backtracking. Dijkstra’s algorithm was used to compute the distance to the nearest upstream and downstream dams (Cormen et al. 2001).
Because the network is represented as a graph, the algorithm did not know the exact location of a dam along a stream reach, only the flowline it is located on. Therefore, when deriving the nearest-dam metrics, the entire length of the stream reach containing the dam was included in the distance calculation.
As a result, when two or more dams fell on the same stream flowline (8.7% occurrence), all of those dams were considered the nearest upstream or downstream dams, they were assigned the same distance from the lake, and all of their dam identifiers were included, separated by commas.
Similarly, if multiple dams were on a lake, all of them were considered the nearest dam, all of their identifiers were included, and dams located on a lake were assigned a distance of 0 km.
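The two dam searches can be sketched as follows. These are illustrative assumptions, not the published code: adjacency is a dict of node to list of next nodes in one flow direction (pass a reversed adjacency for upstream), dams_on maps a node to the dam IDs on it, and reach_km gives each reach's full length (nodes without a length, such as lakes, add 0). The DFS is assumed not to stop at other lakes. Ties on the nearest reach are joined with commas, and dams on the starting lake itself are reported at 0 km.

```python
import heapq
import itertools


def dams_in_direction(adj, start, dams_on):
    """DFS from a lake along one flow direction; collect dams on every reachable node."""
    seen, stack, found = {start}, [start], []
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
                found.extend(dams_on.get(nxt, []))
    return found


def nearest_dams(adj, start, dams_on, reach_km):
    """Dijkstra in one direction; a dam's distance includes the whole reach it sits on."""
    if start in dams_on:
        return 0.0, ",".join(dams_on[start])  # dams on the lake itself: 0 km
    dist, tie = {start: 0.0}, itertools.count()
    heap = [(0.0, next(tie), start)]
    best_km, best_ids = None, []
    while heap:
        d, _, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        if best_km is not None and d > best_km:
            break  # everything left is farther than the nearest dam already found
        if node in dams_on:
            best_km = d
            best_ids.extend(dams_on[node])  # all dams on this reach tie
            continue
        for nxt in adj.get(node, ()):
            nd = d + reach_km.get(nxt, 0.0)  # full length of the next reach is added
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, next(tie), nxt))
    return best_km, ",".join(best_ids)
```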
At the network scale, each completed lake connectivity network was traversed using the DFS algorithm to count the total number of lakes, compute the average distance between lakes, and count the total number of dams in the network.
The average area of the lakes in each network was calculated using the LAGOS-US LOCUS v1 lake polygons (Smith et al. 2020): lakes were grouped by network, and areas were calculated with the Calculate Geometry tool in ArcGIS.
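A sketch of the network-scale summary follows. It assumes a symmetric {lake: {neighbor: km}} mapping, a hypothetical lake_dams lookup of dam IDs linked to each lake, and precomputed lake areas (for example from Calculate Geometry). It also assumes the average distance is taken over directly connected lake pairs, each counted once. None of these input structures come from the published code.

```python
def summarize_network(start, connections, lake_dams, lake_area):
    """DFS over one lake network: lake count, mean lake-to-lake distance, dam count, mean area."""
    seen, stack = {start}, [start]
    pair_km = {}
    dams = set()
    while stack:
        lake = stack.pop()
        dams.update(lake_dams.get(lake, ()))
        for other, km in connections.get(lake, {}).items():
            pair_km[frozenset((lake, other))] = km  # each connected pair counted once
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return {
        "lake_count": len(seen),
        "mean_distance_km": sum(pair_km.values()) / len(pair_km) if pair_km else None,
        "dam_count": len(dams),  # a dam linked to several lakes is counted once
        "mean_area": sum(lake_area[lake] for lake in seen) / len(seen),
    }
```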
Because the lake networks in NETWORKS were created from the medium-resolution NHDPlusV2 flow data, their connectivity may differ from the connectivity metrics in LAGOS-US LOCUS, which were created from the high-resolution NHD (Smith et al. 2020).
Metrics were included only for lakes connected to other lakes; they therefore do not include isolated lakes or lakes that are connected only to streams.