Raw water temperature data are uploaded to the California Data
Exchange Center (CDEC) by state and federal agencies. Some agencies
also post and share data on their own websites, which may include
quality-checked water quality data and additional water quality
stations; however, we used CDEC data to
ensure we could treat all data consistently. Additionally, existing
data packages made it efficient to download all existing CDEC data in
a streamlined way (R code available in this package).
We used CDEC’s station locator map
(https://cdec.water.ca.gov/cdecstations) to download metadata from
stations in our area of interest, the San Francisco Estuary. We
filtered stations to include only those with water temperature data
collected either hourly or every 15 minutes (labeled as event
in CDEC). Water temperature sensors are labeled as sensor numbers 25
(temperature in degrees Fahrenheit) and 146 (temperature in degrees
Celsius). Data were downloaded and processed in R version 3.6.2.
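The station filter described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the authors' processing was done in R), and the field names sensor_number and duration, along with the example rows, are hypothetical rather than actual CDEC export columns.

```python
# Sensor numbers and duration labels come from the text above; the
# metadata field names and example rows are hypothetical.
WATER_TEMP_SENSORS = {25, 146}        # 25 = degrees F, 146 = degrees C
KEPT_DURATIONS = {"hourly", "event"}  # "event" = 15-minute data in CDEC


def keep_station_sensor(row):
    """Return True for a water temperature sensor that reports
    hourly or event (15-minute) data."""
    return (row["sensor_number"] in WATER_TEMP_SENSORS
            and row["duration"].lower() in KEPT_DURATIONS)


rows = [
    {"station": "AAA", "sensor_number": 25, "duration": "event"},
    {"station": "AAA", "sensor_number": 100, "duration": "event"},  # not temperature
    {"station": "BBB", "sensor_number": 146, "duration": "daily"},  # wrong interval
    {"station": "CCC", "sensor_number": 146, "duration": "hourly"},
]
selected = [r for r in rows if keep_station_sensor(r)]
```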
Metadata Template
We contacted agency managers and supervisors to locate station manager
names and contact information.
We created a template of relevant metadata regarding water temperature
sensors and available data, and requested station managers fill out
the template.
We edited responses for consistency, and standardized habitat types
(see below).
Habitat Type Definitions
Tidal River Channel (brackish) - a natural brackish river (higher
salinity than freshwater, resulting from a mixture of estuarine river
and seawater) whose flow and level are influenced by tides.
Tidal River Channel (brackish, marsh) - a natural brackish river
within the boundaries of a defined marsh whose flow and level are
influenced by tides.
Tidal River Channel (freshwater) - a natural river, mainly freshwater,
whose flow and level are influenced by tides.
Non-Tidal River Channel (freshwater) - a channel of freshwater that is
not affected by the ebb and flow of tides.
Tidal Canal (freshwater) - an artificial waterway of freshwater
constructed to allow the passage of boats or ships, or to transport
water for agriculture.
Tidal Flooded Island (freshwater) - a man-made flooded island that
contains freshwater.
Tidal Embayment (brackish) - a recess in a waterway or coastline
forming a bay with brackish water.
Tidal Slough - a backwater to a larger body of water that is tidally
influenced, with high water residence time and low water velocity.
Non-Tidal Slough - a backwater to a larger body of water that is
non-tidal, with high water residence time and low water velocity.
Concrete-lined Reservoir - a large natural or artificial lake used as
a source of water supply that is lined with concrete.
Aqueduct - a wide artificial channel or canal used for transporting
water.
Note: Prior to conducting analyses, please read the “Comments” column in
the Water Temperature Sensor Metadata Table for additional relevant
information about sensors. We tried to keep columns in consistent
formats, so most of the additional text was added to Comments.
Raw Integrated Dataset
1. Data Download: The CDECRetrieve package
https://github.com/FlowWest/CDECRetrieve was used to download data.
All versions of temperature data for each station were downloaded
(Event-Fahrenheit, Event-Celsius, Hourly-Fahrenheit, Hourly-Celsius),
then combined.
2. Standardization: Fahrenheit temperatures were converted to Celsius.
3. Conversion to hourly temperature points: Data were grouped and
organized by station, date, and hour. The first value in each grouping
was kept, and the remaining values in that grouping were removed.
This resulted in one temperature value for each station-date-hour.
4. Saved files: The raw hourly temperature dataset was saved as
Temp_all_H.csv and Temp_all_H.rds (this compressed file is smaller and
reads faster into R; this is also the file used for subsequent QC code
and app code). The code is in the file DownloadData_CDEC.Rmd.
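Steps 2 and 3 above can be sketched as follows. This is a Python illustration, not the code in DownloadData_CDEC.Rmd. It assumes that "first value" means the earliest reading within each station-date-hour group, and the tuple layout and example readings are hypothetical.

```python
from datetime import datetime


def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def to_hourly(records):
    """Keep one value per station-date-hour.

    records: iterable of (station, datetime, temp_c) tuples.
    Assumption: the value kept is the earliest reading in each group.
    """
    hourly = {}
    for station, when, temp_c in sorted(records, key=lambda r: (r[0], r[1])):
        key = (station, when.date(), when.hour)
        hourly.setdefault(key, temp_c)  # later readings in the hour are dropped
    return hourly


raw = [
    ("AAA", datetime(2019, 7, 1, 10, 0), fahrenheit_to_celsius(68.0)),
    ("AAA", datetime(2019, 7, 1, 10, 15), fahrenheit_to_celsius(69.8)),
    ("AAA", datetime(2019, 7, 1, 11, 0), 21.5),
]
hourly = to_hourly(raw)
```

Sorting before grouping makes the result independent of the order in which the event and hourly downloads were combined.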
Quality Control Filters
Quality control (QC) filters were informed by NOAA’s Manual for
Real-Time Quality Control of In-Situ Temperature and Salinity Data, which
can be found at:
https://cdn.ioos.noaa.gov/media/2017/12/qartod_temperature_salinity_manual.pdf.
QC filters were applied to the raw temperature data, and each
temperature value was either flagged (Y) or not flagged (N) for each
filter:
QC1: Temperature Range: Values outside of 1-40 degrees C were
flagged/filtered.
QC2: Missing Values: Days with >4 missing values were
flagged/filtered. Data were grouped by station and date, and the
number of values in each grouping was counted. Days with fewer than 20
temperature values were flagged. Flagged dates were merged back to the
full temperature dataset to flag all values on a flagged day.
QC3: Repeating Values: Values that repeat 18+ times were
flagged/filtered. Each value was compared with the prior value. A column was
created to indicate whether or not they were the same (same = 1 if
values are the same, 0 if values are different). The total number
of repeating values was summed. If the number of repeating values was
> 18, the repeating values were flagged/filtered.
QC4: Flag anomalies: Data were separated into seasonal, trend and
remainder components, and remainder component outliers were
flagged/filtered. This analysis was conducted by the anomalize package
in R. Seasonal decomposition method: Loess. Trend: Set at 6 months.
Outlier detection: Used threshold of 3*IQR to flag/filter outliers.
QC5: Spike: Values that “spike” compared to the value before and after
were flagged/filtered. Data were grouped by station, and arranged by
Station and Datetime. Each value was compared with the average of the
prior and subsequent values. If the difference was > 5 degrees C,
the value was flagged/filtered.
QC6: Rate of change: Values with a rate of change greater than 5 *
standard deviation of values from the past 50 hours (~ 2 tidal cycles)
were flagged/filtered. Data were grouped by station, and arranged by
station and datetime. A = the difference between each value and the
prior value. B = the standard deviation of the past 50 temperature
values (hours). If A > 5*B, the value was flagged/filtered.
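The filters that do not depend on the anomalize package (QC1, QC2, QC3, QC5, QC6) can be sketched for a single station's hourly series as follows. This is a Python illustration, not the code in WaterTemp_QC.Rmd. Several details are assumptions: a run is flagged when its length exceeds 18, differences are taken as absolute values, the QC6 window covers the 50 values preceding the current one, and values without enough neighbors or history are left unflagged.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import stdev


def flag_range(temps, low=1.0, high=40.0):
    """QC1: flag values outside low..high (degrees C)."""
    return [t < low or t > high for t in temps]


def flag_sparse_days(timestamps, min_per_day=20):
    """QC2: flag every value on a day with fewer than min_per_day values."""
    per_day = Counter(ts.date() for ts in timestamps)
    return [per_day[ts.date()] < min_per_day for ts in timestamps]


def flag_repeats(temps, max_run=18):
    """QC3: flag all values in a run of identical consecutive values
    whose length exceeds max_run (assumed reading of the threshold)."""
    flags = [False] * len(temps)
    start = 0
    for i in range(1, len(temps) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(temps) or temps[i] != temps[start]:
            if i - start > max_run:
                flags[start:i] = [True] * (i - start)
            start = i
    return flags


def flag_spikes(temps, max_diff=5.0):
    """QC5: flag values that differ from the mean of the prior and
    subsequent values by more than max_diff (absolute difference assumed)."""
    flags = [False] * len(temps)
    for i in range(1, len(temps) - 1):  # endpoints lack a neighbor
        neighbor_mean = (temps[i - 1] + temps[i + 1]) / 2.0
        flags[i] = abs(temps[i] - neighbor_mean) > max_diff
    return flags


def flag_rate_of_change(temps, window=50, n_sd=5.0):
    """QC6: flag values whose change from the prior value exceeds
    n_sd * the standard deviation of the preceding `window` values."""
    flags = [False] * len(temps)
    for i in range(window, len(temps)):  # earlier values lack a full window
        change = abs(temps[i] - temps[i - 1])
        spread = stdev(temps[i - window:i])
        flags[i] = change > n_sd * spread
    return flags


# Hypothetical example: 50 quiet hours followed by one abrupt jump.
start_time = datetime(2019, 7, 1)
temps = [15.0, 15.5] * 25 + [25.0]
timestamps = [start_time + timedelta(hours=h) for h in range(len(temps))]
flags = {
    "QC1": flag_range(temps),
    "QC2": flag_sparse_days(timestamps),
    "QC3": flag_repeats(temps),
    "QC5": flag_spikes(temps),
    "QC6": flag_rate_of_change(temps),
}
```

In this example the final value is flagged by QC6, and the three values on the incomplete third day are flagged by QC2. Each function returns one Boolean per value, which maps directly onto the Y/N flag columns.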
Saved Files:
Flagged Dataset: Contains all raw data, with a column for each QC
filter above filled in with either a “Y” or “N.” Saved as
Temp_flagged.csv.
Filtered Dataset: Excludes any data flagged by at least one QC
filter. Saved as Temp_filtered.csv.
Code: WaterTemp_QC.Rmd
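The relationship between the flagged and filtered datasets can be sketched as below. This is a Python illustration only. The column names QC1 through QC6 and the example rows are hypothetical and may not match the headers in Temp_flagged.csv.

```python
# Hypothetical flag column names, one per QC filter.
QC_COLUMNS = ["QC1", "QC2", "QC3", "QC4", "QC5", "QC6"]


def filter_flagged(rows):
    """Keep only rows marked "N" (not flagged) for every QC filter."""
    return [row for row in rows if all(row[col] == "N" for col in QC_COLUMNS)]


clean = dict.fromkeys(QC_COLUMNS, "N")
flagged_rows = [
    {"station": "AAA", "temp_c": 18.2, **clean},
    {"station": "AAA", "temp_c": 45.0, **clean, "QC1": "Y"},  # out of range
    {"station": "BBB", "temp_c": 17.9, **clean, "QC5": "Y"},  # spike
]
filtered_rows = filter_flagged(flagged_rows)
```

A user who wants to apply only a subset of the filters could pass a shorter list of columns instead of filtering on all six.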
RShiny: An RShiny app was also created to allow individuals to select
their own filtering criteria and download individual station data
according to their needs. Code: app.R. Shortly after EDI data
publication, this app will be hosted on the Delta Science Program’s
Shiny server at: https://deltascience.shinyapps.io/Home/. There may be
minor changes made to the RShiny code after publication of this
dataset.
Specific values used for QC filters were initially selected by
real-time water quality experts as values deemed reasonable. Once
values were selected, filters were applied to the raw temperature
data, and were visualized in an R Shiny app, which highlighted data
that were flagged under each filter. Some values were then modified as
deemed appropriate to prevent over- or under-flagging. For one filter,
the QC4 trend (see above), visualization alone was not sufficient to make
a selection. Thus, a subset of stations was selected for further
analysis. For the subset of stations, 3-month, 6-month, and 12-month
trends were applied, and downloaded into separate datasets. Each
dataset was analyzed for maximum temperatures, days with maximum
temperatures > 27 degrees C, and whether outliers flagged by QC4
corresponded with heat waves. The 6-month trend was chosen based on
comparisons of the results from the three datasets.
Additional Notes on Data Review and Quality:
USGS water temperature time-series data are reviewed, quality-assured,
and aged within the National Water Information System (NWIS:
https://nwis.waterdata.usgs.gov/nwis) using procedures outlined in
Wagner and others (2006). USGS data released on EDI were imported as
raw data and filtered as described, so they may not be identical to approved
USGS data published on NWIS.
Our QC methods were meant to efficiently review water temperature data
from a large number of stations by using overarching code for the
entire dataset. However, we did not review every point at every
station in depth, and we recognize there are some points that may have
been flagged incorrectly, or not flagged when they should have been.
We leave it to the data user to further review data for their own
analysis needs.