Sensors
On top of tower at 110 ft., 20-30 ft above top of canopy:
- LI-7500 open-path gas analyzer (C and H2O flux)
- Gill WindMaster sonic anemometer (3D wind)
- Campbell Scientific CNR4 net radiometer
- Kipp and Zonen albedometer
- Apogee aspirated shield housing a CS215 temperature sensor
On ground near tower base:
- 4x CS655 soil moisture reflectometers at 10, 20, 30, and 50 cm
- 2x heat flux stations, each with two Hukseflux heat flux plates and
two thermistors
Starting from the full AmeriFlux dataset output, which uses
AmeriFlux-defined variable names, we first added columns by
partitioning TIMESTAMP_START into Year, Month, Day (of the month), and
Hour columns. Data taken on the half hour are represented in the Hour
column by adding 0.5 to the hour (for example, 14:30 becomes 14.5). We
set negative SW_IN and SW_OUT values to NA. AT_DIFF is the difference
in air temperature between 3 meters and 30 meters.
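The preprocessing steps above can be illustrated with a minimal Python sketch. It is not the code used to build the data set. It assumes that TIMESTAMP_START follows the YYYYMMDDHHMM layout, that NA is represented as None, and that AT_DIFF is computed as AT_3 minus AT_30 (the sign convention is not stated above). The function names split_timestamp and preprocess are hypothetical.

```python
def split_timestamp(timestamp_start):
    """Split a YYYYMMDDHHMM timestamp into (year, month, day, hour).

    Half-hour records get 0.5 added to the hour, e.g. 14:30 -> 14.5.
    """
    ts = str(timestamp_start)
    year, month, day = int(ts[0:4]), int(ts[4:6]), int(ts[6:8])
    hour = int(ts[8:10]) + int(ts[10:12]) / 60.0
    return year, month, day, hour


def preprocess(record):
    """Return a copy of one half-hourly record with the derived columns."""
    out = dict(record)
    (out["Year"], out["Month"],
     out["Day"], out["Hour"]) = split_timestamp(record["TIMESTAMP_START"])

    # Negative shortwave radiation is treated as missing (NA -> None here).
    for name in ("SW_IN", "SW_OUT"):
        value = out.get(name)
        if value is not None and value < 0:
            out[name] = None

    # Assumption: AT_DIFF = AT_3 - AT_30; the direction is not given above.
    at_3, at_30 = out.get("AT_3"), out.get("AT_30")
    if at_3 is None or at_30 is None:
        out["AT_DIFF"] = None
    else:
        out["AT_DIFF"] = at_3 - at_30
    return out
```

For example, preprocess({"TIMESTAMP_START": 201906151430, "SW_IN": -1.2, "SW_OUT": 3.0, "AT_3": 20.0, "AT_30": 18.5}) yields Hour = 14.5, SW_IN = None, and AT_DIFF = 1.5.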
For the gap-filling procedure, each flux variable (LE, H, and FC) was
assigned two models: a comprehensive model and a limited model. The
comprehensive model used more predictor variables than the limited
model. This allows the comprehensive model to be used whenever all of
its predictors are available, with the limited model as a fallback when
some of them are missing. AT_3 and AT_30 are the air temperatures at 3
and 30 meters, respectively. The predictors for each model were:
- LE, limited: USTAR, G, AT_3, AT_30, AT_DIFF, MONTH, DAY, and HOUR
- LE, comprehensive: AT_DIFF, SW_IN, SW_OUT, LW_IN, LW_OUT, G, VPD,
USTAR, MONTH, DAY, HOUR, and YEAR
- H, limited: USTAR, AT_3, AT_30, MONTH, DAY, HOUR, YEAR, G, and AT_DIFF
- H, comprehensive: USTAR, SW_IN, SW_OUT, LW_IN, LW_OUT, G, AT_DIFF,
AT_3, and AT_30
- FC, limited: USTAR, VPD, G, AT_DIFF, YEAR, MONTH, HOUR, and DAY
- FC, comprehensive: USTAR, WD, WS, VPD, SWC_1, SWC_2, NETRAD, SW_IN,
SW_OUT, LW_IN, LW_OUT, G, AT_3, AT_30, AT_DIFF, YEAR, MONTH, DAY, and
HOUR
Using the listed variables, we
randomly split the data set into training (70%) and testing (30%) sets,
using only records in which all of the variables' values were
available. Each model was then fit with the 'randomForest' function
from the 'randomForest' package in R, and predictions were made with
the 'predict' function. We compared each model's predictions against
its testing set using the Nash-Sutcliffe efficiency (NSE). The final
data set was created by combining the predictions from each model. When
the comprehensive model had a prediction, it was used over the limited
model because the comprehensive models were better predictors
according to the NSE. The columns LE_METH, H_METH, and FC_METH indicate
which method produced each value (comprehensive, limited, or measured).
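The bookkeeping around the models can be sketched in Python as follows. This is an illustration only: the models described above were fit in R with 'randomForest', and the fitting step is deliberately left out here. The helper names (complete_cases, train_test_split, nse, gap_fill), the use of None for missing values, the fixed random seed, and the exact label strings are all assumptions made for the sketch.

```python
import random


def complete_cases(records, variables):
    """Keep only records where every listed variable has a value."""
    return [r for r in records
            if all(r.get(v) is not None for v in variables)]


def train_test_split(records, train_fraction=0.7, seed=0):
    """Randomly split records into training and testing lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    skill of always predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def gap_fill(measured, comprehensive, limited):
    """Return (value, method) for one record of one flux variable.

    Measured values are kept; otherwise the comprehensive prediction is
    preferred, and the limited prediction is the fallback.
    """
    if measured is not None:
        return measured, "measured"
    if comprehensive is not None:
        return comprehensive, "comprehensive"
    if limited is not None:
        return limited, "limited"
    return None, None  # assumption: no value and no label if nothing is available
```

For one flux variable, a record with no measurement and only a limited-model prediction of 3.0 would be filled as gap_fill(None, None, 3.0), giving the value 3.0 with the method label "limited".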