To calculate the spatial prediction of SOCs at 30 cm depth, we generated synthetic
profiles of 0–30 cm depth from the soil local database for SOCc and SOCs data. The
aggregation of the horizons was carried out using the equal-area spline technique
through mass-preserving spline (‘mpspline’) function (Bishop et al., 1999). The
estimation of SOCs in each soil profile were calculated as:
SOCs (Kg∙m^2 )=SOC (g⁄(Kg)∙BD(Kg∙m^3)∙[1-(CRFVOL/100)]∙HSIZE(cm))
were BD is bulk density, CRFVOL is percentage of coarse fragments (above 2 mm in
diameter), and HSIZE is thickness of the horizons. Due to data gaps, BD was estimated by
means of a pedotransfer function adapted from a regional study (Barahona and Santos,
1981). We used the R package GSTAT for the stock estimates and ‘mpspline’ function,
where the propagated error was estimated by the Taylor Series Method (Hengl and Mendes
de Jesus, 2016; Heuvelink, 1998; Malone et al., 2009).
Our DSM approach was based on the SCORPAN conceptual model using the soil forming
environmental factors as soil spatial prediction function (McBratney et al., 2003). We
generated a covariate stack based on 34 environmental factors to predict SOC
Prior to predictive model building, a regression matrix was performed including the
best correlated environmental covariates with SOC local data. To select them, we
considered a balance among higher Pearson coefficient of multiple linear regression,
lower error (RMSE), and lower variance inflation factor (VIF) to identify statistical
redundancy (Heiberger et al., 2005). We used the Akaike information criterion (AIC) to
determine the best compromise between model accuracy and model parsimony (Rossel and
Behrens, 2010).
The quantile regression forest (qrf) model was performed to predict SOCc and SOCs,
since estimates an approximation of the full conditional distribution of the response
variable, the inferred conditional quantiles to build prediction intervals were
estimated as surrogate of the value of uncertainty associated with the response variable
(Meinshausen, 2006). We used the ‘quantregForest’ package ‘implemented in R software for
statistical computing (Meinshausen, 2006).
==================== Data Sources =========================
- Local soil database:
The local database was derived from the LUCDEME Project generated between years
1986-2004 by the “Ministerio de Medio Ambiente de España“ and the support of “Dirección
General de Medio Ambiente de la Región de Murcia” (Alias and Ortiz, 1986). This legacy
database consists of a sampling of 255 soil profiles representative of soil typologies
over a topographic range of 0-1700 m in altitude. For each profile, there are
morphological and analytical data for each horizon (903 horizons).
- Spataial Environmental Covariates:
We used dynamic and static variables as predictors for SOC. The static variables
were 16 topographic parameters derived from a local digital elevation model (DEM) using
the Terrain Analyst functions included in SAGA GIS software (Conrad et al., 2015). The
DEM is available from Geographic Information National Centre (Spain), resulting from
interpolation of LiDAR national images, with a 25 m pixel resolution, re-sampled into
100 m.
The dynamic variables included climatic variables (precipitation and temperature)
(Ninyerola et al., 2005); land cover (IGN, 2012) reclassified into 13 classes; forest
structural variables, aboveground biomass of forest trees cover from LiDAR data (Durante
et al., 2019); and vegetation indexes (VIs). The calculated VIs were The Normalized
Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI), which are
associated to ecosystem functional attributes related to seasonal dynamics of net
primary productivity. These indices were derived from mean annual time series images
(2001-2016) of MODIS-Terra images satellite using Google Earth Engine as describe in
(Arenas-Castro et al., 2019).
===========================================================