To calculate the spatial prediction of SOCs at 30 cm depth, we generated synthetic profiles of 0–30 cm depth from the soil local database for SOCc and SOCs data. The aggregation of the horizons was carried out using the equal-area spline technique through mass-preserving spline (‘mpspline’) function (Bishop et al., 1999). The estimation of SOCs in each soil profile were calculated as:
SOCs (Kg∙m^2 )=SOC (g⁄(Kg)∙BD(Kg∙m^3)∙[1-(CRFVOL/100)]∙HSIZE(cm))
were BD is bulk density, CRFVOL is percentage of coarse fragments (above 2 mm in diameter), and HSIZE is thickness of the horizons. Due to data gaps, BD was estimated by means of a pedotransfer function adapted from a regional study (Barahona and Santos, 1981). We used the R package GSTAT for the stock estimates and ‘mpspline’ function, where the propagated error was estimated by the Taylor Series Method (Hengl and Mendes de Jesus, 2016; Heuvelink, 1998; Malone et al., 2009).
Our DSM approach was based on the SCORPAN conceptual model using the soil forming environmental factors as soil spatial prediction function (McBratney et al., 2003). We generated a covariate stack based on 34 environmental factors to predict SOC
Prior to predictive model building, a regression matrix was performed including the best correlated environmental covariates with SOC local data. To select them, we considered a balance among higher Pearson coefficient of multiple linear regression, lower error (RMSE), and lower variance inflation factor (VIF) to identify statistical redundancy (Heiberger et al., 2005). We used the Akaike information criterion (AIC) to determine the best compromise between model accuracy and model parsimony (Rossel and Behrens, 2010).
The quantile regression forest (qrf) model was performed to predict SOCc and SOCs, since estimates an approximation of the full conditional distribution of the response variable, the inferred conditional quantiles to build prediction intervals were estimated as surrogate of the value of uncertainty associated with the response variable (Meinshausen, 2006). We used the ‘quantregForest’ package ‘implemented in R software for statistical computing (Meinshausen, 2006).
==================== Data Sources =========================
- Local soil database:
The local database was derived from the LUCDEME Project generated between years 1986-2004 by the “Ministerio de Medio Ambiente de España“ and the support of “Dirección General de Medio Ambiente de la Región de Murcia” (Alias and Ortiz, 1986). This legacy database consists of a sampling of 255 soil profiles representative of soil typologies over a topographic range of 0-1700 m in altitude. For each profile, there are morphological and analytical data for each horizon (903 horizons).
- Spataial Environmental Covariates:
We used dynamic and static variables as predictors for SOC. The static variables were 16 topographic parameters derived from a local digital elevation model (DEM) using the Terrain Analyst functions included in SAGA GIS software (Conrad et al., 2015). The DEM is available from Geographic Information National Centre (Spain), resulting from interpolation of LiDAR national images, with a 25 m pixel resolution, re-sampled into 100 m.
The dynamic variables included climatic variables (precipitation and temperature) (Ninyerola et al., 2005); land cover (IGN, 2012) reclassified into 13 classes; forest structural variables, aboveground biomass of forest trees cover from LiDAR data (Durante et al., 2019); and vegetation indexes (VIs). The calculated VIs were The Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI), which are associated to ecosystem functional attributes related to seasonal dynamics of net primary productivity. These indices were derived from mean annual time series images (2001-2016) of MODIS-Terra images satellite using Google Earth Engine as describe in (Arenas-Castro et al., 2019).
===========================================================