Soil organic carbon (SOC) must be quantified and monitored to assess soil management practices, adapt policies, and evaluate environmental impacts. However, because SOC varies strongly in space, soil surveys are challenging: data are costly to acquire, operationally complex to collect, and difficult to keep up to date. Digital soil mapping, which combines machine learning with remote sensing, has significantly improved estimates of the spatial distribution of soil carbon, even with limited soil samples. A legacy soil database of 8,361 georeferenced profiles and a selection of environmental covariates closely related to soil-forming factors (e.g., biota, climate, parent material) were used to generate SOC maps. Modeling was based on three supervised learning approaches: quantile regression forest, ensemble machine learning, and automated machine learning. For the final SOC spatial distribution maps, each pixel was assigned the prediction from the most accurate model, i.e., the one with the lowest uncertainty.
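The per-pixel selection rule described above can be illustrated with a minimal sketch. It assumes each model supplies one prediction and one uncertainty value per pixel, stored as flattened lists. The function name `select_lowest_uncertainty`, the model labels, and all numbers are hypothetical and are not taken from the study; resolving ties by model order is also an assumption.

```python
from typing import Dict, List, Tuple


def select_lowest_uncertainty(
    predictions: Dict[str, List[float]],
    uncertainties: Dict[str, List[float]],
) -> Tuple[List[float], List[float], List[str]]:
    """For each pixel, keep the prediction of the model with the lowest uncertainty.

    Both dictionaries map a model label to a flattened raster (one value per pixel).
    Ties go to the model listed first (an assumption of this sketch).
    """
    models = list(predictions)
    n_pixels = len(predictions[models[0]])
    final_pred: List[float] = []
    final_unc: List[float] = []
    chosen: List[str] = []
    for i in range(n_pixels):
        # Model whose uncertainty is smallest at pixel i
        best = min(models, key=lambda m: uncertainties[m][i])
        final_pred.append(predictions[best][i])
        final_unc.append(uncertainties[best][i])
        chosen.append(best)
    return final_pred, final_unc, chosen


# Toy values for three pixels (illustrative only, not data from the study)
predictions = {
    "qrf": [12.0, 18.5, 9.1],
    "ensemble": [13.2, 17.0, 8.4],
    "automl": [11.5, 19.3, 10.0],
}
uncertainties = {
    "qrf": [2.0, 4.1, 1.5],
    "ensemble": [2.5, 3.0, 1.9],
    "automl": [1.8, 3.6, 1.5],
}

soc_map, unc_map, model_map = select_lowest_uncertainty(predictions, uncertainties)
```

Returning the chosen model label alongside the prediction keeps a record of which model contributed each pixel, which can itself be mapped to inspect where each approach performs best.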
We applied this modeling technique to generate cost-effective, high-resolution maps (90 m pixel resolution) of SOC distribution, and its associated spatially explicit uncertainty, in peninsular Spain. These maps showed a mean SOC concentration of 15.7 g kg⁻¹ at 0-30 cm and 3.6 g kg⁻¹ at 30-100 cm depth. The total SOC stock over the effective soil depth was 3.8 Pg C, with 74% stored in the upper 30 cm (2.82 Pg C). Comparison of observed and final predicted SOC values gave R² = 0.68 for SOC concentration (SOCc) and R² = 0.54 for SOC stock (SOCs) in the upper 30 cm.
The methodology proposed in this study aims to improve benchmark SOC estimates in support of the National GHG Emissions Inventory Report.